Strings are sequences of characters (like hello). Each character is an 8-bit value from the entire 256-character set (there's nothing special about the NUL character, as there is in some languages).
The shortest possible string has no characters. The longest string fills all of your available memory (although you wouldn't be able to do much with that). This is in accordance with the principle of "no built-in limits" that Perl follows at every opportunity. Typical strings are printable sequences of letters, digits, and punctuation in the ASCII 32 to ASCII 126 range. However, the ability to have any character from 0 to 255 in a string means that you can create, scan, and manipulate raw binary data as strings - a task with which most other utilities would have great difficulty. (For example, you can patch your operating system by reading it into a Perl string, making the change, and writing the result back out.)
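As a rough illustration of the point about binary data, the sketch below builds a string that contains a NUL and the value 255 and then counts its characters. The variable names are made up for this example, and it leans on variables and a few functions (length, ord, split, map) that haven't been introduced yet.

```perl
use strict;
use warnings;

# A string holding arbitrary character values, including NUL (0) and 255.
# $bytes is a hypothetical name chosen for this illustration.
my $bytes = "A\000B\377";

# NUL does not end the string: all four characters are counted.
my $count = length($bytes);

# ord() gives the numeric value of each character.
my @values = map { ord } split //, $bytes;

# The shortest possible string has no characters at all.
my $empty_length = length('');

print $count, " characters: ", join(" ", @values), "\n";  # prints "4 characters: 65 0 66 255"
print $empty_length, "\n";                                # prints "0"
```

The NUL in the middle is counted like any other character, which is what makes it safe to hold raw binary data in a string.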
Like numbers, strings have a literal representation (the way you represent the string in a Perl program). Literal strings come in two different flavors: single-quoted strings and double-quoted strings.[5] Another form that looks rather like these two is the back-quoted string (`like this`). This form isn't so much a literal string as a way to run external commands and get back their output. This form is covered in Chapter 14, Process Management.
[5] Perl also has here strings, which we'll touch on in Chapter 18, CGI Programming.
A single-quoted string is a sequence of characters enclosed in single quotes. The single quotes are not part of the string itself; they're just there to let Perl identify the beginning and the ending of the string. Any character between the quote marks (including newline characters, if the string continues onto successive lines) is legal inside a string. There are two exceptions: to get a single quote into a single-quoted string, precede it by a backslash; and, to get a backslash into a single-quoted string, precede the backslash by a backslash. In other pictures:
    'hello'       # five characters: h, e, l, l, o
    'don\'t'      # five characters: d, o, n, single quote, t
    ''            # the null string (no characters)
    'silly\\me'   # silly, followed by backslash, followed by me
    'hello\n'     # hello followed by backslash followed by n
    'hello
    there'        # hello, newline, there (11 characters in all)
Note that the \n within a single-quoted string is not interpreted as a newline, but as the two characters backslash and n. (Only when the backslash is followed by another backslash or a single quote does it have special meaning.)
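One way to check the single-quote rules for yourself is to ask Perl how long each string is. This is only a sketch: the variable names are invented for the example, and length is a function we haven't met yet.

```perl
use strict;
use warnings;

# In single quotes, only \\ and \' are treated specially.
my $plain   = 'hello';       # h, e, l, l, o
my $quote   = 'don\'t';      # the backslash is not stored
my $slash   = 'silly\\me';   # one backslash is stored
my $literal = 'hello\n';     # backslash and n, not a newline

# Collect the character counts in the same order as above.
my @lengths = map { length } ($plain, $quote, $slash, $literal);

print join(" ", @lengths), "\n";   # prints "5 5 8 7"
```

The 7 for the last string is the giveaway: a real newline would have made it 6.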
A double-quoted string acts a lot like a C string. Once again, it's a sequence of characters, although this time enclosed in double quotes. But now the backslash takes on its full power to specify certain control characters, or even any character at all through octal and hex representations. Here are some double-quoted strings:
    "hello world\n"   # hello world, and a newline
    "new \007"        # new, space, and the bell character (octal 007)
    "coke\tsprite"    # a coke, a tab, and a sprite
    "c:\\temp"        # c:, backslash, and temp
The backslash can precede many different characters to mean different things (typically called a backslash escape). The complete list of double-quoted string escapes is given in Table 2.1.
Construct | Meaning |
---|---|
\n | Newline |
\r | Return |
\t | Tab |
\f | Formfeed |
\b | Backspace |
\013 | Vertical tab (written in octal; Perl has no \v escape in strings) |
\a | Bell |
\e | Escape |
\007 | Any octal ASCII value (here, 007 = bell) |
\x7f | Any hex ASCII value (here, 7f = delete) |
\cC | Any "control" character (here, control C) |
\\ | Backslash |
\" | Doublequote |
\l | Lowercase next letter |
\L | Lowercase all following letters until \E |
\u | Uppercase next letter |
\U | Uppercase all following letters until \E |
\Q | Backslash quote all nonalphanumerics |
\E | Terminate \L, \U, or \Q |
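A few of these escapes are easier to believe once you've seen them at work. The following is a small sketch, not an exhaustive tour; the variable names are invented for the example, and ord and length are functions we haven't covered yet.

```perl
use strict;
use warnings;

my $bell   = "new \007";               # octal escape: new, space, bell
my $delete = "\x7f";                   # hex escape: the delete character
my $ctrl_c = "\cC";                    # control C, character value 3
my $name   = "\LFRED\E and \uwilma";   # lowercase a run, uppercase one letter
my $shout  = "\Uhello\E there";        # uppercase until \E
my $quoted = "\Qa.b*c\E";              # backslash the nonalphanumerics

print $name, "\n";     # prints "fred and Wilma"
print $shout, "\n";    # prints "HELLO there"
print $quoted, "\n";   # prints "a\.b\*c"
print ord($ctrl_c), " ", ord($delete), " ", length($bell), "\n";   # prints "3 127 5"
```

Note that the case-shifting and quoting escapes change the characters that follow them rather than standing for a character themselves.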
Another feature of double-quoted strings is that they are variable interpolated, meaning that scalar and array variables within the strings are replaced with their current values when the strings are used. We haven't formally been introduced to what a variable looks like yet (except in the stroll), so I'll get back to this later.
A quick note here about using DOS/Win32 pathnames in double-quoted strings: while Perl accepts either backslashes or forward slashes in path names, backslashes need to be escaped. So, you need to write one of the following:
    "c:\\temp"   # use an escaped backslash
    "c:/temp"    # use a Unix-style forward slash
If you forget to escape the backslash, you'll end up with strange results:
    "c:\temp"    # WRONG - this string contains a c:, a TAB, and emp
If you're already used to using pathnames in C/C++, this notation will be second nature to you. Otherwise, beware: pathnames seem to bite each and every Perl-for-Win32 programmer from time to time.
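If you want to see the bite for yourself, here's a minimal sketch comparing the spellings above, plus the single-quoted form described earlier. The variable names are made up for the example, and length and index are functions we haven't formally met.

```perl
use strict;
use warnings;

my $escaped = "c:\\temp";   # c:, backslash, temp
my $forward = "c:/temp";    # c:, forward slash, temp
my $wrong   = "c:\temp";    # c:, TAB, emp
my $single  = 'c:\temp';    # single quotes leave the backslash alone

# The broken string is one character short, because \t became a single TAB.
my @lengths = map { length } ($escaped, $forward, $wrong, $single);
print join(" ", @lengths), "\n";   # prints "7 7 6 7"

# index() reports where the TAB ended up (positions count from 0).
my $tab_at = index($wrong, "\t");
print $tab_at, "\n";               # prints "2"
```

The single-quoted version works here only because a backslash followed by t has no special meaning inside single quotes.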