Programming Perl, Second Edition

2.9 Special Variables

The following names have special meaning to Perl. Most of the punctuational names have reasonable mnemonics, or analogs in one of the shells. Nevertheless, if you wish to use the long variable names, just say:

use English;

at the top of your program. This will alias all the short names to the long names in the current package. Some of them even have medium names, generally borrowed from awk (1).
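As a minimal sketch of the aliasing (the variable names $text, $joined, and $same_pid are illustrative only), the long names can be used anywhere the short ones could:

```perl
use strict;
use warnings;
use English;    # aliases the long names to the punctuation variables

my $text   = 'abcdefghi';
my $joined = '';
if ($text =~ /def/) {
    # $PREMATCH, $MATCH, and $POSTMATCH are the long names for $`, $&, and $'
    $joined = "$PREMATCH:$MATCH:$POSTMATCH";
}

# $PROCESS_ID and $$ are two names for the same variable
my $same_pid = ($PROCESS_ID == $$) ? 1 : 0;

print "$joined\n";              # prints abc:def:ghi
print "same pid: $same_pid\n";
```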

A few of these variables are considered read-only. This means that if you try to assign to one of them, either directly or indirectly through a reference, you'll raise a run-time exception.

Regular Expression Special Variables

There are several variables that are associated with regular expressions and pattern matching. Except for $* they are always local to the current block, so you never need to mention them in a local. (And $* is deprecated, so you never need to mention it at all.)

$digit

Contains the text matched by the corresponding set of parentheses in the last pattern matched, not counting patterns matched in nested blocks that have been exited already. (Mnemonic: like \digit.) These variables are all read-only.

$& $MATCH

The string matched by the last successful pattern match, not counting any matches hidden within a block or eval enclosed by the current block. (Mnemonic: like & in some editors.) This variable is read-only.

$` $PREMATCH

The string preceding whatever was matched by the last successful pattern match, not counting any matches hidden within a block or eval enclosed by the current block. (Mnemonic: ` often precedes a quoted string.) This variable is read-only.

$' $POSTMATCH

The string following whatever was matched by the last successful pattern match, not counting any matches hidden within a block or eval enclosed by the current block. (Mnemonic: ' often follows a quoted string.) Example:

$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n";         # prints abc:def:ghi

This variable is read-only.

$+ $LAST_PAREN_MATCH

The last bracket matched by the last search pattern. This is useful if you don't know which of a set of alternative patterns matched. For example:

/Version: (.*)|Revision: (.*)/ && ($rev = $+);

(Mnemonic: be positive and forward looking.) This variable is read-only.
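A small sketch of the alternation case, assuming two made-up input lines; whichever branch matches, $+ holds the text captured by that branch's parentheses:

```perl
use strict;
use warnings;

my @found;
for my $line ('Version: 1.2', 'Revision: 7') {    # sample inputs, invented here
    if ($line =~ /Version: (.*)|Revision: (.*)/) {
        # $1 is set for the first line and $2 for the second;
        # $+ picks up whichever one actually matched
        push @found, $+;
    }
}

print "@found\n";    # prints 1.2 7
```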

$* $MULTILINE_MATCHING

Use of $* is now deprecated, and is allowed only for maintaining backwards compatibility with older versions of Perl. Use /m (and maybe /s) in the regular expression match instead.

Set to 1 to do multi-line matching within a string, 0 to tell Perl that it can assume that strings contain a single line for the purpose of optimizing pattern matches. Pattern matches on strings containing multiple newlines can produce confusing results when $* is 0. Default is 0. (Mnemonic: * matches multiple things.) Note that this variable only influences the interpretation of ^ and $. A literal newline can be searched for even when $* == 0.
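The recommended replacement can be sketched as follows (the sample string is invented for illustration). With /m, ^ and $ match at every embedded line boundary; without it they anchor only to the string as a whole:

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird";

# With /m, each line of the string can satisfy ^...$ on its own
my @with_m = ($text =~ /^(\w+)$/mg);

# Without /m, the pattern would have to span the whole string, so nothing matches
my @without_m = ($text =~ /^(\w+)$/g);

print scalar(@with_m), " matches with /m\n";
print scalar(@without_m), " matches without /m\n";
```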

Per-Filehandle Special Variables

These variables never need to be mentioned in a local because they always refer to some value pertaining to the currently selected output filehandle--each filehandle keeps its own set of values. When you select another filehandle, the old filehandle keeps whatever values it had in effect, and the variables now reflect the values of the new filehandle.

To go a step further and avoid select entirely, these variables that depend on the currently selected filehandle may instead be set by calling an object method on the FileHandle object. (Summary lines below for this contain the word HANDLE.) First you must say:

use FileHandle;

after which you may use either:

method HANDLE EXPR

or:

HANDLE->method(EXPR)

Each of the methods returns the old value of the FileHandle attribute. The methods each take an optional EXPR, which if supplied specifies the new value for the FileHandle attribute in question. If not supplied, most of the methods do nothing to the current value, except for autoflush, which will assume a 1 for you, just to be different.
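Here is a hedged sketch of the method form next to the select form. The in-memory filehandle ($fh writing into $buffer) is an assumption made only to keep the example self-contained; it requires a Perl recent enough to open a reference to a scalar, and any output filehandle would serve equally well:

```perl
use strict;
use warnings;
use FileHandle;

my $buffer = '';
open(my $fh, '>', \$buffer) or die "can't open in-memory file: $!";

# Method form: sets the attribute on $fh only and returns the old value
my $previous = $fh->autoflush(1);

# Variable form: $| reflects whichever handle is currently selected
my $saved        = select($fh);
my $via_variable = $|;
select($saved);          # put the original handle back

close($fh);
print "old value: $previous, new value: $via_variable\n";
```

The method call avoids the select dance entirely, which is why it is usually the less error-prone of the two.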

$| $OUTPUT_AUTOFLUSH autoflush HANDLE EXPR

If set to nonzero, forces an fflush (3) after every write or print on the currently selected output channel. (This is called "command buffering". Contrary to popular belief, setting this variable does not turn off buffering.) Default is 0, which on many systems means that STDOUT will default to being line buffered if output is to the terminal, and block buffered otherwise. Setting this variable is useful primarily when you are outputting to a pipe, such as when you are running a Perl script under rsh and want to see the output as it's happening. This has no effect on input buffering. If you have a need to flush a buffer immediately after setting $|, you may simply print ""; rather than waiting for the next print to flush it. (Mnemonic: when you want your pipes to be piping hot.)

$% $FORMAT_PAGE_NUMBER format_page_number HANDLE EXPR

The current page number of the currently selected output channel. (Mnemonic: % is page number in nroff.)

$= $FORMAT_LINES_PER_PAGE format_lines_per_page HANDLE EXPR

The current page length (printable lines) of the currently selected output channel. Default is 60. (Mnemonic: = has horizontal lines.)

$- $FORMAT_LINES_LEFT format_lines_left HANDLE EXPR

The number of lines left on the page of the currently selected output channel. (Mnemonic: lines_on_page - lines_printed.)

$~ $FORMAT_NAME format_name HANDLE EXPR

The name of the current report format for the currently selected output channel. Default is the name of the filehandle. (Mnemonic: takes a turn after $^.)

$^ $FORMAT_TOP_NAME format_top_name HANDLE EXPR

The name of the current top-of-page format for the currently selected output channel. Default is the name of the filehandle with _TOP appended. (Mnemonic: points to top of page.)

Global Special Variables

There are quite a few variables that are global in the fullest sense--they mean the same thing in every package. If you want a private copy of one of these, you must localize it in the current block.

$_ $ARG

The default input and pattern-searching space. These pairs are equivalent:

while (<>) {...}    # only equivalent in while!
while (defined($_ = <>)) {...}

/^Subject:/
$_ =~ /^Subject:/

tr/a-z/A-Z/
$_ =~ tr/a-z/A-Z/

chop
chop($_)

Here are the places where Perl will assume $_ even if you don't use it:

  • Various unary functions, including functions like ord and int, as well as all the file tests (-f, -d) except for -t, which defaults to STDIN.

  • Various list functions like print and unlink.

  • The pattern-matching operations m//, s///, and tr/// when used without an =~ operator.

  • The default iterator variable in a foreach loop if no other variable is supplied.

  • The implicit iterator variable in the grep and map functions.

  • The default place to put an input record when a <FH> operation's result is tested by itself as the sole criterion of a while test. Note that outside of a while test, this will not happen.

Mnemonic: underline is the underlying operand in certain operations.
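Several of the defaults listed above can be seen together in a short sketch (the word list is invented for illustration):

```perl
use strict;
use warnings;

my @words = qw(apple Banana cherry);

# map and grep set $_ to each element in turn; uc and m// default to $_
my @upper  = map  { uc }  @words;
my @with_a = grep { /a/ } @words;

# foreach with no loop variable also uses $_
my $lowercase_initial = 0;
for (@words) {
    $lowercase_initial++ if /^[a-z]/;
}

print "@upper\n";                 # prints APPLE BANANA CHERRY
print "@with_a\n";                # prints apple Banana
print "$lowercase_initial\n";     # prints 2
```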

$. $INPUT_LINE_NUMBER $NR

The current input line number of the last filehandle that was read. An explicit close on the filehandle resets the line number. Since <> never does an explicit close, line numbers increase across ARGV files (but see examples under eof in Chapter 3, Functions). Localizing $. has the effect of also localizing Perl's notion of the last read filehandle. (Mnemonic: many programs use "." to mean the current line number.)
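A minimal sketch of $. in use, assuming an in-memory filehandle over a three-line string (a convenience for the example, not something the text above requires):

```perl
use strict;
use warnings;

my $data = "a\nb\nc\n";
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";

my @numbered;
while (<$fh>) {
    chomp;
    push @numbered, "$.:$_";    # $. is the number of the line just read
}
close($fh);

print "@numbered\n";    # prints 1:a 2:b 3:c
```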

$/ $INPUT_RECORD_SEPARATOR $RS

The input record separator, newline by default. It works like awk's RS variable, and, if set to the null string, treats blank lines as delimiters. You may set it to a multi-character string to match a multi-character delimiter. Note that setting it to "\n\n" means something slightly different than setting it to "", if the file contains consecutive blank lines. Setting it to "" will treat two or more consecutive blank lines as a single blank line. Setting it to "\n\n" means Perl will blindly assume that the next input character belongs to the next paragraph, even if it's a third newline. (Mnemonic: / is used to delimit line boundaries when quoting poetry.)

undef $/;
$_ = <FH>;          # whole file now here
s/\n[ \t]+/ /g;
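The difference between "" and "\n\n" described above can be sketched with a string that has several blank lines between two paragraphs. The helper read_records and the in-memory filehandle are assumptions made for the example:

```perl
use strict;
use warnings;

# Two paragraphs separated by three blank lines
my $data = "a1\na2\n\n\n\nb1\nb2\n";

# Hypothetical helper: read every record from $data using the given separator
sub read_records {
    my ($separator) = @_;
    open(my $fh, '<', \$data) or die "can't open in-memory file: $!";
    local $/ = $separator;       # restored automatically when the sub returns
    my @records = <$fh>;
    close($fh);
    return @records;
}

my @paragraph_mode = read_records("");       # runs of blank lines collapse
my @two_newlines   = read_records("\n\n");   # every "\n\n" ends a record

print scalar(@paragraph_mode), " records with \"\"\n";
print scalar(@two_newlines), " records with \"\\n\\n\"\n";
```

Localizing $/ inside the helper keeps the change from leaking into any other code that reads input.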

$, $OUTPUT_FIELD_SEPARATOR $OFS

The output field separator for the print operator. Ordinarily the print operator simply prints out the comma-separated fields you specify. In order to get behavior more like awk, set this variable as you would set awk's OFS variable to specify what is printed between fields. (Mnemonic: what is printed when there is a "," in your print statement.)

$\ $OUTPUT_RECORD_SEPARATOR $ORS

The output record separator for the print operator. Ordinarily the print operator simply prints out the comma-separated fields you specify, with no trailing newline or record separator assumed. In order to get behavior more like awk, set this variable as you would set awk's ORS variable to specify what is printed at the end of the print. (Mnemonic: you set $\ instead of adding "\n" at the end of the print. Also, it's just like /, but it's what you get "back" from Perl.)

$" $LIST_SEPARATOR

This is like $, above except that it applies to list values interpolated into a double-quoted string (or similar interpreted string). Default is a space. (Mnemonic: obvious, I think.)
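The three output separators can be compared in one sketch. Printing into an in-memory filehandle ($out writing into $printed) is an assumption made here so that the result can be inspected:

```perl
use strict;
use warnings;

my @fields = ('a', 'b', 'c');

# $" joins list elements interpolated into a double-quoted string
my $default = "@fields";
my $custom;
{
    local $" = '-';
    $custom = "@fields";
}

# $, goes between the arguments of print, and $\ after the last one
my $printed = '';
open(my $out, '>', \$printed) or die "can't open in-memory file: $!";
{
    local $, = ':';
    local $\ = "\n";
    print $out @fields;
}
close($out);

print "$default\n";    # prints a b c
print "$custom\n";     # prints a-b-c
print $printed;        # prints a:b:c
```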

$; $SUBSCRIPT_SEPARATOR $SUBSEP

The subscript separator for multi-dimensional array emulation. If you refer to a hash element as:

$foo{$a,$b,$c}

it really means:

$foo{join($;, $a, $b, $c)}

But don't put:

@foo{$a,$b,$c}      # a slice--note the @

which means:

($foo{$a},$foo{$b},$foo{$c})

Default is "\034", the same as SUBSEP in awk. Note that if your keys contain binary data there might not be any safe value for $;. (Mnemonic: comma--the syntactic subscript separator--is a semi-semicolon. Yeah, I know, it's pretty lame, but $, is already taken for something more important.)

This variable is for maintaining backward compatibility, so consider using "real" multi-dimensional arrays now.
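A short sketch contrasting the emulation with a nested hash (the hash names %emulated and %nested are illustrative):

```perl
use strict;
use warnings;

# Emulated: the subscript list is joined with $; into a single key
my %emulated;
$emulated{1, 2} = 'x';
my $joined_key   = join($;, 1, 2);
my $same_element = exists $emulated{$joined_key} ? 1 : 0;

# The key can be taken apart again by splitting on the default separator
my ($row, $col) = split /\034/, (keys %emulated)[0];

# "Real" multi-dimensional structure: a hash of hashes, no separator involved
my %nested;
$nested{1}{2} = 'x';

print "same element: $same_element\n";
print "row $row, column $col\n";
print "nested: $nested{1}{2}\n";
```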

$^L $FORMAT_FORMFEED format_formfeed HANDLE EXPR

What a format outputs to perform a formfeed. Default is "\f".

$: $FORMAT_LINE_BREAK_CHARACTERS format_line_break_characters HANDLE EXPR

The current set of characters after which a string may be broken to fill continuation fields (starting with ^) in a format. Default is " \n-", to break on whitespace or hyphens. (Mnemonic: a colon in poetry is a part of a line.)

$^A $ACCUMULATOR

The current value of the write accumulator for format lines. A format contains formline commands that put their result into $^A. After calling its format, write prints out the contents of $^A and empties it. So you never actually see the contents of $^A unless you call formline yourself and then look at it.
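Calling formline directly, as a rough sketch with a made-up picture line, leaves the formatted text in $^A where it can be inspected and then cleared by hand:

```perl
use strict;
use warnings;

# One left-justified and one right-justified field, each four columns wide
formline('@<<< @>>>', 'ab', 'cd');

my $accumulated = $^A;    # the formatted text built so far
$^A = '';                 # empty the accumulator ourselves, as write would

print "[$accumulated]\n";
```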

$# $OFMT

Use of $# is now deprecated and is allowed only for maintaining backwards compatibility with older versions of Perl. You should use printf instead. $# contains the output format for printed numbers. This variable is a half-hearted attempt to emulate awk's OFMT variable. There are times, however, when awk and Perl have differing notions of what is in fact numeric. Also, the initial value is approximately %.14g rather than %.6g, so you need to set $# explicitly to get awk's value. (Mnemonic: # is the number sign. Better yet, just forget it.)

$? $CHILD_ERROR

The status returned by the last pipe close, backtick (``) command, or system operator. Note that this is the status word returned by the wait (2) system call, so the exit value of the subprocess is actually ($? >> 8). Thus on many systems, ($? & 255) gives which signal, if any, the process died from, and whether there was a core dump. (Mnemonic: similar to sh and ksh.)
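Decoding the status word might look like the sketch below. It assumes a UNIX-like system and runs a second copy of the current interpreter (via $^X) as a convenient child process:

```perl
use strict;
use warnings;

# Run a child that exits with status 3
system($^X, '-e', 'exit 3');

my $exit_value  = $? >> 8;     # the value the child passed to exit
my $signal      = $? & 127;    # signal number, if the child died from one
my $dumped_core = $? & 128;    # nonzero if there was a core dump

print "exit value: $exit_value\n";
print "signal:     $signal\n";
```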

$! $OS_ERROR $ERRNO

If used in a numeric context, yields the current value of the errno variable (identifying the last system call error) in the currently executing perl, with all the usual caveats. (This means that you shouldn't depend on the value of $! to be anything in particular unless you've gotten a specific error return indicating a system error.) If used in a string context, yields the corresponding system error string. You can assign to $! in order to set errno, if, for instance, you want $! to return the string for error n, or you want to set the exit value for the die operator. (Mnemonic: What just went bang?)
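The two faces of $! can be sketched as follows. The path in $missing is hypothetical and is simply assumed not to exist; the Errno module, which ships with later versions of Perl, supplies the symbolic constant:

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

my $missing = '/no/such/dir/no-such-file';    # hypothetical, assumed absent

my ($number, $message, $is_enoent) = (0, '', 0);
unless (open(my $fh, '<', $missing)) {
    # Only trust $! immediately after a call that reported failure
    $number    = $! + 0;    # numeric context: the errno value
    $message   = "$!";      # string context: the system error text
    $is_enoent = ($number == ENOENT) ? 1 : 0;
}

print "errno $number: $message\n";
```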

$@ $EVAL_ERROR

The Perl syntax error message from the last eval command. If null, the last eval was parsed and executed correctly (although the operations you invoked may have failed in the normal fashion). (Mnemonic: Where was the syntax error "at"?)

Note that warning messages are not collected in this variable. You can, however, set up a routine to process warnings by setting $SIG{__WARN__}, described under %SIG below.
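A brief sketch of checking $@ after each form of eval. Besides syntax errors from a string eval, $@ also receives the message of a die raised inside the eval:

```perl
use strict;
use warnings;

# Block eval: a run-time exception ends the block and lands in $@
my $failed = eval { die "something broke\n"; 1 };
my $error  = $@;

# A successful eval leaves $@ empty
my $answer   = eval { 6 * 7 };
my $no_error = ($@ eq '') ? 1 : 0;

# String eval: a compile-time syntax error is reported in $@ as well
my $parsed       = eval 'my $x = ;';
my $syntax_error = $@;

print "run-time error: $error";
print "answer: $answer\n";
print "syntax error caught\n" if $syntax_error;
```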

$$ $PROCESS_ID $PID

The process number of the Perl running this script. (Mnemonic: same as shells.)

$< $REAL_USER_ID $UID

The real user ID (uid) of this process. (Mnemonic: it's the uid you came from, if you're running setuid.)

$> $EFFECTIVE_USER_ID $EUID

The effective uid of this process. Example:

$< = $>;            # set real to effective uid
($<,$>) = ($>,$<);  # swap real and effective uid

(Mnemonic: it's the uid you went to, if you're running setuid.) Note: $< and $> can only be swapped on machines supporting setreuid (2). And sometimes not even then.

$( $REAL_GROUP_ID $GID

The real group ID (gid) of this process. If you are on a machine that supports membership in multiple groups simultaneously, gives a space-separated list of groups you are in. The first number is the one returned by getgid (2), and the subsequent ones by getgroups(2), one of which may be the same as the first number. (Mnemonic: parentheses are used to group things. The real gid is the group you left, if you're running setgid.)

$) $EFFECTIVE_GROUP_ID $EGID

The effective gid of this process. If you are on a machine that supports membership in multiple groups simultaneously, $) gives a space-separated list of groups you are in. The first number is the one returned by getegid (2), and the subsequent ones by getgroups (2), one of which may be the same as the first number. (Mnemonic: parentheses are used to group things. The effective gid is the group that's right for you, if you're running setgid.)

Note: $<, $>, $(, and $) can only be set on machines that support the corresponding system set-id routine. $( and $) can only be swapped on machines supporting setregid(2). Because Perl doesn't currently use initgroups(2), you can't set your group vector to multiple groups.

$0 $PROGRAM_NAME

Contains the name of the file containing the Perl script being executed. Assigning to $0 attempts to modify the argument area that the ps (1) program sees. This is more useful as a way of indicating the current program state than it is for hiding the program you're running. But it doesn't work on all systems. (Mnemonic: same as sh and ksh.)

$[

The index of the first element in an array, and of the first character in a substring. Default is 0, but you could set it to 1 to make Perl behave more like awk (or FORTRAN) when subscripting and when evaluating the index and substr functions. (Mnemonic: [ begins subscripts.)

Assignment to $[ is now treated as a compiler directive, and cannot influence the behavior of any other file. Its use is discouraged.

$] $PERL_VERSION

Returns the version + patchlevel / 1000. It can be used to determine at the beginning of a script whether the Perl interpreter executing the script is in the right range of versions. Example:

warn "No checksumming!\n" if $] < 3.019;
die "Must have prototyping available\n" if $] < 5.003;

(Mnemonic: Is this version of Perl in the right bracket?)

$^D $DEBUGGING

The current value of the debugging flags. (Mnemonic: value of -D switch.)

$^F $SYSTEM_FD_MAX

The maximum system file descriptor, ordinarily 2. System file descriptors are passed to exec'ed processes, while higher file descriptors are not. Also, during an open, system file descriptors are preserved even if the open fails. (Ordinary file descriptors are closed before the open is attempted, and stay closed if the open fails.) Note that the close-on-exec status of a file descriptor will be decided according to the value of $^F at the time of the open, not the time of the exec.

$^H

This variable contains internal compiler hints enabled by certain pragmatic modules. Hint: ignore this and use the pragmata.

$^I $INPLACE_EDIT

The current value of the inplace-edit extension. Use undef to disable inplace editing. (Mnemonic: value of -i switch.)

$^O $OSNAME

This variable contains the name of the operating system the current Perl binary was compiled for. It's intended as a cheap alternative to pulling it out of the Config module.

$^P $PERLDB

The internal flag that the debugger clears so that it doesn't debug itself. You could conceivably disable debugging yourself by clearing it.

$^T $BASETIME

The time at which the script began running, in seconds since the epoch (the beginning of 1970, for UNIX systems). The values returned by the -M, -A, and -C filetests are based on this value.

$^W $WARNING

The current value of the warning switch, either true or false. (Mnemonic: the value is related to the -w switch.)

$^X $EXECUTABLE_NAME

The name that the Perl binary itself was executed as, from C's argv[0].

$ARGV

Contains the name of the current file when reading from <ARGV>.

Global Special Arrays

The following arrays and hashes are global. Just like the special global scalar variables, they refer to package main no matter when they are referenced. The following two statements are exactly the same:

print "@INC\n";
print "@main::INC\n";

@ARGV

The array containing the command-line arguments intended for the script. Note that $#ARGV is generally the number of arguments minus one, since $ARGV[0] is the first argument, not the command name. See $0 for the command name.

@INC

The array containing the list of places to look for Perl scripts to be evaluated by the do EXPR, require, or use constructs. It initially consists of the arguments to any -I command-line switches, followed by the default Perl libraries, such as:

/usr/local/lib/perl5/$ARCH/$VERSION
/usr/local/lib/perl5
/usr/local/lib/perl5/site_perl
/usr/local/lib/perl5/site_perl/$ARCH

followed by ".", to represent the current directory. If you need to modify this list at run-time, you should use the lib module in order to also get the machine-dependent library properly loaded:

use lib '/mypath/libdir/';
use SomeMod;

@F

The array into which the input lines are split when the -a command-line switch is given. If the -a option is not used, this array has no special meaning. (This array is actually only @main::F, and not in all packages at once.)

%INC

The hash containing entries for the filename of each file that has been included via do or require. The key is the filename you specified, and the value is the location of the file actually found. The require command uses this hash to determine whether a given file has already been included.
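As a small sketch, using the standard File::Basename module purely as a convenient thing to load, the entry can be looked up once the file has been required:

```perl
use strict;
use warnings;

require File::Basename;

# The key is the path form of the module name, as require looks it up
my $key      = 'File/Basename.pm';
my $location = $INC{$key};

# A second require finds the entry and does not load the file again
my $count_before = keys %INC;
require File::Basename;
my $count_after = keys %INC;

print "$key was loaded from $location\n";
```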

%ENV

The hash containing your current environment. Setting a value in %ENV changes the environment for child processes:

$ENV{PATH} = "/bin:/usr/bin";

To remove something from your environment, make sure to use delete instead of undef.
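The effect on child processes can be sketched like this. The helper child_sees and the variable name DEMO_SETTING are invented for the example, which assumes a UNIX-like system where the list form of a piped open is available:

```perl
use strict;
use warnings;

# Hypothetical helper: ask a child perl whether it sees the named variable
sub child_sees {
    my ($name) = @_;
    my $code = 'print(exists($ENV{' . $name . '}) ? "set" : "unset")';
    open(my $pipe, '-|', $^X, '-e', $code) or die "can't run $^X: $!";
    my $answer = <$pipe>;
    close($pipe);
    return $answer;
}

$ENV{DEMO_SETTING} = 'hello';
my $while_set = child_sees('DEMO_SETTING');

delete $ENV{DEMO_SETTING};            # delete, not undef
my $after_delete = child_sees('DEMO_SETTING');

print "while set:    $while_set\n";
print "after delete: $after_delete\n";
```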

Note that processes running as a crontab entry inherit a particularly impoverished set of environment variables. Also note that you should set $ENV{PATH}, $ENV{SHELL}, and $ENV{IFS} if you are running as a setuid script. See Chapter 8, Other Oddments, for more on security and setuid issues.

%SIG

The hash used to set signal handlers for various signals. Example:

sub handler {       # 1st argument is signal name
    local($sig) = @_;
    print "Caught a SIG$sig--shutting down\n";
    close(LOG);
    exit(0);
}
$SIG{INT} = 'handler';
$SIG{QUIT} = 'handler';
...
$SIG{INT} = 'DEFAULT';    # restore default action
$SIG{QUIT} = 'IGNORE';    # ignore SIGQUIT

The %SIG hash only contains values for the signals actually set within the Perl script. Here are some other examples:

$SIG{PIPE} = Plumber;     # SCARY!!
$SIG{PIPE} = "Plumber";   # just fine, assumes main::Plumber
$SIG{PIPE} = \&Plumber;   # just fine; assume current Plumber
$SIG{PIPE} = Plumber();   # oops, what did Plumber() return??

The example marked SCARY!! is problematic because it's a bareword, which means sometimes it's a string representing the function, and sometimes it's going to call the subroutine right then and there! Best to be sure and quote it or take a reference to it.

Certain internal hooks can also be set using the %SIG hash. The routine indicated by $SIG{__WARN__} is called when a warning message is about to be printed. The warning message is passed as the first argument. The presence of a __WARN__ hook causes the ordinary printing of warnings to STDERR to be suppressed. You can use this to save warnings in a variable, or turn warnings into fatal errors, like this:

local $SIG{__WARN__} = sub { die $_[0] };
eval $proggie;

The routine indicated by $SIG{__DIE__} is called when a fatal exception is about to be thrown. The error message is passed as the first argument. When a __DIE__ hook routine returns, the exception processing continues as it would have in the absence of the hook, unless the hook routine itself exits via a goto, a loop exit, or a die. The __DIE__ handler is explicitly disabled during the call, so that you yourself can then call the real die from a __DIE__ handler. (If it weren't disabled, the handler would call itself recursively forever.) The case is similar for __WARN__.
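Both uses of the warning hook, saving warnings in a variable and turning them into fatal errors, can be sketched as follows (the array @captured and the messages are illustrative):

```perl
use strict;
use warnings;

# Save warnings instead of letting them reach STDERR
my @captured;
{
    local $SIG{__WARN__} = sub { push @captured, $_[0] };
    warn "first\n";
    warn "second\n";
}

# Turn a warning into an exception and trap it with eval
my $fatal;
{
    local $SIG{__WARN__} = sub { die $_[0] };
    eval { warn "now fatal\n"; 1 };
    $fatal = $@;
}

print "captured ", scalar(@captured), " warnings\n";
print "trapped: $fatal";
```

Using local confines each hook to its enclosing block, so the ordinary warning behavior returns afterward.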

Global Special Filehandles

The following filehandles (except for DATA) always refer to main::FILEHANDLE.

ARGV

The special filehandle that iterates over command line filenames in @ARGV. Usually written as the null filehandle in <>.

STDERR

The special filehandle for standard error in any package.

STDIN

The special filehandle for standard input in any package.

STDOUT

The special filehandle for standard output in any package.

DATA

The special filehandle that refers to anything following the __END__ token in the file containing the script. Or, the special filehandle for anything following the __DATA__ token in a required file, as long as you're reading data in the same package that the __DATA__ was found in.

_ (underline)

The special filehandle used to cache the information from the last stat, lstat, or file test operator.
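A hedged sketch of reusing the cached information: the first file test does the actual stat, and the later tests on _ consult the saved result. The temporary file created with File::Temp is an assumption made only so there is something to test:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file to examine
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "12345";
close($fh);

my ($is_plain_file, $size) = (0, 0);
if (-e $path) {                    # performs the stat
    $is_plain_file = -f _ ? 1 : 0; # reuses the cached stat
    $size          = -s _;         # so does this
}

print "plain file: $is_plain_file, size: $size\n";
```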

