Like many languages, Perl provides for user-defined subroutines. (We'll also call them functions, but functions are the same thing as subroutines in Perl.) These subroutines may be defined anywhere in the main program, loaded in from other files via the do, require, or use keywords, or even generated on the fly using eval. You can generate anonymous subroutines, accessible only through references. You can even call a subroutine indirectly using a variable containing either its name or a reference to the routine.
To declare a subroutine, use one of these forms:
    sub NAME;          # A "forward" declaration.
    sub NAME (PROTO);  # Ditto, but with prototype.
To declare and define a subroutine, use one of these forms:
    sub NAME BLOCK          # A declaration and a definition.
    sub NAME (PROTO) BLOCK  # Ditto, but with prototype.
To define an anonymous subroutine or closure at run-time, use a statement like:
    $subref = sub BLOCK;
To import subroutines defined in another package, say:
    use PACKAGE qw(NAME1 NAME2 NAME3 ...);
To call subroutines directly:
    NAME(LIST);  # & is optional with parentheses.
    NAME LIST;   # Parens optional if predeclared/imported.
    &NAME;       # Passes current @_ to subroutine.
To call subroutines indirectly (by name or by reference):
    &$subref(LIST);  # & is not optional on indirect call.
    &$subref;        # Passes current @_ to subroutine.
The Perl model for passing data into and out of a subroutine is simple:
all function parameters are passed as one single, flat list of scalars,
and multiple return values are likewise returned to the caller as one single, flat
list of scalars. As with any LIST, any arrays or hashes passed in these
lists will interpolate their values into the flattened list, losing
their identities - but there are several ways to get around this, and the
automatic list interpolation is frequently quite useful. Both parameter
lists and return lists may contain as many or as few scalar elements as
you'd like (though you may put constraints on the parameter list using
prototypes). Indeed, Perl is designed around this notion of variadic
functions (those taking any number of arguments), unlike C, where they're sort of grudgingly kludged in so that
you can call printf(3).
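The flattening is easy to see by counting what arrives. The following is a minimal sketch, not taken from the original text; count_args is a hypothetical helper that reports how many scalars ended up in the parameter list:

```perl
use strict;
use warnings;

# count_args is a hypothetical helper: it returns the number of scalars it received.
sub count_args { return scalar @_ }

my @pair  = (1, 2);
my %color = (red => 1);

# The array contributes two scalars, the hash one key and one value,
# and the string literal one more.
my $n = count_args(@pair, %color, 'extra');
print "received $n arguments\n";    # received 5 arguments
```

Nothing in the callee can tell where @pair ended and %color began.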
Now, if you're going to design a language around the notion of passing
varying numbers of arbitrary arguments, you'd better make it easy to
process those arbitrary lists of arguments. In the interests of dealing
with the function parameters as a list, any arguments passed to a
Perl routine come in as the array @_. If you call a function with
two arguments, those would be stored in $_[0] and $_[1].
Since @_ is an array, you can use any array operations you like
on the parameter list. (This is an area where Perl is more
orthogonal than the typical computer language.) The array @_ is a
local array, but its values are implicit references to the actual scalar
parameters. Thus you can modify the actual parameters if you modify the
corresponding element of @_. (This is rarely done, however,
since it's so easy to return interesting values in Perl.)
The return value of the subroutine (or of any other block, for that matter) is the value of the last expression evaluated. Or you may use an explicit return statement to specify the return value and exit the subroutine from any point in the subroutine. Either way, as the subroutine is called in a scalar or list context, so also is the final expression of the routine evaluated in the same scalar or list context.
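As a small illustration of how the calling context reaches the final expression, here is a sketch using two hypothetical subroutines, context_of and items. Neither uses an explicit return, so each yields the value of its last expression:

```perl
use strict;
use warnings;

# context_of is a hypothetical name; wantarray is true when the
# subroutine was called in list context.
sub context_of {
    wantarray ? 'list' : 'scalar';    # last expression is the return value
}

# items is also hypothetical; its final expression is an array, which
# is evaluated in whatever context the caller supplied.
sub items {
    my @items = (10, 20, 30);
    @items;
}

my @in_list   = context_of();    # called in list context
my $in_scalar = context_of();    # called in scalar context
my @all       = items();         # the three elements
my $count     = items();         # an array in scalar context yields its length

print "$in_list[0] $in_scalar @all $count\n";    # list scalar 10 20 30 3
```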
Perl does not have named formal parameters, but in practice all you do is
assign the contents of @_ to a my list, which serves nicely for
a list of formal parameters. But you don't have to, which is the whole
point of the @_ array.
For example, to calculate a maximum, the
following routine just iterates over @_ directly:
    sub max {
        my $max = shift(@_);
        foreach $foo (@_) {
            $max = $foo if $max < $foo;
        }
        return $max;
    }

    $bestday = max($mon,$tue,$wed,$thu,$fri);
Here's a routine that ignores its parameters entirely, since it wants to keep a global lookahead variable:
    # Get a line, combining continuation lines that start with whitespace
    sub get_line {
        my $thisline = $LOOKAHEAD;
        LINE: while ($LOOKAHEAD = <STDIN>) {
            if ($LOOKAHEAD =~ /^[ \t]/) {
                $thisline .= $LOOKAHEAD;
            }
            else {
                last LINE;
            }
        }
        $thisline;
    }

    $LOOKAHEAD = <STDIN>;       # get first line
    while ($_ = get_line()) {
        ...
    }
Use list assignment to a private list to name your formal arguments:
    sub maybeset {
        my($key, $value) = @_;
        $Foo{$key} = $value unless $Foo{$key};
    }
This also has the effect of turning call-by-reference into call-by-value (to borrow some fancy terms from computer science), since the assignment copies the values.
Here's an example of not naming your formal arguments, so that you can modify your actual arguments:
    upcase_in($v1, $v2);  # this changes $v1 and $v2
    sub upcase_in {
        for (@_) { tr/a-z/A-Z/ }
    }
You aren't allowed to modify constants in this way, of course. If an argument were actually a literal and you tried to change it, you'd take an exception (presumably fatal, possibly career-threatening). For example, this won't work:
upcase_in("frederick");
It would be much safer if the upcase_in() function were written to
return a copy of its parameters instead of changing them in place:
    ($v3, $v4) = upcase($v1, $v2);
    sub upcase {
        my @parms = @_;
        for (@parms) { tr/a-z/A-Z/ }
        # wantarray checks whether we were called in list context
        return wantarray ? @parms : $parms[0];
    }
Notice how this (unprototyped) function doesn't care whether it was passed
real scalars or arrays. Perl will see everything as one big, long, flat
@_ parameter list. This is one of the ways where Perl's simple
argument-passing style shines. The upcase function will work
perfectly well without changing the upcase definition even if we feed
it things like this:
    @newlist = upcase(@list1, @list2);
    @newlist = upcase( split /:/, $var );
Do not, however, be tempted to do this:
(@a, @b) = upcase(@list1, @list2); # WRONG
Why not? Because, like the flat incoming parameter list, the return list is also
flat. So all you have managed to do here is store everything in
@a and make @b an empty list.
See the later section on "Passing References" for alternatives.
The official name of a subroutine includes the & prefix. A
subroutine may be called using the prefix, but the & is usually
optional, and so are the parentheses if the subroutine has been predeclared.
(Note, however, that the & is not optional when you're
just naming the subroutine, such as when it's used as an argument to
defined or undef, or when you want to generate a reference
to a named subroutine by saying $subref = \&name. Nor is the
& optional when you want to do an indirect subroutine call with
a subroutine name or reference using the &$subref() or
&{$subref}() constructs. See Chapter 4 for more on that.)
Subroutines may be called recursively. If a subroutine is called using
the & form, the argument list is optional, and if omitted, no @_
array is set up for the subroutine: the @_ array of the calling
routine at the time of the call is visible to the called subroutine instead.
This is an efficiency mechanism that new users may wish to avoid.
    &foo(1,2,3);    # pass three arguments
    foo(1,2,3);     # the same

    foo();          # pass a null list
    &foo();         # the same

    &foo;           # foo() gets current args, like foo(@_) !!
    foo;            # like foo() if sub foo pre-declared, else bareword "foo"
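To make the difference between &foo and &foo() concrete, here is a minimal sketch. The names show_args, caller_shares, and caller_empty are invented for illustration:

```perl
use strict;
use warnings;

# show_args is a hypothetical helper that reports whatever is in its @_.
sub show_args { return join ',', @_ }

# No argument list and no parentheses: the callee sees the caller's own @_.
sub caller_shares { return &show_args; }

# An explicit empty list: the callee gets a fresh, empty @_.
sub caller_empty  { return &show_args(); }

my $shared = caller_shares('a', 'b');
my $empty  = caller_empty('a', 'b');

print "shared: '$shared', empty: '$empty'\n";    # shared: 'a,b', empty: ''
```

Because the array really is shared in the first form, anything the callee does to @_ (a shift, say) is also visible to the caller afterward, which is one reason new users may prefer to spell the argument list out.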
Not only does the & form make the argument list optional, but it also
disables any prototype checking on the arguments you do provide. This
is partly for historical reasons, and partly for having a convenient way
to cheat if you know what you're doing. See the section on
"Prototypes" later in this chapter.
Any variables you use in the function that aren't declared private are global variables. For more on creating private variables, see my in Chapter 3.
Note that the mechanism described in this section was originally the only way to simulate pass-by-reference in older versions of Perl. While it still works fine in modern versions, the new reference mechanism is generally easier to work with. See below.
Sometimes you don't want to pass the value of an array to a subroutine
but rather the name of it, so that the subroutine can modify the global
copy of it rather than working with a local copy. In Perl you can
refer to all objects of a particular name by prefixing the name
with a star: *foo. This is often known as a typeglob, since the
star on the front can be thought of as a wildcard match for all the
funny prefix characters on variables and subroutines and such.
When evaluated, a typeglob produces a scalar value that represents all the objects of that name, including any scalar, array, or hash variable, and also any filehandle, format, or subroutine. When assigned to, a typeglob sets up its own name to be an alias for whatever typeglob value was assigned to it. For example:
    sub doubleary {
        local(*someary) = @_;
        foreach $elem (@someary) {
            $elem *= 2;
        }
    }
    doubleary(*foo);
    doubleary(*bar);
Note that scalars are already passed by reference, so you can modify
scalar arguments without using this mechanism by referring explicitly
to $_[0], and so on. You can modify all the elements of an array by passing
all the elements as scalars, but you have to use the * mechanism (or
the equivalent reference mechanism described below)
to push, pop, or change the size of
an array. It will certainly be faster to pass the typeglob (or reference)
than to push a bunch of scalars onto the argument stack only to pop
them all back off again.
Even if you don't want to modify an array, this mechanism is useful for
passing multiple arrays in a single LIST, since normally the LIST
mechanism will flatten all the list values so that you can't extract out
the individual arrays.
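Here is a sketch of the case the typeglob mechanism is needed for: changing the size of the caller's array. It assumes package variables throughout, since my variables have no typeglob, and the names @queue, @alias, and append_item are invented for the example:

```perl
use strict;
use warnings;

our @queue = (1, 2);    # a package variable, so *queue exists
our @alias;             # declared so that strict accepts @alias in the sub

# append_item is a hypothetical helper that grows the caller's array.
sub append_item {
    local *alias = shift;    # *alias now names the same things as the glob passed in
    push @alias, @_;         # so this push lands on the caller's array
}

append_item(*queue, 3, 4);
print "@queue\n";    # 1 2 3 4
```

The local restores *alias when the subroutine exits, so the aliasing doesn't leak out of the call.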
If you want to pass more than one array or hash into or out of a function and have them maintain their integrity, then you're going to want to use an explicit pass-by-reference. Before you do that, you need to understand references as detailed in Chapter 4. This section may not make much sense to you otherwise. But hey, you can always look at the pictures.
Here are a few simple examples. First, let's pass in several arrays to a function and have it pop each of them, returning a new list of all their former last elements:
    @tailings = popmany ( \@a, \@b, \@c, \@d );

    sub popmany {
        my $aref;
        my @retlist = ();
        foreach $aref ( @_ ) {
            push @retlist, pop @$aref;
        }
        return @retlist;
    }
Here's how you might write a function that returns a list of keys occurring in all the hashes passed to it:
    @common = inter( \%foo, \%bar, \%joe );
    sub inter {
        my ($k, $href, %seen); # locals
        foreach $href (@_) {
            while ( ($k) = each %$href ) {
                $seen{$k}++;
            }
        }
        return grep { $seen{$_} == @_ } keys %seen;
    }
So far, we're just using the normal list return mechanism. What happens if you want to pass or return a hash? Well, if you're only using one of them, or you don't mind them concatenating, then the normal calling convention is OK, although a little expensive.
Where people get into trouble is here:
(@a, @b) = func(@c, @d);
or here:
(%a, %b) = func(%c, %d);
That syntax simply won't work. It just sets @a or %a and clears
@b or %b. Plus the function doesn't get two
separate arrays or hashes as arguments: it gets one long list in @_,
as always.
If you can arrange for the function to receive references as its parameters and return them as its return results, it's cleaner code, although not so nice to look at. Here's a function that takes two array references as arguments, returning the two array references ordered according to how many elements they have in them:
    ($aref, $bref) = func(\@c, \@d);
    print "@$aref has more than @$bref\n";
    sub func {
        my ($cref, $dref) = @_;
        if (@$cref > @$dref) {
            return ($cref, $dref);
        } else {
            return ($dref, $cref);
        }
    }
It turns out that you can actually mix the typeglob approach with the reference approach, like this:
    (*a, *b) = func(\@c, \@d);
    print "@a has more than @b\n";
    sub func {
        local (*c, *d) = @_;
        if (@c > @d) {
            return (\@c, \@d);
        } else {
            return (\@d, \@c);
        }
    }
Here we're using the typeglobs to do symbol table aliasing. It's a tad subtle, though, and also won't work if you're using my variables, since only globals (well, and locals) are in the symbol table. When you assign a reference to a typeglob like that, only the one element from the typeglob (in this case, the array element) is aliased, instead of all the similarly named elements, since the reference knows what it's referring to.
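The point about only one element of the typeglob being aliased can be checked directly. This is a minimal sketch with invented package variables $name, @name, and @source:

```perl
use strict;
use warnings;

our ($name, @name, @source);    # hypothetical package variables
$name   = 'scalar untouched';
@source = (1, 2, 3);

*name = \@source;    # an array reference, so only the array slot of *name is aliased
push @name, 4;       # @name and @source are now the same array

print "@source\n";   # 1 2 3 4
print "$name\n";     # scalar untouched
```

Had the right-hand side been the typeglob *source instead of \@source, $name would have become an alias for $source as well.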
If you're passing around filehandles, you can usually just use the bare
typeglob, like *STDOUT, but references to typeglobs work even better
because they still behave properly under use strict 'refs'. For
example:
    splutter(\*STDOUT);
    sub splutter {
        my $fh = shift;
        print $fh "her um well a hmmm\n";
    }

    $rec = get_rec(\*STDIN);
    sub get_rec {
        my $fh = shift;
        return scalar <$fh>;
    }
If you're planning on generating new filehandles, see the open entry in Chapter 3 for an example using the FileHandle module.
As of the 5.003 release of Perl, you can declare your subroutines to take arguments just like many of the built-ins, that is, with certain constraints on the number and types of arguments. For instance, if you declare:
sub mypush (\@@)
then mypush takes arguments exactly like push does. The
declaration of the function to be called must be visible at compile time.
The prototype only affects the interpretation of new-style calls to the
function, where new-style is defined as "not using the & character".
In other words, if you call it like a built-in function, then it behaves
like a built-in function. If you call it like an old-fashioned subroutine,
then it behaves like an old-fashioned subroutine. It naturally falls out
from this rule that prototypes have no influence on subroutine references
like \&foo or on indirect subroutine calls like &{$subref}.
Method calls are not influenced by prototypes either. This is because the function to be called is indeterminate at compile-time, depending as it does on inheritance, which is dynamically determined in Perl.
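To see the new-style and old-style calls side by side, here is a sketch built on the mypush declaration above. The body is an assumed implementation written for this example, not one given in the text:

```perl
use strict;
use warnings;

# mypush is the name used in the text; this body is an assumption.
# With the (\@@) prototype, the first argument arrives as an array reference.
sub mypush (\@@) {
    my $aref = shift;
    push @$aref, @_;
    return scalar @$aref;
}

my @stack = (1);

mypush @stack, 2, 3;     # new-style call: the prototype turns @stack into \@stack
&mypush(\@stack, 4);     # old-style call: prototype ignored, so pass the reference yourself

print "@stack\n";        # 1 2 3 4
```

Note that the definition appears before either call, which is what makes the prototype visible at compile time.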
Since the intent is primarily to let you define subroutines that work
like built-in commands, here are the prototypes for some other functions
that parse almost exactly like the corresponding built-ins. (Note that
the "my" on the front of each is just part of the name we picked, and
has nothing to do with Perl's my operator. You can name your prototyped
functions anything you like - we just picked our names to parallel the
built-in functions.)
Declared as | Called as |
---|---|
sub mylink ($$) | mylink $old, $new |
sub myvec ($$$) | myvec $var, $offset, 1 |
sub myindex ($$;$) | myindex &getstring, "substr" |
sub mysyswrite ($$$;$) | mysyswrite $buf, 0, length($buf) - $off, $off |
sub myreverse (@) | myreverse $a,$b,$c |
sub myjoin ($@) | myjoin ":",$a,$b,$c |
sub mypop (\@) | mypop @array |
sub mysplice (\@$$@) | mysplice @array,@array,0,@pushme |
sub mykeys (\%) | mykeys %{$hashref} |
sub myopen (*;$) | myopen HANDLE, $name |
sub mypipe (**) | mypipe READHANDLE, WRITEHANDLE |
sub mygrep (&@) | mygrep { /foo/ } $a,$b,$c |
sub myrand ($) | myrand 42 |
sub mytime () | mytime |
Any backslashed prototype character (shown between parentheses in the
left column above) represents an actual argument (exemplified in the
right column) that absolutely must start with that character. Just as
the first argument to keys must start with %, so too must the first
argument to mykeys.
Unbackslashed prototype characters have special meanings. Any
unbackslashed @ or % eats all the rest of the actual arguments, and
forces list context. (It's equivalent to LIST in
a syntax diagram.) An argument represented by $ forces scalar context
on it. An & requires an anonymous subroutine (which, if passed as
the first argument, does not require the "sub" keyword or a subsequent
comma). And a * does whatever it has to do to turn the argument into
a reference to a symbol table entry. It's typically used for
filehandles.
A semicolon separates mandatory arguments from optional arguments.
(It would be redundant before @ or %, since lists can be null.)
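As a sketch of an optional argument in practice, here is one way myindex from the table might be written. The body, which simply hands off to the built-in index, is an assumption made for illustration:

```perl
use strict;
use warnings;

# myindex takes two mandatory scalars and one optional one.
# The implementation is assumed; it delegates to the built-in index.
sub myindex ($$;$) {
    my ($string, $substr, $position) = @_;
    return defined $position
        ? index($string, $substr, $position)
        : index($string, $substr);
}

my $first  = myindex "hello world", "o";       # search from the start
my $second = myindex "hello world", "o", 5;    # search from offset 5

print "$first $second\n";    # 4 7
```

Calling myindex with only one argument, or with four, would be rejected at compile time.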
Note how the last three examples above are treated specially by the
parser. mygrep is parsed as a true list operator,
myrand is parsed as a true unary operator with
unary precedence the same as rand, and
mytime is truly argumentless, just like time.
That is, if you say:
mytime +2;
you'll get mytime() + 2, not mytime(2), which is how it would be parsed
without the prototype, or with a unary prototype.
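The two parses can be told apart with a pair of stand-in subroutines. In this sketch, mytime returns a fixed number rather than the real time so the arithmetic is visible, and plaintime is an invented unprototyped counterpart that reports how many arguments it received:

```perl
use strict;
use warnings;

sub mytime ()  { 100 }          # empty prototype: takes no arguments
sub plaintime  { scalar @_ }    # no prototype: an ordinary list operator

my $with_proto    = mytime +2;       # parsed as mytime() + 2
my $without_proto = plaintime +2;    # parsed as plaintime(+2), one argument

print "$with_proto $without_proto\n";    # 102 1
```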
The interesting thing about & is that you can generate new
syntax with it:
    sub try (&$) {
        my($try,$catch) = @_;
        eval { &$try };
        if ($@) {
            local $_ = $@;
            &$catch;
        }
    }
    sub catch (&) { shift }

    try {
        die "phooey";
    } catch {
        /phooey/ and print "unphooey\n";
    };
This prints "unphooey". What happens is that try is called with two
arguments, the anonymous function {die "phooey";} and the return value
of the catch function, which in this case is nothing but its own
argument, the entire block of yet another anonymous function. Within
try, the first function argument is called while protected within an
eval block to trap anything that blows up. If something does blow up, the
second function is called with a local version of the global $_ variable
set to the raised exception.[47]
If this all sounds like pure gobbledygook, you'll have to read about
die and eval in Chapter 3, and then go
check out anonymous functions in Chapter 4.
[47] Yes, there are still unresolved issues having to do with the visibility of @_. We're ignoring that question for the moment. (But note that if we make @_ lexically scoped someday, those anonymous subroutines can act like closures. (Gee, is this sounding a little Lispish? (Nevermind.)))
And here's a reimplementation of the grep operator (the built-in one is more efficient, of course):
    sub mygrep (&@) {
        my $coderef = shift;
        my @result;
        foreach $_ (@_) {
            push(@result, $_) if &$coderef;
        }
        @result;
    }
Some folks would prefer to see full alphanumeric prototypes. Alphanumerics have been intentionally left out of prototypes for the express purpose of someday adding named, formal parameters. (Maybe.) The current mechanism's main goal is to let module writers provide better diagnostics for module users. Larry feels that the notation is quite understandable to Perl programmers, and that it will not intrude greatly upon the meat of the module, nor make it harder to read. The line noise is visually encapsulated into a small pill that's easy to swallow.
One note of caution. It's probably best to put prototypes on new functions, not retrofit prototypes onto older ones. That's because you must be especially careful about silently imposing a different context. Suppose, for example, you decide that a function should take just one parameter, like this:
    sub func ($) {
        my $n = shift;
        print "you gave me $n\n";
    }
and someone has been calling it with an array or expression returning a single-element list:
    func(@foo);
    func( split /:/ );
Then you've just supplied an implicit scalar in front of their
argument, which can be more than a bit surprising. The old @foo
that used to hold one thing doesn't get passed in. Instead, 1 (the
number of elements in @foo) is now passed to func.
And the split gets called in a scalar context and
starts scribbling on your @_ parameter list.
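The change in what gets passed is easy to demonstrate. In this sketch, one_arg and any_args are invented names for the same body with and without the retrofitted prototype:

```perl
use strict;
use warnings;

sub one_arg ($) { return $_[0] }    # with the ($) prototype
sub any_args    { return $_[0] }    # same body, no prototype

my @foo = ('only');

my $with    = one_arg(@foo);     # @foo is evaluated in scalar context: its length
my $without = any_args(@foo);    # @foo is flattened: its single element

print "$with $without\n";    # 1 only
```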
But if you're careful, you can do a lot of neat things with prototypes. This is all very powerful, of course, and should only be used in moderation to make the world a better place.