Programming Perl, Second Edition

Chapter 2

2.7 Subroutines

Like many languages, Perl provides for user-defined subroutines. (We'll also call them functions, but functions are the same thing as subroutines in Perl.) These subroutines may be defined anywhere in the main program, loaded in from other files via the do, require, or use keywords, or even generated on the fly using eval. You can generate anonymous subroutines, accessible only through references. You can even call a subroutine indirectly using a variable containing either its name or a reference to the routine.

To declare a subroutine, use one of these forms:

sub NAME;              # A "forward" declaration.
sub NAME (PROTO);      # Ditto, but with prototype.

To declare and define a subroutine, use one of these forms:

sub NAME BLOCK         # A declaration and a definition.
sub NAME (PROTO) BLOCK # Ditto, but with prototype.

To define an anonymous subroutine or closure at run-time, use a statement like:

$subref = sub BLOCK;

To import subroutines defined in another package, say:

use PACKAGE qw(NAME1 NAME2 NAME3...);

To call subroutines directly:

NAME(LIST);            # & is optional with parentheses.
NAME LIST;             # Parens optional if predeclared/imported.
&NAME;                 # Passes current @_ to subroutine.

To call subroutines indirectly (by name or by reference):

&$subref(LIST);        # & is not optional on indirect call.
&$subref;              # Passes current @_ to subroutine.

The Perl model for passing data into and out of a subroutine is simple: all function parameters are passed as one single, flat list of scalars, and all return values are likewise returned to the caller as one single, flat list of scalars. As with any LIST, any arrays or hashes passed in these lists will interpolate their values into the flattened list, losing their identities--but there are several ways to get around this, and the automatic list interpolation is frequently quite useful. Both parameter lists and return lists may contain as many or as few scalar elements as you'd like (though you may put constraints on the parameter list using prototypes). Indeed, Perl is designed around this notion of variadic functions (those taking any number of arguments), unlike C, where they're sort of grudgingly kludged in so that you can call printf(3).

Now, if you're going to design a language around the notion of passing varying numbers of arbitrary arguments, you'd better make it easy to process those arbitrary lists of arguments. In the interests of dealing with the function parameters as a list, any arguments passed to a Perl routine come in as the array @_. If you call a function with two arguments, those would be stored in $_[0] and $_[1]. Since @_ is an array, you can use any array operations you like on the parameter list. (This is an area where Perl is more orthogonal than the typical computer language.) The array @_ is a local array, but its values are implicit references to the actual scalar parameters. Thus you can modify the actual parameters if you modify the corresponding element of @_. (This is rarely done, however, since it's so easy to return interesting values in Perl.)
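A minimal sketch of both points, using a hypothetical subroutine named bump_first: @_ supports ordinary array operations such as counting its elements, and assigning to $_[0] changes the caller's variable through the alias.

```perl
use strict;
use warnings;

# bump_first is a hypothetical name used only for illustration.
sub bump_first {
    $_[0]++;              # $_[0] is an alias for the caller's first argument
    return scalar(@_);    # @_ is a real array, so scalar(@_) counts the arguments
}

my $count = 10;
my $nargs = bump_first($count, 'x', 'y');
print "count=$count nargs=$nargs\n";    # prints "count=11 nargs=3"
```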

The return value of the subroutine (or of any other block, for that matter) is the value of the last expression evaluated. Or you may use an explicit return statement to specify the return value and exit the subroutine from any point in the subroutine. Either way, as the subroutine is called in a scalar or list context, so also is the final expression of the routine evaluated in the same scalar or list context.
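To see the caller's context reach the final expression, here is a small sketch with two hypothetical helpers. A literal list in scalar context behaves like the comma operator and yields its last element, while an array in scalar context yields its length.

```perl
use strict;
use warnings;

# Hypothetical helpers; neither appears elsewhere in the text.
sub letters { return ('a', 'b', 'c') }       # a literal list
sub items   { my @x = ('a', 'b', 'c'); @x }  # an array as the last expression

my @list = letters();   # list context: ('a', 'b', 'c')
my $last = letters();   # scalar context: the comma operator yields 'c'
my $size = items();     # scalar context: an array yields its length, 3
print "@list | $last | $size\n";    # prints "a b c | c | 3"
```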

Perl does not have named formal parameters, but in practice all you do is assign the contents of @_ to a my list, which serves nicely for a list of formal parameters. But you don't have to, which is the whole point of the @_ array.

For example, to calculate a maximum, the following routine just iterates over @_ directly:

sub max {
    my $max = shift(@_);
    foreach my $foo (@_) {
        $max = $foo if $max < $foo;
    }
    return $max;
}
$bestday = max($mon,$tue,$wed,$thu,$fri);

Here's a routine that ignores its parameters entirely, since it wants to keep a global lookahead variable:

# Get a line, combining continuation lines that start with whitespace
sub get_line {
    my $thisline = $LOOKAHEAD;
    LINE: while ($LOOKAHEAD = <STDIN>) {
        if ($LOOKAHEAD =~ /^[ \t]/) {
            $thisline .= $LOOKAHEAD;
        }
        else {
            last LINE;
        }
    }
    $thisline;
}
$LOOKAHEAD = <STDIN>;       # get first line
while (defined($_ = get_line())) {    # test defined(), not truth
    # ... process one logical line in $_ ...
}

Use list assignment to a private list to name your formal arguments:

sub maybeset {
    my($key, $value) = @_;
    $Foo{$key} = $value unless $Foo{$key};
}

This also has the effect of turning call-by-reference into call-by-value (to borrow some fancy terms from computer science), since the assignment copies the values.

Here's an example of not naming your formal arguments, so that you can modify your actual arguments:

upcase_in($v1, $v2);  # this changes $v1 and $v2
sub upcase_in {
    for (@_) { tr/a-z/A-Z/ } 
}

You aren't allowed to modify constants in this way, of course. If an argument were actually a literal and you tried to change it, you'd take an exception (presumably fatal, possibly career-threatening). For example, this won't work:

upcase_in("frederick");
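To watch the failure without ending the program, a sketch can trap it with eval. It repeats the upcase_in definition from above so it stands alone; the message shown in the comment is Perl's usual read-only complaint, and its exact wording may differ slightly between versions.

```perl
use strict;
use warnings;

sub upcase_in {
    for (@_) { tr/a-z/A-Z/ }    # modifies the caller's arguments in place
}

my $name = "frederick";
upcase_in($name);               # fine: $name is a variable, now "FREDERICK"

# A literal constant is read-only, so tr/// on its alias dies; eval traps it.
my $ok = eval { upcase_in("frederick"); 1 };
print "caught: $@" unless $ok;  # e.g. "Modification of a read-only value attempted ..."
```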

It would be much safer if the upcase_in() function were written to return a copy of its parameters instead of changing them in place:

($v3, $v4) = upcase($v1, $v2);
sub upcase {
    my @parms = @_;
    for (@parms) { tr/a-z/A-Z/ } 
    # wantarray checks whether we were called in list context
    return wantarray ? @parms : $parms[0];
}

Notice how this (unprototyped) function doesn't care whether it was passed real scalars or arrays. Perl will see everything as one big, long, flat @_ parameter list. This is one of the places where Perl's simple argument-passing style shines. The upcase function works perfectly well, with no change to its definition, even if we feed it things like this:

@newlist   = upcase(@list1, @list2);
@newlist   = upcase( split /:/, $var );

Do not, however, be tempted to do this:

(@a, @b)   = upcase(@list1, @list2);   # WRONG

Why not? Because, like the flat incoming parameter list, the return list is also flat. So all you have managed to do here is store everything in @a and make @b an empty list. See the later section on "Passing References" for alternatives.

The official name of a subroutine includes the & prefix. A subroutine may be called using the prefix, but the & is usually optional, and so are the parentheses if the subroutine has been predeclared. (Note, however, that the & is not optional when you're just naming the subroutine, such as when it's used as an argument to defined or undef, or when you want to generate a reference to a named subroutine by saying $subref = \&name. Nor is the & optional when you want to do an indirect subroutine call with a subroutine name or reference using the &$subref() or &{$subref}() constructs. See Chapter 4, References and Nested Data Structures for more on that.)

Subroutines may be called recursively. If a subroutine is called using the & form, the argument list is optional, and if omitted, no @_ array is set up for the subroutine: the @_ array of the calling routine at the time of the call is visible to the called subroutine instead. This is an efficiency mechanism that new users may wish to avoid.

&foo(1,2,3);    # pass three arguments
foo(1,2,3);     # the same
foo();          # pass a null list
&foo();         # the same
&foo;           # foo() gets current args, like foo(@_) !!
foo;            # like foo() if sub foo pre-declared, else bareword "foo"
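A runnable sketch of the difference, with hypothetical helper names. Because &eat_one; shares the caller's @_ rather than receiving a copy, its shift removes an element from the caller's own argument list, which is one reason new users may prefer to avoid the bare & form.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration.
sub show_args { return join(',', @_) }
sub eat_one   { shift }                  # shifts whichever @_ it is given

sub outer {
    my $shared = &show_args;      # no parens: sees outer's own @_
    my $empty  = &show_args();    # empty parens: a fresh, empty @_
    &eat_one;                     # shares outer's @_, so this removes its first element
    return ($shared, $empty, scalar(@_));
}

my ($shared, $empty, $left) = outer(1, 2, 3);
print "[$shared] [$empty] $left\n";    # prints "[1,2,3] [] 2"
```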

Not only does the & form make the argument list optional, but it also disables any prototype checking on the arguments you do provide. This is partly for historical reasons, and partly for having a convenient way to cheat if you know what you're doing. See the section on "Prototypes" later in this chapter.

Any variables you use in the function that aren't declared private are global variables. For more on creating private variables, see the entry for my in Chapter 3, Functions.

Passing Symbol Table Entries (Typeglobs)

Note that the mechanism described in this section was originally the only way to simulate pass-by-reference in older versions of Perl. While it still works fine in modern versions, the new reference mechanism is generally easier to work with. See "Passing References" below.

Sometimes you don't want to pass the value of an array to a subroutine but rather the name of it, so that the subroutine can modify the global copy of it rather than working with a local copy. In Perl you can refer to all objects of a particular name by prefixing the name with a star: *foo. This is often known as a typeglob, since the star on the front can be thought of as a wildcard match for all the funny prefix characters on variables and subroutines and such.

When evaluated, a typeglob produces a scalar value that represents all the objects of that name, including any scalar, array, or hash variable, and also any filehandle, format, or subroutine. When assigned to, a typeglob sets up its own name to be an alias for whatever typeglob value was assigned to it. For example:

sub doubleary {
    local(*someary) = @_;
    foreach $elem (@someary) {
        $elem *= 2;
    }
}
doubleary(*foo);
doubleary(*bar);

Note that scalars are already passed by reference, so you can modify scalar arguments without using this mechanism by referring explicitly to $_[0], and so on. You can modify all the elements of an array by passing all the elements as scalars, but you have to use the * mechanism (or the equivalent reference mechanism described below) to push, pop, or change the size of an array. It will certainly be faster to pass the typeglob (or reference) than to push a bunch of scalars onto the argument stack only to pop them all back off again.

Even if you don't want to modify an array, this mechanism is useful for passing multiple arrays in a single LIST, since normally the LIST mechanism will flatten all the list values so that you can't extract the individual arrays.
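Both uses can be sketched in the style of doubleary. All array and subroutine names here are hypothetical, and the aliases are declared with our so the sketch also runs under use strict: push_marker changes the size of the caller's array, and total_size receives two arrays without flattening them together.

```perl
use strict;
use warnings;

# Package arrays reached through typeglob aliases; all names are hypothetical.
our (@target, @other);
our @names = qw(ann bob);
our @nums  = (1, 2, 3);

# Changes the size of the caller's array, which plain @_ elements cannot do.
sub push_marker {
    local (*target) = @_;       # *target now aliases the glob passed in
    push @target, 'END';
}

# Receives two arrays intact instead of one flattened list.
sub total_size {
    local (*target, *other) = @_;
    return @target + @other;    # each array yields its length in numeric context
}

push_marker(*names);                     # @names becomes (ann, bob, END)
my $total = total_size(*names, *nums);   # 3 + 3
print "@names / $total\n";               # prints "ann bob END / 6"
```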

Passing References

If you want to pass more than one array or hash into or out of a function and have them maintain their integrity, then you're going to want to use an explicit pass-by-reference. Before you do that, you need to understand references as detailed in Chapter 4, References and Nested Data Structures. This section may not make much sense to you otherwise. But hey, you can always look at the pictures.

Here are a few simple examples. First, let's pass in several arrays to a function and have it pop each of them, returning a new list of all their former last elements:

@tailings = popmany ( \@a, \@b, \@c, \@d );
sub popmany {
    my $aref;
    my @retlist = ();
    foreach $aref ( @_ ) {
        push @retlist, pop @$aref;
    } 
    return @retlist;
}

Here's how you might write a function that returns a list of keys occurring in all the hashes passed to it:

@common = inter( \%foo, \%bar, \%joe ); 
sub inter {
    my ($k, $href, %seen); # private (my) variables, not local()
    foreach $href (@_) {
        while ( ($k) = each %$href ) {
            $seen{$k}++;
        } 
    } 
    return grep { $seen{$_} == @_ } keys %seen;
}

So far, we're just using the normal list return mechanism. What happens if you want to pass or return a hash? Well, if you're only using one of them, or you don't mind them concatenating, then the normal calling convention is OK, although a little expensive.
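For the single-hash case, a sketch with a hypothetical with_defaults function shows the flat list doing the work in both directions: the hash goes in as key/value pairs and comes back out the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: one hash in, one hash out, both as flat lists.
sub with_defaults {
    my %opts = (color => 'red', size => 1, @_);   # later pairs override earlier ones
    return %opts;                                  # flattened back into key/value pairs
}

my %got = with_defaults(size => 3);
print "$got{color} $got{size}\n";    # prints "red 3"
```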

Where people get into trouble is here:

(@a, @b) = func(@c, @d);

or here:

(%a, %b) = func(%c, %d);

That syntax simply won't work. It just sets @a or %a and clears @b or %b. Plus the function doesn't get two separate arrays or hashes as arguments: it gets one long list in @_, as always.

If you can arrange for the function to receive references as its parameters and return them as its return results, it's cleaner code, although not so nice to look at. Here's a function that takes two array references as arguments, returning the two array references ordered according to how many elements they have in them:

($aref, $bref) = func(\@c, \@d);
print "@$aref has more than @$bref\n";
sub func {
    my ($cref, $dref) = @_;
    if (@$cref > @$dref) {
        return ($cref, $dref);
    } else {
        return ($dref, $cref);
    } 
}

It turns out that you can actually mix the typeglob approach with the reference approach, like this:

(*a, *b) = func(\@c, \@d);
print "@a has more than @b\n";
sub func {
    local (*c, *d) = @_;
    if (@c > @d) {
        return (\@c, \@d);
    } else {
        return (\@d, \@c);
    } 
}

Here we're using the typeglobs to do symbol table aliasing. It's a tad subtle, though, and also won't work if you're using my variables, since only globals (well, and locals) are in the symbol table. When you assign a reference to a typeglob like that, only the one element from the typeglob (in this case, the array element) is aliased, instead of all the similarly named elements, since the reference knows what it's referring to.
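The point about assigning a reference to a typeglob can be checked directly. In this sketch (the name items is hypothetical), only the ARRAY slot of *items is replaced, so the package scalar $items keeps its value. The referent itself may be a my array; only the alias name has to live in the symbol table.

```perl
use strict;
use warnings;

our $items = 'untouched scalar';
our @items;

my @data = (1, 2, 3);     # a lexical array; only its reference is needed
*items = \@data;          # aliases just the ARRAY slot of *items
push @items, 4;           # modifies @data through the alias

print "$items / @data\n"; # prints "untouched scalar / 1 2 3 4"
```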

If you're passing around filehandles, you can usually just use the bare typeglob, like *STDOUT, but references to typeglobs work even better because they still behave properly under use strict 'refs'. For example:

splutter(\*STDOUT);
sub splutter {
    my $fh = shift;
    print $fh "her um well a hmmm\n";
}
$rec = get_rec(\*STDIN);
sub get_rec {
    my $fh = shift;
    return scalar <$fh>;
}

If you're planning on generating new filehandles, see the open entry in Chapter 3, Functions for an example using the FileHandle module.

Prototypes

As of the 5.003 release of Perl, you can declare your subroutines to take arguments just like many of the built-ins, that is, with certain constraints on the number and types of arguments. For instance, if you declare:

sub mypush (\@@);

then mypush takes arguments exactly like push does. The declaration of the function to be called must be visible at compile time. The prototype only affects the interpretation of new-style calls to the function, where new-style is defined as "not using the & character". In other words, if you call it like a built-in function, then it behaves like a built-in function. If you call it like an old-fashioned subroutine, then it behaves like an old-fashioned subroutine. It naturally falls out from this rule that prototypes have no influence on subroutine references like \&foo or on indirect subroutine calls like &{$subref}.
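Here is one possible mypush body, a sketch only, defined before it is called so the prototype is visible at compile time. The built-in-style call lets the prototype take the array's reference, while the & call bypasses the prototype and so must pass the reference explicitly.

```perl
use strict;
use warnings;

# \@ requires a literal array (passed as a reference); @ slurps the rest.
sub mypush (\@@) {
    my ($aref, @items) = @_;
    push @$aref, @items;
    return scalar(@$aref);     # like push, return the new element count
}

my @stack = (1, 2);
my $n = mypush @stack, 3, 4;   # built-in style: the prototype takes \@stack for us
&mypush(\@stack, 5);           # & form: prototype ignored, so pass the reference yourself
print "@stack ($n)\n";         # prints "1 2 3 4 5 (4)"
```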

Method calls are not influenced by prototypes either. This is because the function to be called is indeterminate at compile-time, depending as it does on inheritance, which is dynamically determined in Perl.

Since the intent is primarily to let you define subroutines that work like built-in commands, here are the prototypes for some other functions that parse almost exactly like the corresponding built-ins. (Note that the "my" on the front of each is just part of the name we picked, and has nothing to do with Perl's my operator. You can name your prototyped functions anything you like--we just picked our names to parallel the built-in functions.)

Declared as               Called as

sub mylink ($$)           mylink $old, $new
sub myvec ($$$)           myvec $var, $offset, 1
sub myindex ($$;$)        myindex &getstring, "substr"
sub mysyswrite ($$$;$)    mysyswrite $buf, 0, length($buf) - $off, $off
sub myreverse (@)         myreverse $a,$b,$c
sub myjoin ($@)           myjoin ":",$a,$b,$c
sub mypop (\@)            mypop @array
sub mysplice (\@$$@)      mysplice @array,@array,0,@pushme
sub mykeys (\%)           mykeys %{$hashref}
sub myopen (*;$)          myopen HANDLE, $name
sub mypipe (**)           mypipe READHANDLE, WRITEHANDLE
sub mygrep (&@)           mygrep { /foo/ } $a,$b,$c
sub myrand ($)            myrand 42
sub mytime ()             mytime

Any backslashed prototype character (shown between parentheses in the left column above) represents an actual argument (exemplified in the right column) that absolutely must start with that character. Just as the first argument to keys must start with %, so too must the first argument to mykeys.

Unbackslashed prototype characters have special meanings. Any unbackslashed @ or % eats all the rest of the actual arguments, and forces list context. (It's equivalent to LIST in a syntax diagram.) An argument represented by $ forces scalar context on it. An & requires an anonymous subroutine (which, if passed as the first argument, does not require the "sub" keyword or a subsequent comma). And a * does whatever it has to do to turn the argument into a reference to a symbol table entry. It's typically used for filehandles.

A semicolon separates mandatory arguments from optional arguments. (It would be redundant before @ or %, since lists can be null.)
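A sketch of the semicolon and the & character together. myindex follows the prototype from the table above (its body here is an assumption, simply wrapping the built-in index), and count_if is a hypothetical name. Note that the block passed to count_if needs neither sub nor a trailing comma.

```perl
use strict;
use warnings;

# ($$;$): two mandatory scalars, then an optional third.
sub myindex ($$;$) {
    my ($str, $substr, $pos) = @_;
    # Pass the position on only when the caller supplied one.
    return @_ > 2 ? index($str, $substr, $pos) : index($str, $substr);
}

# (&@): a leading block, then any list. count_if is hypothetical.
sub count_if (&@) {
    my ($test, @items) = @_;
    my $n = 0;
    for (@items) { $n++ if $test->() }   # $_ is aliased to each item in turn
    return $n;
}

my $first  = myindex "banana", "an";             # 1
my $second = myindex "banana", "an", 2;          # 3
my $evens  = count_if { $_ % 2 == 0 } 1 .. 6;    # 3
```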

Note how the last three examples above are treated specially by the parser. mygrep is parsed as a true list operator, myrand is parsed as a true unary operator with unary precedence the same as rand, and mytime is truly argumentless, just like time.

That is, if you say:

mytime +2;

you'll get mytime() + 2, not mytime(2), which is how it would be parsed without the prototype, or with a unary prototype.
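The three parsing behaviors can be compared side by side. This sketch uses hypothetical subs: mytime with an empty prototype, myneg with a unary ($) prototype, and plain with no prototype at all, which makes it an ordinary list operator.

```perl
use strict;
use warnings;

# Hypothetical subs showing how the prototype changes parsing.
sub mytime ()  { 100 }           # argumentless, like time
sub myneg ($)  { -$_[0] }        # unary, like rand
sub plain      { join ',', @_ }  # no prototype: an ordinary list operator

my $t = mytime +2;      # parsed as mytime() + 2
my @u = (myneg 5, 6);   # parsed as (myneg(5), 6)
my $p = plain 5, 6;     # parsed as plain(5, 6)
print "$t | @u | $p\n"; # prints "102 | -5 6 | 5,6"
```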

The interesting thing about & is that you can generate new syntax with it:

sub try (&$) {
    my($try,$catch) = @_;
    eval { &$try };
    if ($@) {
        local $_ = $@;
        &$catch;
    }
}
sub catch (&) { shift }
try {
    die "phooey";
} catch {
    /phooey/ and print "unphooey\n";
};

This prints "unphooey". What happens is that try is called with two arguments, the anonymous function { die "phooey"; } and the return value of the catch function, which in this case is nothing but its own argument, the entire block of yet another anonymous function. Within try, the first function argument is called while protected within an eval block to trap anything that blows up. If something does blow up, the second function is called with a local version of the global $_ variable set to the raised exception.[47] If this all sounds like pure gobbledygook, you'll have to read about die and eval in Chapter 3, Functions, and then go check out anonymous functions in Chapter 4, References and Nested Data Structures.

[47] Yes, there are still unresolved issues having to do with the visibility of @_. We're ignoring that question for the moment. (But note that if we make @_ lexically scoped someday, those anonymous subroutines can act like closures. (Gee, is this sounding a little Lispish? (Nevermind.)))

And here's a reimplementation of the grep operator (the built-in one is more efficient, of course):

sub mygrep (&@) {
    my $coderef = shift;
    my @result;
    foreach $_ (@_) {
        push(@result, $_) if &$coderef;
    }
    @result;
}

Some folks would prefer to see full alphanumeric prototypes. Alphanumerics have been intentionally left out of prototypes for the express purpose of someday adding named, formal parameters. (Maybe.) The current mechanism's main goal is to let module writers provide better diagnostics for module users. Larry feels that the notation is quite understandable to Perl programmers, and that it will not intrude greatly upon the meat of the module, nor make it harder to read. The line noise is visually encapsulated into a small pill that's easy to swallow.

One note of caution. It's probably best to put prototypes on new functions, not retrofit prototypes onto older ones. That's because you must be especially careful about silently imposing a different context. Suppose, for example, you decide that a function should take just one parameter, like this:

sub func ($) {
    my $n = shift;
    print "you gave me $n\n";
}

and someone has been calling it with an array or expression returning a single-element list:

func(@foo);
func( split /:/ );

Then you've just silently put an implicit scalar() in front of their argument, which can be more than a bit surprising. The old @foo that used to hold one thing doesn't get passed in. Instead, 1 (the number of elements in @foo) is now passed to func. And the split gets called in a scalar context and starts scribbling on your @_ parameter list.
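The surprise is easy to reproduce. In this sketch, show_one is a hypothetical function with a retrofitted ($) prototype; calling it with the & form skips the prototype and restores the old behavior.

```perl
use strict;
use warnings;

sub show_one ($) { return "got $_[0]" }   # hypothetical retrofitted prototype

my @foo = ('only');
my $with    = show_one(@foo);    # $ forces scalar context: @foo becomes 1
my $without = &show_one(@foo);   # & skips the prototype: 'only' is passed
print "$with / $without\n";      # prints "got 1 / got only"
```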

But if you're careful, you can do a lot of neat things with prototypes. This is all very powerful, of course, and should only be used in moderation to make the world a better place.

