While most of the work of programming may be simply getting a program
working properly, you may find yourself wanting more bang for the buck
out of your Perl program. Perl's rich set of operators, datatypes, and
control constructs are not necessarily intuitive when it comes to speed
and space optimization. Many trade-offs were made during Perl's design,
and such decisions are buried in the guts of the code. In general, the
shorter and simpler your code is, the faster it runs, but there are
exceptions. This section attempts to help you make it work just a wee
bit better.
(If you want it to work a lot better, you can play with the new Perl-to-C
translation modules, or rewrite your inner loop as a C extension.)
You'll note that sometimes optimizing for time may cost you in space or
programmer efficiency (indicated by conflicting hints below). Them's
the breaks. If programming were easy, they wouldn't need something as
complicated as a human being to do it, now would they?
Time Efficiency
- Use hashes instead of linear searches.
For example, instead of searching through @keywords to see if
$_ is a keyword, construct a hash with:
my %keywords;
for (@keywords) {
    $keywords{$_}++;
}
Then, you can quickly tell if $_ contains a keyword by testing
$keywords{$_} for a non-zero value.
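Here's a minimal, self-contained sketch of the whole tip (the keyword list and the words being tested are invented for illustration):

```perl
# Build the lookup hash once; after that, each membership test is a
# single hash lookup rather than a linear scan of @keywords.
my @keywords = qw(if else while foreach);    # hypothetical keyword list
my %keywords;
for (@keywords) {
    $keywords{$_}++;
}

my $is_kw  = $keywords{"while"} ? 1 : 0;     # 1: "while" is a keyword
my $not_kw = $keywords{"print"} ? 1 : 0;     # 0: "print" is not
print "$is_kw $not_kw\n";
```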
- Avoid subscripting when a foreach or list operator will do. Subscripting
sometimes forces conversion from floating point to integer, and
there's often a better way to do it. Consider using foreach, shift,
and splice operations. Consider saying
use integer.
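As a sketch of the difference, both loops below sum the same list, but the second lets Perl alias each element directly instead of evaluating a subscript on every pass (the data is made up):

```perl
my @values = (1 .. 5);

# Subscripting: the index must be evaluated and dereferenced each time.
my $sum1 = 0;
for (my $i = 0; $i <= $#values; $i++) {
    $sum1 += $values[$i];
}

# foreach: the loop variable aliases each element; no subscripts at all.
my $sum2 = 0;
foreach my $v (@values) {
    $sum2 += $v;
}
print "$sum1 $sum2\n";    # both sums are 15
```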
- Avoid goto.
It scans outward from your current location for the indicated label.
- Avoid printf if print will work.
Quite apart from the extra overhead of printf, some
implementations have field length limitations that print gets
around.
- Avoid $&, $`,
and $'.
Any occurrence in your program causes all matches to save the searched
string for possible future reference. (However, once you've blown it, it
doesn't hurt to have more of them.)
- Avoid using eval on a string. An eval of a string (not of a
BLOCK) forces recompilation every time through. The
Perl parser is pretty fast for a parser, but that's not saying much. Nowadays
there's almost always a better way to do what you want anyway. In particular,
any code that uses eval merely to construct
variable names is obsolete, since you can now do the same directly using
symbolic references:
${$pkg . '::' . $varname} = &{ "fix_" . $varname }($pkg);
- Avoid string eval inside a loop.
Put the loop into the eval instead, to avoid redundant
recompilations of the code. See the study operator
in Chapter 3, Functions for an example of this.
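As a sketch of the idea, assuming a field number that isn't known until run time (the data and variable names are invented), interpolate the varying part once and compile the entire loop in a single eval:

```perl
my $field = 2;                       # hypothetical run-time column number
my @lines = ("a b c", "d e f");
my @picked;

# One string eval compiles the whole loop once; putting the eval inside
# the loop instead would recompile this code on every iteration.
eval qq{
    for my \$line (\@lines) {
        push \@picked, (split ' ', \$line)[$field - 1];
    }
    1;
} or die $@;
print "@picked\n";    # prints "b e"
```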
- Avoid run-time-compiled patterns. Use the
/pattern/o
(once only) pattern modifier to avoid pattern recompilation when the
pattern doesn't change over the life of the process.
For patterns that change
occasionally, you can use the fact that a null pattern refers back to
the previous pattern, like this:
"foundstring" =~ /$currentpattern/;    # Dummy match (must succeed).
while (<>) {
    print if //;
}
You can also use eval to recompile a subroutine that does the match (if
you only recompile occasionally).
- Short-circuit alternation is often faster than the corresponding
regular expression. So:
print if /one-hump/ || /two/;
is likely to be faster than:
print if /one-hump|two/;
at least for certain values of one-hump and two.
This is because the optimizer likes to hoist certain simple matching
operations up into higher parts of the syntax tree and do very fast
matching with a Boyer-Moore algorithm. A complicated pattern defeats
this.
- Reject common cases early with next if.
As with simple regular expressions, the optimizer likes this. And it just
makes sense to avoid unnecessary work. You can typically discard comment
lines and blank lines even before you do a split or chop:
while (<>) {
    next if /^#/;
    next if /^$/;
    chop;
    @piggies = split(/,/);
    ...
}
- Avoid regular expressions with many quantifiers, or with big
{m,n}
numbers on parenthesized expressions. Such patterns can result in
exponentially slow backtracking behavior unless the quantified
subpatterns match on their first "pass".
- Try to maximize the length of any non-optional literal strings in
regular expressions. This is counterintuitive, but longer patterns
often match faster than shorter patterns. That's because the
optimizer looks for constant strings and hands them off to a
Boyer-Moore search, which benefits from longer strings. Compile your
pattern with the -Dr debugging switch to see what
Perl thinks the longest literal string is.
- Avoid expensive subroutine calls in tight loops.
There is overhead associated with calling subroutines, especially when
you pass lengthy parameter lists, or return lengthy values. In
increasing order of desperation, try passing values by reference,
passing values as dynamically scoped globals, inlining the subroutine,
or rewriting the whole loop in C.
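The first two suggestions can be sketched like this (the array and subroutine names are invented; both subroutines compute the same sum):

```perl
my @big = (1 .. 1000);

# Passing the array itself copies all 1000 elements onto the stack.
sub sum_list {
    my $total = 0;
    $total += $_ for @_;
    return $total;
}

# Passing a reference copies a single scalar instead.
sub sum_ref {
    my ($aref) = @_;
    my $total = 0;
    $total += $_ for @$aref;
    return $total;
}

my $a1 = sum_list(@big);
my $a2 = sum_ref(\@big);
print "$a1 $a2\n";    # both are 500500
```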
- Avoid getc for anything but single-character terminal I/O.
In fact, don't use it for that either. Use sysread.
- Use readdir rather than <*>.
To get all the non-dot files within a directory, say something like:
opendir(DIR,".");
@files = sort grep(!/^\./, readdir(DIR));
closedir(DIR);
- Avoid frequent substr on long strings.
- Use pack and unpack
instead of multiple substr
invocations.
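For example, one unpack can replace several substr calls on a fixed-width record (the record layout here, an 8-character name, 3-character age, and 10-character city, is invented):

```perl
my $record = "Gandalf 55 Rivendell ";

# Three separate substr calls...
my $name1 = substr($record, 0, 8);
my $age1  = substr($record, 8, 3);
my $city1 = substr($record, 11, 10);

# ...versus one unpack with an equivalent template ("A" also strips
# trailing spaces from each field).
my ($name2, $age2, $city2) = unpack("A8 A3 A10", $record);
print "$name2|$age2|$city2\n";    # prints "Gandalf|55|Rivendell"
```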
- Use substr as an lvalue rather than
concatenating substrings. For example, to replace the fourth through sixth
characters of $foo with the contents of the variable
$bar, don't do:
$foo = substr($foo,0,3) . $bar . substr($foo,6);
Instead, simply identify the part of the string to be replaced,
and assign into it, as in:
substr($foo, 3, 3) = $bar;
But be aware that if $foo is a huge string, and $bar
isn't exactly 3 characters long, this can do a lot of copying too.
- Use s/// rather than concatenating substrings.
This is especially true if you can replace one constant with another of
the same size. This results in an in-place substitution.
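A small sketch of a same-length substitution (the data is invented):

```perl
my $date = "1996-07-04";

# Replacing each "-" with "/" keeps the string the same length, so Perl
# can substitute in place instead of rebuilding it from substrings.
$date =~ s/-/\//g;
print "$date\n";    # prints "1996/07/04"
```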
- Use statement modifiers and the equivalent and and
or operators instead of
full-blown conditionals.
Statement modifiers and logical operators avoid the overhead of entering
and leaving a block. They can often be more readable too.
- Use $foo = $a || $b || $c.
This is much faster (and shorter to say) than:
if ($a) {
    $foo = $a;
}
elsif ($b) {
    $foo = $b;
}
elsif ($c) {
    $foo = $c;
}
Similarly, set default values with:
$foo ||= $default;
- Group together any tests that want the same initial string.
When testing a string for various prefixes in anything resembling a
switch structure, put together all the /^a/ patterns, all the
/^b/ patterns, and so on.
- Don't test things you know won't match.
Use last or elsif
to avoid falling through to the next
case in your switch statement.
- Use special operators like study, logical string operations,
unpack 'u' and pack '%' formats.
- Beware of the tail wagging the dog.
Misstatements resembling (<STDIN>)[0] and 0 .. 2000000 can
cause Perl much unnecessary work: each builds a huge intermediate list
just to throw most of it away. In accord with UNIX philosophy, Perl
gives you enough rope to hang yourself.
- Factor operations out of loops. The Perl optimizer does not attempt to
remove invariant code from loops. It expects you to exercise some sense.
- Slinging strings can be faster than slinging arrays.
- Slinging arrays can be faster than slinging strings.
It all depends on whether you're going to reuse the strings or arrays,
and on which operations you're going to perform. Heavy modification of each
element implies that arrays will be better, and occasional modification of
some elements implies that strings will be better. But you just have to
try it and see.
- my variables are normally
faster than local variables.
- Sorting on a manufactured key array may be faster than using a fancy sort
subroutine.
A given array value may participate in several sort comparisons, so if
the sort subroutine has to do much recalculation, it's better to
factor out that calculation to a separate pass before the actual sort.
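One common shape for this (a map-sort-map, often called the Schwartzian Transform) computes each key exactly once, sorts the key/value pairs, then strips the keys; the filenames below are invented:

```perl
my @files = ("notes.txt", "a.c", "longername.pl");

# Naive: length() is recomputed every time the comparison runs.
my @slow = sort { length($a) <=> length($b) } @files;

# Manufactured keys: compute each length once, sort on it, strip it off.
my @fast = map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ length($_), $_ ] } @files;

print "@fast\n";    # prints "a.c notes.txt longername.pl"
```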
- tr/abc//d is faster than s/[abc]//g.
- print
with a comma separator may be faster than concatenating strings.
For example:
print $fullname{$name} . " has a new home directory " .
$home{$name} . "\n";
has to glue together the two hashes and the two
fixed strings before passing them to the low-level print routines, whereas:
print $fullname{$name}, " has a new home directory ",
$home{$name}, "\n";
doesn't. On the other hand, depending on the values and the architecture,
the concatenation may be faster. Try it.
- Prefer join('', ...) to a series of concatenated strings.
Multiple concatenations may cause strings to be copied back and
forth multiple times. The join operator avoids this.
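A sketch of the difference (the strings are invented; both produce the same result):

```perl
my @parts = ("alpha", "beta", "gamma");

# Repeated .= may copy the growing string on each append...
my $slow = "";
$slow .= $_ for @parts;

# ...while join builds the result in one pass.
my $fast = join('', @parts);
print "$fast\n";    # prints "alphabetagamma"
```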
- split on a fixed string is generally faster than
split on a
pattern.
That is, use split(/ /,...) rather than
split(/ +/,...) if you know there will only be one space.
However, the patterns /\s+/, /^/ and / / are
specially optimized, as is the split on whitespace.
- Pre-extending an array or string can save some time.
As strings and arrays grow, Perl extends them by allocating a new copy
with some room for growth and copying in the old value. Pre-extending a
string with the x operator or an array by setting $#array
can prevent this occasional overhead, as well as minimize memory
fragmentation.
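Both kinds of pre-extension can be sketched as (the sizes are arbitrary):

```perl
# Pre-extend an array to 10,000 slots by assigning to $#array.
my @slots;
$#slots = 9_999;

# Pre-extend a string buffer with the x operator.
my $buf = "\0" x 8192;

print scalar(@slots), " ", length($buf), "\n";    # prints "10000 8192"
```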
- Don't undef long strings and arrays if they'll be reused for the
same purpose.
This helps prevent reallocation when the string or array must be re-extended.
- Prefer "\0" x 8192 over unpack("x8192", ()).
- system("mkdir...") may be faster on multiple directories if
mkdir (2) isn't available.
- Avoid using eof if return values will already indicate it.
- Cache entries from passwd and group (and so on) that are apt to be reused.
For example, to cache the return value from gethostbyaddr when
you are converting numeric addresses (like 198.112.208.11) to names
(like "www.ora.com"), you can use something like:
sub numtoname {
    local($_) = @_;
    unless (defined $numtoname{$_}) {
        local(@a) = gethostbyaddr(pack('C4', split(/\./)), 2);
        $numtoname{$_} = @a > 0 ? $a[0] : $_;
    }
    $numtoname{$_};
}
- Avoid unnecessary system calls.
Operating system calls tend to be rather expensive. So for example,
don't call the time operator when a cached value of $now
would do. Use the special _ filehandle to avoid unnecessary
stat (2) calls. On some systems, even a minimal system call may
execute a thousand instructions.
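A small sketch of the _ filehandle (the scratch file name is invented; the file is created so the example is self-contained):

```perl
my $file = "stat_demo.tmp";             # hypothetical scratch file
open(FH, ">$file") or die "can't create $file: $!";
print FH "hello\n";
close(FH);

# The -e test does one real stat(2); the special _ filehandle then
# reuses that stat buffer for -s and -f, avoiding two more calls.
my ($size, $plain);
if (-e $file) {
    $size  = -s _;
    $plain = -f _ ? 1 : 0;
}
print "size $size, plain $plain\n";     # prints "size 6, plain 1"
unlink($file);
```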
- Avoid unnecessary calls to the system operator.
The system operator has to fork a subprocess and execute the
program you specify. Or worse, execute a shell to execute the program
you specify. This can easily execute a million instructions.
- Worry about starting subprocesses, but only if they're frequent.
Starting a single pwd, hostname, or find process isn't
going to hurt you much--after all, a shell starts subprocesses all day
long. We do occasionally encourage the toolbox approach, believe it or not.
- Keep track of your working directory yourself rather than calling
pwd repeatedly.
(A package is provided in the standard library for this.
See the Cwd module in Chapter 7, The Standard Perl Library.)
- Avoid shell metacharacters in commands--pass lists to system and
exec where appropriate.
- Set the sticky bit on the Perl interpreter on machines without demand paging.
- Using defaults doesn't make your program faster.
Space Efficiency
- Use vec for compact integer array storage.
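A sketch of vec used as a compact byte array (the values are arbitrary; each fits in 8 bits):

```perl
# Store 16 small integers in a 16-byte string instead of a 16-element
# array of full-sized scalars.
my $bytes = '';
vec($bytes, $_, 8) = $_ * 3 for 0 .. 15;

print vec($bytes, 5, 8), " ", length($bytes), "\n";    # prints "15 16"
```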
- Prefer numeric values over string values--they require little additional
space over that allocated for the scalar header structure.
- Use substr to store constant-length strings in a longer string.
- Use the Tie::SubstrHash module for very compact storage of a hash array,
if the key and value lengths are fixed.
- Use __END__ and the DATA
filehandle to avoid storing program data as both a string and an array.
- Prefer each to keys where order doesn't matter.
- Delete or undef globals that are no longer in use.
- Use some kind of DBM to store hashes.
- Use temp files to store arrays.
- Use pipes to offload processing to other tools.
- Avoid list operations and file slurps.
- Avoid using tr///, each of which must store a translation
table of 256 short integers (not characters, since we have to remember
which characters are to be deleted).
- Don't unroll your loops or inline your subroutines.
Programmer Efficiency
- Use defaults.
- Use funky shortcut command-line switches like
-a, -n, -p, -s, -i.
- Use for to mean foreach.
- Sling UNIX commands around with backticks.
- Use <*> and such.
- Use run-time-compiled patterns.
- Use patterns with lots of *, +,
and {}.
- Sling whole arrays and slurp entire files.
- Use getc.
- Use $&, $`, and $'.
- Don't check error values on open, since
<HANDLE>
and print HANDLE will simply
no-op when given an invalid handle.
- Don't close your files--they'll be
closed on the next open.
- Pass subroutine arguments as globals.
- Don't name your subroutine parameters.
You can access them directly as
$_[EXPR].
- Use whatever you think of first.
Maintainer Efficiency
- Don't use defaults.
- Use foreach to mean foreach.
- Use meaningful loop labels with next and last.
- Use meaningful variable names.
- Use meaningful subroutine names.
- Put the important thing first on the line using and, or,
and statement modifiers.
- Close your files as soon as you're done with them.
- Use packages, modules, and classes to hide your implementation details.
- Pass arguments as subroutine parameters.
- Name your subroutine parameters using my.
- Parenthesize for clarity.
- Put in lots of (useful) comments.
- Write the script as its own POD document.
- Wave a handsome tip under his nose.
Porter Efficiency
- Avoid functions that aren't implemented everywhere.
You can use eval tests to see what's available.
- Don't expect native float and double to pack and unpack on
foreign machines.
- Use network byte order when sending binary data over the network.
- Don't send binary data over the network.
- Check $] to see if the current version supports all the features
you use.
- Don't use $]: use require with a version number.
- Put in the eval exec hack even if you don't use it.
- Put the #!/usr/bin/perl line in even if you don't use it.
- Test for variants of UNIX commands.
Some finds can't handle -xdev, for example.
- Avoid variant UNIX commands if you can do it internally.
UNIX commands don't work too well on MS-DOS or VMS.
- Use the Config module or the $^O variable to find out what kind of
machine you're running on.
- Put all your scripts and manpages into a single NFS filesystem that's
mounted everywhere.
User Efficiency
- Avoid forcing prompt order--pop users into their favorite editor with a form.
- Better yet, use a GUI like the Perl Tk extension, where users can control the order of events.
- Put up something for users to read while you continue doing work.
- Use autoloading so that the program appears to run faster.
- Give the option of helpful messages at every prompt.
- Give a helpful usage message if users don't give correct input.
- Display the default action at every prompt, and maybe a few alternatives.
- Choose defaults for beginners. Allow experts to change the defaults.
- Use single character input where it makes sense.
- Pattern the interaction after other things the user is familiar with.
- Make error messages clear about what needs fixing. Include all
pertinent information such as filename and errno, like this:
open(FILE, $file) or die "$0: Can't open $file for reading: $!\n";
- Use fork and exit
to detach when the rest of the script is batch processing.
- Allow arguments to come either from the command line or via standard
input.
- Use text-oriented network protocols.
- Don't put arbitrary limitations into your program.
- Prefer variable-length fields over fixed-length fields.
- Be vicariously lazy.
- Be nice.