If you have been paying close attention, you may have noticed that the example scripts in this chapter are all a little different from previous examples. The difference appears at the end of the first line. All of our prior examples have had this as the first line:
#!/usr/bin/perl -wT
In this chapter, they have started like this:
#!/usr/bin/perl -w
The difference is the -T option, which enables Perl's taint mode. Taint mode tells Perl to keep track of data that comes from the user and avoid doing anything insecure with it. Because our examples this chapter intentionally showed insecure ways of doing things, they wouldn't have worked with the -T flag, thus we omitted it. From this it should be clear, however, that taint mode is generally a very good thing.
The purpose of taint mode is to not allow any data from outside your application from affecting anything else external to your application. Thus, Perl will not allow user-inputted values to be used in an eval, passed through a shell, or used in any of the Perl commands that affect external files and processes. It was created for situations when security is important, such as writing Perl programs that run as root or CGI scripts. You should always use taint mode in your CGI scripts.
When taint mode is enabled, Perl monitors every variable to see if it is tainted. Tainted data, according to Perl, is any data that comes from outside your code. Because this includes anything read from STDIN (or any other file input) as well as all environment variables, this covers everything your CGI script receives from the user.
Not only does Perl keep track of whether variables are tainted or not, but that taintedness follows the data in the variable around if you try to assign it to another variable. For example, because it is an environment variable, Perl considers the HTTP request method stored in $ENV{REQUEST_METHOD} to be tainted. If you then assign this to another variable, that variable also becomes tainted.
my $method = $ENV{REQUEST_METHOD};
Here $method also becomes tainted. It does not matter whether the expression simple or complex. If a tainted value is used in an expression, then the result of that expression is also tainted, and any variable it is assigned to will also become tainted.
You can use this subroutine to test whether a variable is tainted.[16] It returns a true or false value:
[16]The perlsec manpage suggests a subroutine that uses Perl's kill function to test for taintedness. Unfortunately, the kill function is not supported by many systems. The subroutine provided here should work on any platform.
sub is_tainted { my $var = shift; my $blank = substr( $var, 0, 0 ); return not eval { eval "1 || $blank" || 1 }; }
We set $blank to a zero-length substring of the variable we're testing. If the value is tainted and we are running in taint mode, Perl will throw an error when we evaluate this in the quoted expression on the following line. This error is caught by the outer eval, which then returns undef. If the variable is not tainted or we are not running in taint mode, then the expression within the outer eval evaluates to 1. The not reverses the resulting values.
One of the great benefits of using taint mode is that you don't have to try to understand all the technical details about how Perl's guts do the work. As we have seen, Perl sometimes passes expressions through an external shell to help it interpret arguments to system calls. There are even more subtle situations when Perl will invoke a shell, but you don't need to worry about mapping all of these instances out, because taint mode recognizes them for you.
The basic rule, as we have said, is that Perl considers any action that could modify resources outside the script subject to enforcement. Thus, you may open a file using a tainted filename and read from it as long as you did so in read-only mode. However, if you try to open the file to write to it, using a tainted filename, Perl will abort with an error.
Taint mode would be much too restrictive if there was no way to untaint your data. Of course, you do not want to untaint data without checking it to verify that it is safe. Fortunately, one command can accomplish both of these tasks. It turns out that Perl does allow one expression involving tainted values to evaluate to an untainted value. If you match a variable with a regular expression, then the pattern match variables that correspond to the matched parentheses (e.g., $1, $2, etc.) are untainted. If, for example, you wanted to get a particular filename for the user while making sure that it doesn't include a full path (so the user cannot write to a file outside the directory you are intending), you could untaint the user input this way:
$q->param( "filename" ) =~ /^([\w+.])$/; my $filename = $1; unless ( $filename ) { . . .
You can reduce the first two lines to one line because a regular expression match returns a list of matches, and these are also untainted:
my( $filename ) = $q->param( "filename" ) =~ /^([\w.])$/; unless ( $filename ) { . . .
You have seen this notation previously in many of our examples. Note that because the result of the regular expression is a list, you must include parentheses around $filename to evaluate it in a list context. Otherwise, $filename will be set to the number of successful parenthesized matches (1 in this case).
Remember what we said previously. It is generally better to determine what characters to allow than to try to determine what not to allow. Build your untaint regular expressions with this in mind. In this example, we only allowed letters, numbers, underscores, and periods in the filename, which is much simpler than scanning against possible file path delimiters.
Perl's taint mode doesn't do anything for you that you can't do for yourself. It simply monitors the data and stops you if you're in danger of shooting yourself in the foot. You could be careful on your own, but it certainly helps to have Perl do its best to help. In general, the best argument for using taint mode is simply turn the question around and ask "Why not use taint mode?"
Many CGI developers can come up with excuses for not using taint mode, but none of them really hold water. Some may find it too difficult or complicated to deal with the restrictions that taint mode imposes. This is generally because they don't fully understand how taint mode works and they find it easier to turn it off than to learn how to fix the problems Perl is trying to point out (see the next section for some help).
Other developers may argue that taint mode slows their scripts down more than they can afford. Believe it or not, taint mode does not significantly slow down your scripts. If you are concerned about performance, don't implicitly assume that taint mode must slow down your code. Use the Benchmark module and test the difference; you may be surprised at the results. We'll discuss how to use the Benchmark module in Chapter 17, "Efficiency and Optimization".
The final reason to use taint mode is that CGI scripts rarely remain unchanged. Bugs are fixed, new features are added, and even though the original code may have been perfectly safe, someone may accidentally change all that. You can think of taint mode as an ongoing security audit that Perl provides for free.
When you first start working with taint mode, it can be annoying because it seems to complain about everything. Of course, once you have gained a little experience, you learn what to watch out for and begin to write safe code without having to think about it.
Here are some basic tips to help you with the major problems you will first encounter:
Your PATH must be secure. If you call any external programs, you must make sure that $ENV{PATH} does not contain any directories that can be modified by anyone other than their owner. Perl doesn't care if you provide the full path to the program you are calling or not; the PATH must still be secure since the program you are invoking may use the inherited PATH.
@INC will not include the current working directory. If your script needs to require or use other Perl code in the current directory, you must explicitly add the current directory to @INC or include the full or relative path to your code in the command.
Get rid of risky environment variables you don't need. In particular, delete IFS, CDPATH, ENV, and BASH_ENV.
It's common to add something like these two lines to CGI scripts running in taint mode (the PATH you choose may vary depending on your needs and your system):
$ENV{PATH} = "/bin:/usr/bin"; delete @ENV{ 'IFS', 'CDPATH', 'ENV', 'BASH_ENV' };
Copyright © 2001 O'Reilly & Associates. All rights reserved.