Before you can run CGI programs on your server, certain parameters in the server configuration files must be modified. Throughout this book, we will use the Apache web server on a Unix platform in our examples. Apache is by far the most popular web server available, plus it's open source and available for free. Apache is derived from the NCSA web server, so many configuration details for it are similar to those for other web servers that are also derived from the NCSA server, such as those sold by iPlanet (formerly Netscape).
We assume that you already have access to a working web server, so we won't cover how to install and initially configure Apache. That lengthy discussion would be well beyond the scope of this book, and that information is already available in another fine book, Apache: The Definitive Guide, by Ben and Peter Laurie (O'Reilly & Associates, Inc.).
Apache is not always installed in the same place on all systems. Throughout this book, we will use the default installation path, which places everything beneath /usr/local/apache. Apache's subdirectories are:
$ cd /usr/local/apache
$ ls -F
bin/  cgi-bin/  conf/  htdocs/  icons/  include/  libexec/  logs/  man/  proxy/
Depending on how Apache was configured during installation, you may not have some directories, such as libexec or proxy; this is fine. With some popular Unix and Unix-compatible distributions that include Apache (e.g., some Linux distributions), the subdirectories above may be distributed across the system instead. For example, on RedHat Linux, the subdirectories are remapped, as shown in Table 1-1.
Default Installation Path | Alternative Path (RedHat Linux) |
---|---|
/usr/local/apache/cgi-bin | /home/httpd/cgi-bin |
/usr/local/apache/htdocs | /home/httpd/html |
/usr/local/apache/conf | /etc/httpd/conf |
/usr/local/apache/logs | /var/log/httpd |
If this is the case, you will need to translate our instructions to the paths on your system. If Apache is installed on your system, and its directories are not at either of these locations, then ask your system administrator or refer to your system documentation to locate them.
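If you are not sure which layout your system uses, a quick check can help. The following is a minimal sketch, not part of Apache: the function name find_apache_conf is made up for illustration, and the two default candidates are simply the conf directories from the table above. It prints the first directory that contains httpd.conf.

```shell
# Print the first candidate directory that contains httpd.conf.
# With no arguments, the two layouts from the table above are tried;
# these defaults are assumptions and may not match your system.
find_apache_conf() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/local/apache/conf /etc/httpd/conf
    fi
    for dir in "$@"; do
        if [ -f "$dir/httpd.conf" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    echo "httpd.conf not found in: $*" >&2
    return 1
}
```

Running find_apache_conf with no arguments checks the two layouts discussed here; pass other directories as arguments to check additional locations.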
You configure Apache by modifying the configuration files found in the conf directory. These files contain directives that Apache reads when it starts. Older versions of Apache included three files: httpd.conf, srm.conf, and access.conf. However, using the latter two files was never required, and recent distributions of Apache include all of the directives in httpd.conf. This allows you to manage the full configuration in one location without bouncing between files. It also avoids situations where your configuration between files does not match, which can create security problems.
Many sites still use all three configuration files, if only because they have not bothered to combine them. Therefore, here and throughout the book, whenever we discuss Apache configuration, we will specify the alternative name of the file you need to edit if you are using all three files.
Finally, remember that Apache must be told to reread its configuration files whenever you make changes to them. You do not need to do a full server restart, although that also works. If your system has the apachectl command (part of the standard install), you can tell Apache to reread its configuration while it is running with this command:
$ apachectl graceful
This may require superuser (i.e., root) privileges.
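Because a typo in a configuration file can keep the server from picking up your changes, it can be worth checking the syntax before asking for the reload. The sketch below assumes that your apachectl supports the configtest subcommand and that it exits with a nonzero status when it finds an error; the reload_apache function and the APACHECTL variable are our own illustrative names.

```shell
# Path to apachectl; override APACHECTL if it is not on your PATH.
APACHECTL="${APACHECTL:-apachectl}"

# Ask for a graceful reload only if the syntax check succeeds.
reload_apache() {
    if "$APACHECTL" configtest; then
        "$APACHECTL" graceful
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}
```

As with the plain command, calling reload_apache may require superuser privileges.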
Enabling CGI execution with Apache is very simple, although there is a good way to do it and a less good way to do it. Let's start with the good way, which involves creating a special directory for our CGI scripts.
The ScriptAlias directive tells the web server to map a virtual path (the path in a URL) to a directory on the disk and execute any files it finds there as CGI scripts.
To enable CGI scripts for our web server, place this directive in httpd.conf:
ScriptAlias /cgi /usr/local/apache/cgi-bin
For example, if a user accesses the URL:
http://your_host.com/cgi/my_script.cgi
then the local program:
/usr/local/apache/cgi-bin/my_script.cgi
will be executed by the server. Note that the cgi path in the URL does not need to be the same as the name of the filesystem directory, cgi-bin. Whether you map the CGI directory to the virtual path called cgi, cgi-bin, or anything else for that matter, is strictly your own preference. You can also have multiple directories hold CGI scripts if you need that feature:
ScriptAlias /cgi /usr/local/apache/cgi-bin/
ScriptAlias /cgi2 /usr/local/apache/alt-cgi-bin/
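To make the mapping concrete, here is a small shell model of what a ScriptAlias directive does to a URL path. It is only a simplified sketch of the idea, not Apache's actual matching code, and the function name map_script_alias is hypothetical. Given the virtual path, the directory, and a requested URL path, it prints the file that would be run, or fails if the alias does not apply.

```shell
# Simplified model of ScriptAlias: translate a URL path to a file path.
# Usage: map_script_alias VIRTUAL_PATH DIRECTORY URL_PATH
map_script_alias() {
    alias_path=$1
    dir=$2
    url_path=$3
    case "$url_path" in
        "$alias_path"/*)
            # Strip the virtual path and append the rest to the directory.
            rest=${url_path#"$alias_path"/}
            printf '%s/%s\n' "${dir%/}" "$rest"
            ;;
        *)
            # The URL does not fall under this alias.
            return 1
            ;;
    esac
}
```

For instance, map_script_alias /cgi /usr/local/apache/cgi-bin /cgi/my_script.cgi prints /usr/local/apache/cgi-bin/my_script.cgi, while a request under /cgi2 does not match the /cgi alias and would need its own directive.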
The directory that holds CGI scripts must be outside the server's document root. In a standard Apache install, the document root maps to the htdocs directory. All files beneath this directory are browsable. By default, the cgi-bin directory is not beneath htdocs, so if we were to disable our ScriptAlias directive, for example, there would be no way to access the CGI scripts. There is a very good reason for this, and it is not simply to protect yourself from someone accidentally deleting the ScriptAlias directive.
Here is an example of why you should not place your CGI script directory within the document root. Say you do decide that you want to have multiple directories for CGI scripts throughout your web site within the document root. You might decide that it would be nice to have a directory for each of your major applications. Say that you have an online widget store that you put in /usr/local/apache/htdocs/widgets and the CGI script directory at /usr/local/apache/htdocs/widgets/cgi. You then add the following directive:
ScriptAlias /widgets-cgi /usr/local/apache/htdocs/widgets/cgi
If you were to do this and test it, it would work fine. However, suppose that your company later expands to sell woozles in addition to widgets, so the store needs a more general name. You rename the widgets directory to store, update the ScriptAlias directive, update all related HTML links, and create a symbolic link from widgets to store in order to support those users who bookmarked the old name. Sounds like a good plan, right?
Unfortunately, that last step, the symbolic link, just created a large security hole. The problem is that it is now possible to access your CGI scripts via two different URLs. For example, you may have a CGI script called purchase.cgi that can be accessed either of these two ways:
http://localhost/store-cgi/purchase.cgi
http://localhost/widgets/cgi/purchase.cgi
The first URL will be handled by the ScriptAlias directive; the second will not. If users attempt to access the second URL, instead of being greeted by a web page, they will be greeted with the source code of your CGI script. If you're lucky, someone will send you an email notifying you of the problem. If you're not, a mischievous user may start poking around your scripts to find security holes to break into your system to get at more valuable information (like database passwords or credit card numbers).
Any symbolic link above a directory containing CGI scripts allows this security hole.[1] The scenario about renaming a directory and providing a link to its old name is simply one example of a situation when this may occur innocently. If you place your CGI scripts outside of your server's document root, you never have to worry about someone accidentally exposing your scripts this way.
[1]It is possible to configure Apache to not follow symbolic links, which provides an alternative solution. However, symbolic links in general can be quite useful, and they are enabled by default. The problem in this situation is not with the symbolic link; it is with having the CGI scripts in a browsable location.
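The effect of the symbolic link can be reproduced safely in a scratch directory. The sketch below is illustrative only: it builds a throwaway tree under a temporary directory (the function name demo_symlink_exposure and the file contents are made up), links widgets to store, and checks whether the script can be reached through the old directory name, a path under the document root that no ScriptAlias directive covers.

```shell
# Recreate the rename-plus-symlink scenario in a temporary directory.
demo_symlink_exposure() {
    root=$(mktemp -d) || return 1
    mkdir -p "$root/htdocs/store/cgi"
    printf '#!/usr/bin/perl\n# payment logic\n' \
        > "$root/htdocs/store/cgi/purchase.cgi"

    # The link added to support old bookmarks.
    ln -s store "$root/htdocs/widgets"

    # Compare the script with what the old path resolves to.
    if cmp -s "$root/htdocs/store/cgi/purchase.cgi" \
              "$root/htdocs/widgets/cgi/purchase.cgi"; then
        echo "widgets/cgi/purchase.cgi reaches the same file"
    else
        echo "widgets/cgi/purchase.cgi does not reach the script"
    fi
    rm -rf "$root"
}
```

Any file that can be reached this way beneath the document root is served like an ordinary document unless some other directive tells the server to execute it.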
You may wonder why revealing your source code is such a problem. CGI scripts have certain characteristics that make them quite different from other forms of executables from a security standpoint. They allow remote, anonymous users to run programs on your system. Thus, security should always be an important consideration, and your code must be flawless if you are willing to allow potential attackers to review your source code. Although security through obscurity is not good protection in and of itself, it certainly doesn't hurt when combined with other forms of security. We will discuss security in much greater detail in Chapter 8, "Security".
The alternative to configuring CGI scripts via a common directory is to distribute them throughout your document tree and have your web server recognize them by their filename extension, such as .cgi. This is a very bad idea, from the standpoint of both architecture and security.
From an architectural standpoint, you should not do this because having a common directory for all of your CGI scripts helps you manage them. As web sites grow, it may be difficult to keep track of all of the CGI scripts that your site uses. Placing them under a common directory makes them easier to find and promotes creating CGI scripts that are general solutions to multiple problems instead of handfuls of single-use scripts. You can then create subdirectories beneath the main /cgi directory to organize your scripts.
There are two reasons why configuring CGI scripts by extension is insecure. First, it allows anyone who has permissions to update HTML files to create CGI scripts. As we said, CGI scripts require particular security considerations, and you should not allow novice programmers to create scripts on production web servers. Second, it increases the likelihood that someone can view the source code to your CGI scripts. Many text editors create backup files while you are editing a file; some of them create these files in the same directory where you are working. For example, if you edit a file called top_secret.cgi with emacs, it typically creates a backup file called top_secret.cgi~. If this second file makes it onto the production web server and someone with a lucky hunch attempts to request that file, the web server will not recognize the extension and will simply return the raw source code.
Of course, your text editor ideally should delete these files when you finish working on them, and you really should not be editing files directly on a production web server. But files like this do get left around sometimes, and they might make it to the production web server. Files also get renamed manually sometimes. A developer may wish to make changes to a file but save a backup of this file by making a copy and renaming it with a .bak extension. If a backup file is in a directory configured with ScriptAlias, however, it is not displayed; it is treated like any other CGI script and executed, which is a much safer alternative.
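One way to catch stray backup files before they cause trouble is to search for them periodically. The following sketch uses the standard find command; the function name find_backup_files is our own, the default directory is the document root used in this book, and the two patterns cover only the examples mentioned above (a trailing ~ and a .bak extension), so you may need to add patterns for other editors.

```shell
# List editor and manual backup files beneath a directory.
# Defaults to the document root used in this book's examples.
find_backup_files() {
    find "${1:-/usr/local/apache/htdocs}" -type f \
        \( -name '*~' -o -name '*.bak' \) -print
}
```

Calling find_backup_files with no arguments searches the default document root; pass a different directory as the first argument to search elsewhere.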
So, if your web server happens to be configured to allow CGI scripts anywhere, here is how to fix it. The following line tells the web server to execute any file ending with a .cgi suffix:
AddHandler cgi-script .cgi
You can comment it out by preceding it with #, just like in Perl. Without this directive, Apache will treat .cgi files as unknown files and return them according to the default media type -- typically plain text. So be sure that you move all of your CGI scripts outside the document root before you remove this directive.
You may also turn off the CGI execute permissions for particular directories by disabling the ExecCGI option. The line to enable it looks like this:
<Directory "/usr/local/apache/htdocs">
    .
    .
    Options Indexes FollowSymLinks ExecCGI
    .
    .
</Directory>
There are probably many other lines above and below the Options directive, and the Options directive on your system may differ. If you remove ExecCGI, then even with the CGI handler directive enabled above, Apache will not execute CGI scripts in the location to which this Options directive applies -- in this case, the document root, /usr/local/apache/htdocs. Users will instead get a "403 Forbidden" error page telling them that they do not have permission to access the script.
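To see whether your own configuration enables CGI by extension or by directory option, you can search the configuration file for both directives. This is a rough sketch rather than a complete audit: the function name audit_cgi_config is made up, the default path assumes the installation layout used in this book, and the pattern only finds directives that appear on a single, uncommented line.

```shell
# Show active (uncommented) lines that enable CGI by extension
# or by the ExecCGI option, prefixed with their line numbers.
audit_cgi_config() {
    conf=${1:-/usr/local/apache/conf/httpd.conf}
    # -i because directive names are not case sensitive; the pattern
    # requires the directive to start the line, so '#' lines are skipped.
    grep -n -i -E \
        '^[[:space:]]*(AddHandler[[:space:]]+cgi-script|Options[[:space:]].*ExecCGI)' \
        "$conf"
}
```

If your site still uses all three configuration files, run the function against srm.conf and access.conf as well, since these directives may live there instead.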
Now that we have our web server set up, and we have gotten a chance to see what CGI can do, we can investigate CGI in more detail. We start the next chapter by reviewing HTTP, the language of the Web and the foundation of CGI.
Copyright © 2001 O'Reilly & Associates. All rights reserved.