CGI Programming on the World Wide Web

10.9 Maintaining State with a Server

In Chapter 8, Multiple Form Interaction, we looked at several techniques for keeping track of information between multiple forms. They involved using temporary files, hidden variables, and Netscape Persistent Cookies. Now, we will look at yet another method to keep state. This involves communicating with a server (the Cookie Server) to store and retrieve information.

It will help you understand how cookies work if you see real programs use them. So we will examine a CGI program that displays two forms, and that stores the information returned by calling the cookie server. Here is the first form:

<HTML>
<HEAD><TITLE>College/School Survey</TITLE></HEAD>
<BODY>
<H1>Interests</H1>
<HR>
<FORM ACTION="/cgi-bin/cookie_client.pl?next=/location.html" METHOD="POST">

The ACTION attribute specifies the next form in the series as a query string. The filename is relative to the document root directory.

<INPUT TYPE="hidden" NAME="Magic_Cookie" VALUE="-*Cookie*-">

The string "-*Cookie*-" will be replaced by a random cookie identifier when this form is parsed by the CGI program. This cookie is used to uniquely identify the form information.

What subject are you interested in? <BR>
<INPUT TYPE="text" NAME="subject" SIZE=40>
<P>
What extra-curricular activity do you enjoy the most? <BR>
<INPUT TYPE="text" NAME="interest" SIZE=40>
<P>
<INPUT TYPE="submit" VALUE="See Next Form!">
<INPUT TYPE="reset"  VALUE="Clear the form">
</FORM>
<HR>
</BODY>
</HTML>

Here is the second form in the series. It should be stored in a file named location.html because that name was specified in the ACTION attribute of the first form.

<HTML>
<HEAD><TITLE>College/School Survey</TITLE></HEAD>
<BODY>
<H1>Location</H1>
<HR>
<FORM ACTION="/cgi-bin/cookie_client.pl" METHOD="POST">

Since this is the last form in the series, no query information is passed to the program.

<INPUT TYPE="hidden" NAME="Magic_Cookie" VALUE="-*Cookie*-">
Where would you like to go to school? <BR>
<INPUT TYPE="text" NAME="city" SIZE=40>
<P>
What type of college do you prefer? <BR>
<INPUT TYPE="text" NAME="type" SIZE=40>
<P>
<INPUT TYPE="submit" VALUE="Get Summary!">
<INPUT TYPE="reset"  VALUE="Clear the form">
</FORM>
<HR>
</BODY>
</HTML>

We will do something unusual in this example by not looking at the program that handles these forms right away. Instead, we will first examine the cookie server, the continuously running program that maintains state for CGI programs. Then, we will return to the program that parses the forms (the cookie client) and see how it interacts with the server.

Cookie Server

Here I will show a general purpose server for CGI programs running on the local system. Each CGI program is a cookie client. When a client connects, this server enters a long loop accepting commands, as we will see in a moment. Please note that this is not a CGI script. Instead, it provides a data storage service for CGI scripts.

#!/usr/local/bin/perl
require "sockets.pl";
srand (time|$$);

The srand function sets the random number seed. A bitwise OR of the current time and the process identification number (PID) creates a seed that varies from one run of the server to the next.
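
To see what the seed expression computes, here is a small sketch with made-up values standing in for the time and the PID. The helper names make_seed and make_seed_xor are assumptions for illustration only; the server itself simply calls srand (time|$$). XOR is shown next to OR because some programs use it for the same purpose.

```perl
use strict;
use warnings;

# Hypothetical helper: combine a timestamp and a PID with bitwise OR,
# the same operation the server applies to time and $$.
sub make_seed {
    my ($time, $pid) = @_;
    return $time | $pid;
}

# Alternative shown for comparison only: bitwise XOR of the same inputs.
sub make_seed_xor {
    my ($time, $pid) = @_;
    return $time ^ $pid;
}

# Made-up sample values, not real measurements.
my ($sample_time, $sample_pid) = (12, 5);
printf "OR seed: %d, XOR seed: %d\n",
    make_seed ($sample_time, $sample_pid),
    make_seed_xor ($sample_time, $sample_pid);
```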

$HTTP_server = "128.197.27.7";

The IP address of the HTTP server from which the CGI scripts will connect to this server is specified. This is used to prevent CGI programs running on other HTTP servers on the Web from communicating with this server.

$separator = "\034";
$expire_time = 15 * 60;

The expire_time variable sets the time (in seconds) for which a cookie is valid. In this case, a cookie is valid for 15 minutes.

%DATA = ();
$max_cookies = 10;
$no_cookies = 0;

The DATA associative array is used to hold the form information. The max_cookies variable sets the limit for the number of cookies that can be active at one time. And the no_cookies variable is a counter that keeps track of the number of active cookies.

$error = 500;
$success = 200;

These two variables hold the status codes for error and success, respectively.

$port = 5000;
&listen_to_port (SOCKET, $port) || die "Cannot create socket.", "\n";

The listen_to_port function is part of the socket library. It "listens" on the specified port for possible connections. In this case, port number 5000 is used. However, if you do not know what port to set the server on, you can ask the socket library to do it for you:

( ($port) = &listen_to_port (SOCKET) ) || die "Cannot create socket.", "\n";
print "The Cookie Server is running on port number: $port", "\n";

If the listen_to_port function is called in this manner (with one argument), an unused port is selected. You will then have to modify the cookie client (see the next section) to reflect the correct port number. Or, you can ask your system administrator to create an entry in the /etc/services file for the cookie server, after which the client can simply use the name "cookie" to refer to the server.
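
If a services entry is available, a program can look the port up by name with Perl's built-in getservbyname function. The following is only a sketch: the resolve_port helper is hypothetical, and the name "cookie" resolves only if an administrator has actually added such an entry, so the code falls back to a default port.

```perl
use strict;
use warnings;

# Look up a service name in the system services database and fall back to
# a default port if there is no entry. The "cookie" entry is an assumption;
# it exists only if an administrator has added it.
sub resolve_port {
    my ($service, $default_port) = @_;
    # In list context getservbyname returns (name, aliases, port, proto),
    # or an empty list when the service is unknown.
    my @entry = getservbyname ($service, "tcp");
    return @entry ? $entry[2] : $default_port;
}

my $port = resolve_port ("cookie", 5000);
print "Cookie server port: $port\n";
```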

while (1) {
    ( ($ip_name, $ip_address) = &accept_connection (COOKIE, SOCKET) )
        || die "Could not accept connection.", "\n";

This starts an infinite loop that continually accepts connections. When a connection is established, a new socket handle, COOKIE, is created to deal with it, while the original file handle, SOCKET, goes back to accept more connections. The accept_connection subroutine returns the IP name and address of the remote host. In our case, this will always point to the address of the HTTP server, because the CGI program (or the client) is being executed from that server.

This cookie server, as implemented, can only "talk" to one connection at a time. All other connections are queued up, and handled in the order in which they are received. (Later on, we'll discuss how to implement a server that can handle multiple connections simultaneously.)

    select (COOKIE);
    $cookie = undef;

The default output file handle is set to COOKIE. The cookie variable is used to hold the current cookie identifier.

    if ($ip_address ne $HTTP_server) {
        &print_status ($error, "You are not allowed to connect to server.");

If the IP address of the remote host does not match the address of the HTTP server, the connection is coming from a host somewhere else. We do not want servers running on other hosts connecting to this server and storing information, which could result in a massive system overload! However, you can set this up so that all machines within your domain can access this server to store information.
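
One way to relax the check for a whole network is to compare only the leading part of the address. This is a rough sketch: the address_allowed helper and the prefix are assumptions, and a string comparison on dotted-decimal addresses is only a simple approximation of a real subnet test.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a client whose dotted-decimal address starts
# with one of the allowed prefixes. Each prefix should end in a dot so that
# a prefix such as "128.19" cannot match "128.197.27.7".
sub address_allowed {
    my ($ip_address, @prefixes) = @_;
    foreach my $prefix (@prefixes) {
        return 1 if index ($ip_address, $prefix) == 0;
    }
    return 0;
}

# "128.197." is an illustrative prefix taken from the sample address.
my $verdict = address_allowed ("128.197.27.7", "128.197.") ? "allowed" : "refused";
print "$verdict\n";
```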

    } else {
        &print_status ($success, "Welcome from $ip_name ($ip_address)");

A welcome message is displayed if the connection is coming from the right place (our HTTP server). The print_status subroutine simply outputs the status number and the message to the currently selected file handle, which is now the COOKIE socket.

        while (<COOKIE>) {
            s/[\000-\037]//g;
            s/^\s*(.*)\b\s*/$1/;

The while loop accepts input from the socket continuously. All control characters, as well as leading and trailing spaces, are removed from the input. This server accepts the following commands:

new remote-address
cookie cookie-identifier remote-address
key = value
list
delete

We will discuss each of these in a moment.
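
As a quick overview of the command set, the sketch below classifies a line of input with patterns modeled on the ones the server uses. The classify_command helper and its return labels are assumptions for illustration; the real server acts on each command directly instead of returning a label.

```perl
use strict;
use warnings;

# Classify one line of client input using patterns modeled on the
# server's. Returns a label, or "invalid" if nothing matches.
sub classify_command {
    my ($line) = @_;
    $line =~ s/[\000-\037]//g;     # strip control characters
    $line =~ s/^\s+|\s+$//g;       # strip leading and trailing spaces
    return "new"    if $line =~ /^new\s*(\S+)$/;
    return "cookie" if $line =~ /^cookie\s*(\S+)\s*(\S+)/;
    return "set"    if $line =~ /^(\w+)\s*=\s*(.*)$/;
    return "list"   if $line =~ /^list$/;
    return "delete" if $line =~ /^delete$/;
    return "invalid";
}

foreach my $line ("new www.test.net", "name = Joe Test", "list", "bogus") {
    printf "%-18s -> %s\n", $line, classify_command ($line);
}
```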

            if ( ($remote_address) = /^new\s*(\S+)$/) {

The new command creates a new and unique cookie and outputs it to the socket. The remote address of the host that is connected to the HTTP server should be passed as an argument to this command. This makes it difficult for intruders to break the server, as you will see in a minute. Here is an example of how this command is used, and its typical output (the first line is the client's command, and the second is the server's reply):

new www.test.net
200: 13fGK7KIlZSF2

The status along with a unique cookie identifier is output. The client should parse this line, get the cookie, and insert it in the form, either as a query or a hidden variable.
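
On the client side, parsing the reply amounts to splitting the line at the colon. Here is a minimal sketch; parse_status_line is a hypothetical helper name, and it assumes every reply has the form "status: text" shown in the sample session.

```perl
use strict;
use warnings;

# Split a server reply such as "200: 13fGK7KIlZSF2" into its numeric
# status and the text that follows. Returns an empty list if the line
# does not have that shape.
sub parse_status_line {
    my ($line) = @_;
    if ($line =~ /^(\d+):\s*(.*)$/) {
        return ($1, $2);
    }
    return ();
}

my ($status, $text) = parse_status_line ("200: 13fGK7KIlZSF2");
print "status=$status cookie=$text\n" if $status == 200;
```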

                if ($cookie) {
                    &print_status ($error, "You already have a cookie!");

If the cookie variable is defined, an error message is displayed. This would only occur if you try to call the new command multiple times in the same session.

                } else {
                    if ($no_cookies >= $max_cookies) {
                        &print_status ($error, "Cookie limit reached.");
                    } else {
                        do {
                            $cookie = &generate_new_cookie ($remote_address);
                        } until (!$DATA{$cookie});

If a cookie is not defined for this session, and the number of cookies is not over the pre-defined limit, the generate_new_cookie subroutine is called to create a unique cookie.

                        $no_cookies++;
                        $DATA{$cookie} = join("::", $remote_address,
                                                    $cookie, time);
                        &print_status ($success, $cookie);
                    }
                }    

Once a cookie is successfully created, the counter is incremented, and a new key is inserted into the DATA associative array. The value for this key is a string containing the remote address (so we can check against it later), the cookie, and the time (for expiration purposes).

            } elsif ( ($check_cookie, $remote_address) = 
                /^cookie\s*(\S+)\s*(\S+)/) {

The cookie command sets the cookie for the session. Once you set a cookie, you can store information, list the stored information, and delete the cookie. The cookie command is generally used once you have a valid cookie (obtained by using the new command). Here is a typical cookie command:

cookie 13fGK7KIlZSF2 www.test.net
200: Cookie 13fGK7KIlZSF2 set.

The server will return a status indicating either success or failure. If you try to set a cookie that does not exist, you will get the following error message:

cookie 6bseVEbhf74 www.test.net
500: Cookie does not exist.

And if the IP address is not the same as the one that was used when creating the cookie, this is what is displayed:

cookie 13fGK7KIlZSF2 www.joe.net
500: Incorrect IP address.

The program continues:

                if ($cookie) {
                    &print_status ($error, "You already specified a cookie.");

If the cookie command is specified multiple times in a session, an error message is output.

                } else {
                    if ($DATA{$check_cookie}) {
                        ($old_address) = split(/::/, $DATA{$check_cookie});

                        if ($old_address ne $remote_address) {
                            &print_status ($error, "Incorrect IP address.");
                        } else {
                            $cookie = $check_cookie;
                            &print_status ($success, "Cookie $cookie set.");
                        }
                    } else {
                        &print_status ($error, "Cookie does not exist.");
                    }
                }

If the cookie exists, the specified address is compared to the original IP address. If everything is valid, the cookie variable will contain the cookie.

            } elsif ( ($variable, $value) = /^(\w+)\s*=\s*(.*)$/) {

The regular expression checks for a statement of the form key = value, which is used to store information.

Here is a sample session where two variables are stored:

cookie 13fGK7KIlZSF2 www.test.net
200: Cookie 13fGK7KIlZSF2 set.
name = Joe Test
200: name=Joe Test
organization = Test Net
200: organization=Test Net

The server is stringent, and allows only variable names composed of alphanumeric characters and the underscore (A-Z, a-z, 0-9, _).

                if ($cookie) {
                    $key = join ($separator, $cookie, $variable);
                    $DATA{$key} = $value;
                    &print_status ($success, "$variable=$value");
                } else {
                    &print_status ($error, "You must specify a cookie.");
                }

The variable name is concatenated with the cookie and the separator to create the key for the associative array.
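
The sketch below shows what the DATA array might contain after two variables have been stored, which makes the two kinds of keys easier to see. The variable_key helper, the timestamp, and the "<FS>" marker used for printing are assumptions for illustration.

```perl
use strict;
use warnings;

my $separator = "\034";

# Build the hash key under which one variable of one cookie is stored.
sub variable_key {
    my ($cookie, $variable) = @_;
    return join ($separator, $cookie, $variable);
}

# Sample contents; the cookie, address, and timestamp are illustrative.
my %DATA = (
    "13fGK7KIlZSF2" => join ("::", "www.test.net", "13fGK7KIlZSF2", 800000000),
    variable_key ("13fGK7KIlZSF2", "name")         => "Joe Test",
    variable_key ("13fGK7KIlZSF2", "organization") => "Test Net",
);

# Keys without the separator are cookie records; the rest are variables.
foreach my $key (sort keys %DATA) {
    my $kind = (index ($key, $separator) >= 0) ? "variable" : "record";
    (my $shown = $key) =~ s/$separator/<FS>/;   # make the separator visible
    print "$kind: $shown => $DATA{$key}\n";
}
```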

            } elsif (/^list$/) {
                if ($cookie) {
                    foreach $key (keys %DATA) {
                        $string = join ("", $cookie, $separator);
                        if ( ($variable) = $key =~ /^\Q$string\E(.*)$/) {
                            &print_status ($success, "$variable=$DATA{$key}");
                        }
                    }
                    print ".", "\n";
                } else {
                    &print_status ($error, "You don't have a cookie yet.");
                }

The list command displays all of the stored information by iterating through the DATA associative array. Only keys that begin with the current cookie followed by the separator are output. In other words, the initial key containing the cookie, the remote address, and the time is not displayed. Here is the output from a list command:

cookie 13fGK7KIlZSF2 www.test.net
200: Cookie 13fGK7KIlZSF2 set.
list
200: name=Joe Test
200: organization=Test Net
.

The data ends with the "." character, so that the client can stop reading at that point and an infinite loop is not created.
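
A client can rely on that terminator when reading the reply. The following sketch reads lines until the lone "." and collects the pairs in a hash. The read_list_response helper is hypothetical, and an in-memory file handle stands in for the socket so that the example runs without a server.

```perl
use strict;
use warnings;

# Read "200: key=value" lines from a handle until the lone "." terminator
# and return the pairs as a hash. Lines after the terminator stay unread.
sub read_list_response {
    my ($handle) = @_;
    my %items;
    while (my $line = <$handle>) {
        chomp $line;
        last if $line eq ".";
        if ($line =~ /^200:\s*(\w+)=(.*)$/) {
            $items{$1} = $2;
        }
    }
    return %items;
}

# Simulate the server's reply with an in-memory file handle.
my $reply = "200: name=Joe Test\n200: organization=Test Net\n.\n200: later\n";
open (my $fake_socket, "<", \$reply) || die "Cannot open in-memory handle.\n";
my %items = read_list_response ($fake_socket);
print "$_ = $items{$_}\n" foreach sort keys %items;
close ($fake_socket);
```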

            } elsif (/^delete$/) {
                if ($cookie) {
                    &remove_cookie ($cookie);
                    &print_status ($success, "Cookie $cookie deleted.");
                } else {
                    &print_status ($error, "Select a cookie to delete.");
                }

The delete command removes the cookie from the server's internal database. The remove_cookie subroutine is called to remove all information associated with the cookie. Here is an example that shows the effect of the delete command:

cookie 13fGK7KIlZSF2 www.test.net
200: Cookie 13fGK7KIlZSF2 set.
list
200: name=Joe Test
200: organization=Test Net
.
delete
200: Cookie 13fGK7KIlZSF2 deleted.
list
.

The program continues:

            } elsif (/^(exit|quit)$/) {
                $cookie = undef;
                &print_status ($success, "Bye.");
                last;

The exit and quit commands are used to exit from the server. The cookie variable is cleared. This is very important! If it is not cleared, the server will incorrectly assume that a cookie is already set when a new connection is established. This can be dangerous, as the new session can see the variables stored by the previous connection by executing the list command.
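
When a pattern has to match one of several whole-line commands, the alternatives need to be grouped so that both anchors apply to each of them. The short sketch below compares an ungrouped and a grouped pattern on a few made-up inputs; the two helper names are assumptions for illustration.

```perl
use strict;
use warnings;

# Ungrouped: parsed as (^exit) or (quit$), so partial matches slip through.
sub matches_ungrouped { return $_[0] =~ /^exit|quit$/ ? 1 : 0; }

# Grouped: the whole line must be exactly "exit" or "quit".
sub matches_grouped   { return $_[0] =~ /^(exit|quit)$/ ? 1 : 0; }

foreach my $input ("exit", "quit", "exited", "do not quit") {
    printf "%-12s ungrouped=%d grouped=%d\n", $input,
        matches_ungrouped ($input), matches_grouped ($input);
}
```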

            } elsif (!/^\s*$/) {
                &print_status ($error, "Invalid command.");
            }
        }
    }

An error message is output if the specified command is not among the ones listed.

    &close_connection (COOKIE);
    &expire_old_cookies();
}
exit(0);

The connection between the server and the client is closed. The expire_old_cookies subroutine removes any cookies (and the information associated with them) that have expired. Note that cookies are not removed at the exact moment they expire; they are checked (and removed) only when a connection terminates, so an expired cookie can linger until the next client disconnects.

The print_status subroutine simply displays a status and the message.

sub print_status
{
    local ($status, $message) = @_;
    print $status, ": ", $message, "\n";
}

The generate_new_cookie subroutine generates a random and unique cookie by using the crypt function to encrypt a string that is based on the current time and the remote address. The algorithm used in creating a cookie is arbitrary; you can use just about any algorithm to generate random cookies.

sub generate_new_cookie
{
    local ($remote) = @_;
    local ($random, $temp_address, $cookie_string, $new_cookie);
    $random = rand (time);
    ($temp_address = $remote) =~ s/\.//g;
    $cookie_string = join ("", $temp_address, time) / $random;
    $new_cookie = crypt ($cookie_string, $random);
    return ($new_cookie);
}
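
As one example of an alternative algorithm, a cookie can be assembled from randomly chosen characters. This is a sketch and not the author's method: the random_cookie helper and the 13-character length are assumptions, and Perl's built-in rand is not suitable where cookies must be hard to guess.

```perl
use strict;
use warnings;

# Hypothetical alternative: build a cookie from randomly chosen
# alphanumeric characters. The caller still has to check for collisions.
sub random_cookie {
    my ($length) = @_;
    my @alphabet = ("A" .. "Z", "a" .. "z", 0 .. 9);
    return join ("", map { $alphabet[int (rand (@alphabet))] } 1 .. $length);
}

my %DATA = ();     # stands in for the server's table of active cookies
my $cookie;
do {
    $cookie = random_cookie (13);
} until (!$DATA{$cookie});
print "New cookie: $cookie\n";
```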

The expire_old_cookies subroutine removes cookies after a pre-defined period of time. The foreach loop iterates through the associative array, searching for keys that do not contain the separator (i.e., the original key). For each original key, the sum of the creation time and the expiration time (in seconds) is compared with the current time. If the cookie has expired, the remove_cookie subroutine is called to delete the cookie.

sub expire_old_cookies
{
    local ($current_time, $key, $cookie_time);
    $current_time = time;
    foreach $key (keys %DATA) {
        if ($key !~ /$separator/) {
            $cookie_time = (split(/::/, $DATA{$key}))[2];
            if ( $current_time >= ($cookie_time + $expire_time) ) {
               &remove_cookie ($key);
            }
        }
    }
}
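
The expiration test can be tried on its own with a small sample table. In this sketch the expired_cookies helper is hypothetical: it returns the expired cookies instead of removing them, and it takes the current time as an argument so that the result is predictable. The cookie names and timestamps are made up.

```perl
use strict;
use warnings;

my $separator   = "\034";
my $expire_time = 15 * 60;

# Return the cookies whose age has reached $expire_time at time $now.
# Only cookie records (keys without the separator) carry a timestamp.
sub expired_cookies {
    my ($data, $now) = @_;
    my @expired;
    foreach my $key (keys %$data) {
        next if index ($key, $separator) >= 0;
        my $created = (split (/::/, $data->{$key}))[2];
        push (@expired, $key) if $now >= $created + $expire_time;
    }
    my @sorted = sort @expired;
    return @sorted;
}

# Illustrative table: one cookie created at t=1000, another at t=1800.
my %sample = (
    "cookieA"                 => "www.test.net::cookieA::1000",
    "cookieA${separator}name" => "Joe Test",
    "cookieB"                 => "www.joe.net::cookieB::1800",
);
print "Expired at t=2000: ", join (", ", expired_cookies (\%sample, 2000)), "\n";
```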

The remove_cookie subroutine deletes the cookie:

sub remove_cookie
{
    local ($cookie_key) = @_;
    local ($key, $exact_cookie);
    $exact_cookie = (split(/::/, $DATA{$cookie_key}))[1];
    
    foreach $key (keys %DATA) {
        if ($key =~ /\Q$exact_cookie\E/) {
            delete $DATA{$key};
        }
    }
    $no_cookies--;
}

The loop iterates through the array, searches for all keys that contain the cookie identifier, and deletes them. The counter is decremented when a cookie is removed.

Now, let's look at the CGI program that communicates with this server to keep state.

Cookie Client

Let's review what a cookie client is, and what it needs from a server. A client is a CGI program that has to run many times for each user (usually because it displays multiple forms and is invoked each time by each form). The program needs to open a connection to the cookie server, create a cookie, and store information in it. The information stored for one form is retrieved later when the user submits another form.

#!/usr/local/bin/perl
require "sockets.pl";
$webmaster = "Shishir Gundavaram (shishir\@bu\.edu)";
$remote_address = $ENV{'REMOTE_ADDR'};

The remote address of the host that is connected to this HTTP server is stored. This information will be used to create unique cookies.

$cookie_server = "cgi.bu.edu";
$cookie_port = 5000;
$document_root = "/usr/local/bin/httpd_1.4.2/public";
$error = "Cookie Client Error";
&parse_form_data (*FORM);
$start_form = $FORM{'start'};
$next_form = $FORM{'next'};
$cookie = $FORM{'Magic_Cookie'};

Initially, the browser needs to pass a query to this program, indicating the first form:

http://some.machine/cgi-bin/cookie_client.pl?start=/interests.html

All forms after that must contain a next query in the <FORM> tag:

<FORM ACTION="/cgi-bin/cookie_client.pl?next=/location.html" METHOD="POST">

The filename passed in the next query can be different for each form. That is how the forms let the user navigate.

Finally, there must be a hidden field in each form that contains the cookie:

<INPUT TYPE="hidden" NAME="Magic_Cookie" VALUE="-*Cookie*-">

This script will replace the string "-*Cookie*-" with a unique cookie, retrieved from the cookie server. This identifier allows one form to retrieve what another form has stored.

One way to think of this cookie technique is this: The cookie server stores all the data this program wants to save. To retrieve the data, each run of the program just needs to know the cookie. One instance of the program passes this cookie to the next instance by placing it in the form. The form then sends the cookie to the new instance of the program.

if ($start_form) {
    $cookie = &get_new_cookie ();
    &parse_form ($start_form, $cookie);

If the specified form is the first one in the series, the get_new_cookie subroutine is called to retrieve a new cookie identifier. And the parse_form subroutine is responsible for placing the actual cookie in the hidden field.

} elsif ($next_form) {
    &save_current_form ($cookie);
    &parse_form ($next_form, $cookie);

Either $start_form or $next_form will be set, but the browser should not set both. There is only one start to a session! If the form contains the next query, the information within it is stored on the cookie server, which is accomplished by the save_current_form subroutine.

} else {
    if ($cookie) {
        &last_form ($cookie);
    } else {
        &return_error (500, $error,
                "You have executed this script in an invalid manner.");
    }
}
exit (0);

Finally, if the form does not contain any query information, but does contain a cookie identifier, the last_form subroutine is called to display all of the stored information.

That is the end of the main program. It simply lays out a structure. If each form contains the correct start or next query, the program will display everything when the user wants it.
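
The branching logic can be summarized as a small function that maps the form fields to one of four outcomes. This is only a sketch: choose_action and its return labels are assumptions, the field values are samples, and the real program calls the subroutines directly instead of returning a label.

```perl
use strict;
use warnings;

# Decide which branch of the main program applies to a set of form
# fields. The returned labels are illustrative names for the branches.
sub choose_action {
    my (%form) = @_;
    return "start" if $form{'start'};
    return "next"  if $form{'next'};
    return "last"  if $form{'Magic_Cookie'};
    return "error";
}

# Sample requests: first form, middle form, last form, and a bad call.
my @requests = (
    { 'start' => "/interests.html" },
    { 'next' => "/location.html", 'Magic_Cookie' => "13fGK7KIlZSF2" },
    { 'Magic_Cookie' => "13fGK7KIlZSF2", 'city' => "Boston" },
    {},
);
foreach my $request (@requests) {
    my $action = choose_action (%$request);
    print "$action\n";
}
```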

The open_and_check subroutine simply connects to the cookie server and reads the first line that is output by the server, removing the trailing newline character. It then checks this line to make sure that the server is functioning properly.

sub open_and_check
{
    local ($first_line);
    &open_connection (COOKIE, $cookie_server, $cookie_port)
        || &return_error (500, $error, "Could not connect to cookie server.");
    chop ($first_line = <COOKIE>);
    if ($first_line !~ /^200/) {
        &return_error (500, $error, "Cookie server returned an error.");
    }
}

The get_new_cookie subroutine issues the new command to the server and then checks the status to make sure that a unique cookie identifier was output by the server.

sub get_new_cookie
{
    local ($cookie_line, $new_cookie);
    &open_and_check ();
    print COOKIE "new ", $remote_address, "\n";
    chop ($cookie_line = <COOKIE>);
    &close_connection (COOKIE);
    if ( ($new_cookie) = $cookie_line =~ /^200: (\S+)$/) {
        return ($new_cookie);
    } else {
        &return_error (500, $error, "New cookie was not created.");
    }
}

The parse_form subroutine constructs and displays a dynamic form. It reads the entire contents of the form from a file, such as location.html. The only change this subroutine makes is to replace the string "-*Cookie*-" with the unique cookie returned by the cookie server. The form passes the cookie as input data to the program, and the program passes the cookie to the server to set and list data.

sub parse_form
{
    local ($form, $magic_cookie) = @_;
    local ($path_to_form);
    if ($form =~ /\.\./){
        &return_error (500, $error, "What are you trying to do?");
    }
    $path_to_form = join ("/", $document_root, $form);
    open (FILE, "<" . $path_to_form)
        || &return_error (500, $error, "Could not open form.");
    print "Content-type: text/html", "\n\n";
    while (<FILE>) {
        if (/-\*Cookie\*-/) {
            s//$magic_cookie/g;
        }
        print;
    }
    close (FILE);
}
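
The substitution step can be tried in isolation on a single line of HTML. In this sketch the fill_in_cookie helper is hypothetical, and it uses \Q...\E to quote the placeholder instead of escaping each asterisk by hand.

```perl
use strict;
use warnings;

# Replace every occurrence of the placeholder in a chunk of HTML with the
# given cookie. \Q...\E keeps the "*" characters from acting as regex
# operators.
sub fill_in_cookie {
    my ($html, $magic_cookie) = @_;
    my $placeholder = "-*Cookie*-";
    $html =~ s/\Q$placeholder\E/$magic_cookie/g;
    return $html;
}

my $line   = '<INPUT TYPE="hidden" NAME="Magic_Cookie" VALUE="-*Cookie*-">';
my $filled = fill_in_cookie ($line, "13fGK7KIlZSF2");
print "$filled\n";
```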

The save_current_form subroutine stores the form information on the cookie server.

sub save_current_form
{
    local ($magic_cookie) = @_;
    local ($ignore_fields, $cookie_line, $key);
    $ignore_fields = '(start|next|Magic_Cookie)';
    &open_and_check ();
    print COOKIE "cookie $magic_cookie $remote_address", "\n";
    chop ($cookie_line = <COOKIE>);

The cookie command is issued to the server to set the cookie for subsequent add, delete, and list operations.

    if ($cookie_line =~ /^200/) {
        foreach $key (keys %FORM) {
            next if ($key =~ /\b$ignore_fields\b/o);
        
            print COOKIE $key, "=", $FORM{$key}, "\n";
            chop ($cookie_line = <COOKIE>);
            if ($cookie_line !~ /^200/) {
                &return_error (500, $error, "Form info. could not be stored.");
            }
        }
    } else {
        &return_error (500, $error, "The cookie could not be set.");
    }
    &close_connection (COOKIE);
}

The foreach loop iterates through the associative array containing the form information. All fields, with the exception of start, next, and Magic_Cookie, are stored on the cookie server. These fields are used internally by this program, and are not meant to be stored. If the server cannot store the information, it returns an error.
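
The filtering step looks like this when separated from the socket code. The fields_to_store helper is hypothetical, the sample values are made up, and the sketch uses an exact match on the field name, which is slightly stricter than the word-boundary pattern in the listing.

```perl
use strict;
use warnings;

# Return the names of the form fields that should be sent to the cookie
# server, skipping the fields the client uses for its own bookkeeping.
sub fields_to_store {
    my (%form) = @_;
    my @names = grep { !/^(start|next|Magic_Cookie)$/ } sort keys %form;
    return @names;
}

# Sample submission from the first form; the values are made up.
my %FORM = (
    'next'         => "/location.html",
    'Magic_Cookie' => "13fGK7KIlZSF2",
    'subject'      => "Physics",
    'interest'     => "Chess",
);
print join (", ", fields_to_store (%FORM)), "\n";
```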

The last_form subroutine is executed when the last form in the series is being processed. The list command is sent to the server. The display_all_items subroutine reads and displays the server output in response to this command. Finally, the cookie is deleted.

sub last_form
{
    local ($magic_cookie) = @_;
    local ($cookie_line, $key_value, $key, $value);
    &open_and_check ();
    print COOKIE "cookie $magic_cookie $remote_address", "\n";
    chop ($cookie_line = <COOKIE>);
    if ($cookie_line =~ /^200/) {
        print COOKIE "list", "\n";
        &display_all_items ();
        print COOKIE "delete", "\n";
    } else {
        &return_error (500, $error, "The cookie could not be set.");
    }
    &close_connection (COOKIE);
}

The display_all_items subroutine prints a summary of the user's responses.

sub display_all_items
{
    local ($key_value, $key, $value);
    print "Content-type: text/html", "\n\n";
    print "<HTML>", "\n";
    print "<HEAD><TITLE>Summary</TITLE></HEAD>", "\n";
    print "<BODY>", "\n";
    print "<H1>Summary and Results</H1>", "\n";
    print "Here are the items/options that you selected:", "<HR>", "\n";
    while (<COOKIE>) {
        chop;
        last if (/^\.$/);
        $key_value = (split (/\s/, $_, 2))[1];
        ($key, $value) = split (/=/, $key_value, 2);
        
        print "<B>", $key, " = ", $value, "</B>", "<BR>", "\n";
    }

The while loop reads the output from the server, and parses and displays each key-value pair.

    foreach $key (keys %FORM) {
        next if ($key =~ /^Magic_Cookie$/);
        print "<B>", $key, " = ", $FORM{$key}, "</B>", "<BR>", "\n";
    }
    print "</BODY></HTML>", "\n";
}

The key-value pairs from this last form are also displayed, since they are not stored on the server.

Finally, the familiar parse_form_data subroutine concatenates the key-value pairs from both the query string (GET) and from standard input (POST), and stores them in an associative array.

sub parse_form_data
{
    local (*FORM_DATA) = @_;
    local ($query_string, @key_value_pairs, $key_value, $key, $value);
    read (STDIN, $query_string, $ENV{'CONTENT_LENGTH'});
    if ($ENV{'QUERY_STRING'}) {
            $query_string = join("&", $query_string, $ENV{'QUERY_STRING'});
    }
    @key_value_pairs = split (/&/, $query_string);
    foreach $key_value (@key_value_pairs) {
        ($key, $value) = split (/=/, $key_value);
        $key   =~ tr/+/ /;
        $value =~ tr/+/ /;
        $key   =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
        $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
        if (defined($FORM_DATA{$key})) {
            $FORM_DATA{$key} = join ("\0", $FORM_DATA{$key}, $value);
        } else {
            $FORM_DATA{$key} = $value;
        }
    }
}  
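
The decoding steps can be tested separately from the CGI environment. This sketch applies the same two transformations to a made-up query string; url_decode is a hypothetical helper name.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" becomes a space and each %XX
# escape becomes the character with that hexadecimal code.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
    return $text;
}

# Split a sample query string into pairs and decode each side.
my $query = "subject=Computer+Science&interest=Rock+%26+Roll";
foreach my $pair (split (/&/, $query)) {
    my ($key, $value) = split (/=/, $pair, 2);
    printf "%s = %s\n", url_decode ($key), url_decode ($value);
}
```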

