gnuplot is a software application suited for graphing simple numerical information. It has the ability to take raw data and create various types of graphs, including point and line graphs and histograms. Let's take a look at an example that illustrates the ease with which we can produce graphs, especially when compared to PostScript and the gd graphics library.
You can get gnuplot from ftp://prep.ai.mit.edu/pub/gnu/gnuplot-3.5.tar.gz.
The following example plots the number of Web server accesses for every hour as a histogram. The program parses through the server log file, keeping track of the accesses for each hour of the day in an array. The information stored in this array is written to a file in a format that gnuplot can understand. We then call gnuplot to graph the data in the file and output the resulting graphic to a file.
#!/usr/local/bin/perl

$webmaster  = "shishir\@bu\.edu";
$gnuplot    = "/usr/local/bin/gnuplot";
$ppmtogif   = "/usr/local/bin/pbmplus/ppmtogif";
$access_log = "/usr/local/bin/httpd_1.4.2/logs/access_log";
As of version 3.5, gnuplot cannot produce GIF images, but it can write portable bitmap files; with the color option used below, its pbm terminal produces a PPM (portable pixmap) image. We'll use the ppmtogif utility from the PBMPLUS package to convert that image to GIF. The $access_log variable points to the NCSA server log file, which we'll parse.
$process_id = $$;
$output_ppm = join ("", "/tmp/", $process_id, ".ppm");
$datafile   = join ("", "/tmp/", $process_id, ".txt");
These variables hold the names of the temporary files. The $$ variable contains the process ID of the process running this program, as it does in a shell script. I don't care which process is running my program, but I can use the number to create a filename that I know will be unique, because no two processes running at the same time share an ID; this holds even if multiple instances of my program run at once. (Using the process ID this way is a trick that shell programmers have used for decades.) The process ID forms the base of each filename, followed by its extension.
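The same names can be built with plain string interpolation. The sketch below is a modern Perl 5 restatement under that assumption, not the original code, and the helper name temp_names is hypothetical. Keep in mind that a process ID is unique only among processes running at the same moment, so a file left behind by an earlier, crashed run could share the name; Perl's core File::Temp module (its tempfile function) sidesteps that by creating the file safely with a generated name.

```perl
use strict;
use warnings;

# Hypothetical helper: build the two temporary file names from a
# directory and a process ID, e.g. /tmp/1234.ppm and /tmp/1234.txt.
# Interpolating the PID gives the same result as join("", ...).
sub temp_names {
    my ($dir, $pid) = @_;
    return ("$dir/$pid.ppm", "$dir/$pid.txt");
}

# $$ holds the ID of the process running this script.
my ($output_ppm, $datafile) = temp_names("/tmp", $$);
print "$output_ppm\n$datafile\n";
```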
$x     = 0.6;
$y     = 0.6;
$color = 1;
The plot is scaled to 60% of gnuplot's default size in both the x and y directions. The $color value is passed to gnuplot as a line type; line type 1 is drawn in red, so all lines in the graph will be red.
if ( open (FILE, "<" . $access_log) ) {
    for ($loop=0; $loop < 24; $loop++) {
        $time[$loop] = 0;
    }
We open the NCSA server access log for input. The format of each entry in the log is:
host rfc931 authuser [DD/Mon/YY:hh:mm:ss] "request" status_code bytes
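where host is the name or IP address of the client, rfc931 is the remote user name reported by the client's identification (RFC 931) server, authuser is the user name supplied for authentication, the bracketed field is the date and time of the request, request is the request line sent by the browser, status_code is the HTTP status code returned, and bytes is the number of bytes sent. A hyphen appears in any field whose value is unavailable.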
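To make the field layout concrete, here is a sketch that splits one entry into named fields. The function parse_log_line and the sample line are illustrative assumptions, not part of the program. The bracketed date is matched up to its closing bracket, so a trailing time-zone offset, if your server writes one, stays inside the date field.

```perl
use strict;
use warnings;

# Hypothetical helper: split one access-log entry of the form
#   host rfc931 authuser [date] "request" status_code bytes
# into a hash of named fields. Returns undef if the line doesn't match.
sub parse_log_line {
    my ($line) = @_;
    my @fields = $line =~ m{^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\S+) (\S+)};
    return unless @fields;
    my %entry;
    @entry{qw(host rfc931 authuser date request status bytes)} = @fields;
    return \%entry;
}

my $sample = 'www.example.com - - [01/Jan/96:14:03:27] "GET /index.html HTTP/1.0" 200 1024';
my $e = parse_log_line($sample);
print "$e->{host} requested $e->{request} at $e->{date}\n";
```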
A 24-element array called @time is initialized. This array will contain the number of accesses for each hour.
    while (<FILE>) {
        if (m|\[\d+/\w+/\d+:([^:]+)|) {
            $time[$1]++;
        }
    }
    close (FILE);
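In case you didn't believe me when I said in Chapter 6, Hypermedia Documents that Perl offered superb facilities for CGI programming, this tiny loop contains some proof of what I'm talking about. The regular expression (containing some enhancements that only Perl offers) neatly picks the hour out of the date/time string in the access log by searching for the pattern "[DD/Mon/YY:hh:". The vertical bars after the m serve as the pattern delimiters, so the slashes inside need no escaping. The pattern matches a literal left bracket (\[), the day (\d+, one or more digits), a slash, the month (\w+, one or more word characters), another slash, the year (\d+), and a colon. The parenthesized group ([^:]+) then captures every character up to the next colon, which is the hour, and Perl stores it in $1.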
Back to the program. If a line matches the pattern, the array element corresponding to the particular hour is incremented.
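The extraction and counting steps can be tried on their own. This sketch assumes Perl 5.8 or later (for the in-memory filehandle) and wraps the same regular expression in a hypothetical hour_of function, then tallies a small made-up log with a hypothetical count_by_hour function. An hour such as "07" used as an array index counts as 7, just as it does in the original loop.

```perl
use strict;
use warnings;

# Hypothetical helper: return the hour captured by the chapter's
# pattern, or undef when the line has no [DD/Mon/YY:hh: date field.
sub hour_of {
    my ($line) = @_;
    return $line =~ m|\[\d+/\w+/\d+:([^:]+)| ? $1 : undef;
}

# Hypothetical helper: tally accesses per hour from a filehandle.
# (0) x 24 builds the zero-filled array in one step, replacing the
# explicit initialization loop.
sub count_by_hour {
    my ($fh) = @_;
    my @time = (0) x 24;
    while (my $line = <$fh>) {
        my $hour = hour_of($line);
        $time[$hour]++ if defined $hour;
    }
    return @time;
}

# Demonstration on a three-entry log held in memory.
my $log = "";
$log .= qq{h - - [01/Jan/96:${_}:00:00] "GET /" 200 1\n} for qw(09 09 13);
open(my $fh, '<', \$log) or die "cannot open in-memory log: $!";
my @time = count_by_hour($fh);
close($fh);
print "$_ $time[$_]\n" for grep { $time[$_] } 0 .. 23;
```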
&create_output_file();
The subroutine create_output_file is called to create and display the plot.
} else {
    &return_error (500, "Server Log File Error", "Cannot open NCSA server access log!");
}

exit(0);
If the log file can't be opened, the return_error subroutine is called to output an error.
The create_output_file subroutine is now defined. It creates a data file consisting of the information in the @time array.
sub create_output_file {
    local ($loop);

    if ( (open (FILE, ">" . $datafile)) ) {
        for ($loop=0; $loop < 24; $loop++) {
            print FILE $loop, " ", $time[$loop], "\n";
        }
        close (FILE);
        &send_data_to_gnuplot();
    } else {
        &return_error (500, "Server Log File Error", "Cannot write to data file!");
    }
}
The file specified by the variable $datafile is opened for output. The hour and the number of accesses for that hour are written to the file. The hour represents the x coordinate, while the number of accesses represents the y coordinate. The subroutine send_data_to_gnuplot is called to execute gnuplot.
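Here is a sketch of the data file's contents and of writing it with a lexical filehandle. format_data and write_data_file are hypothetical helper names, and the // operator assumes Perl 5.10 or later. Each line holds an hour (x) and its count (y), separated by a space.

```perl
use strict;
use warnings;

# Hypothetical helper: turn hourly counts into the two-column text
# gnuplot reads, one "hour count" pair per line. Missing elements
# are written as 0.
sub format_data {
    my @time = @_;
    return join "", map { sprintf "%d %d\n", $_, $time[$_] // 0 } 0 .. 23;
}

# Hypothetical helper: write that text to a file, returning false on
# failure so the caller can send its own error response.
sub write_data_file {
    my ($path, @time) = @_;
    open(my $out, '>', $path) or return 0;
    print {$out} format_data(@time);
    close($out) or return 0;
    return 1;
}

my @counts = (0) x 24;
@counts[9, 13] = (2, 1);
print format_data(@counts);
```

Keeping the formatting separate from the writing makes it easy to print the data for inspection before gnuplot ever sees it.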
sub send_data_to_gnuplot {
    open (GNUPLOT, "|$gnuplot");
    print GNUPLOT <<gnuplot_Commands_Done;
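We're going to use the same technique we've used throughout the chapter to embed a "language" within a Perl script: we open a pipe to a program and write out commands in the language that program recognizes. The open command starts gnuplot with its standard input connected to the GNUPLOT filehandle, and the print command sends it every line that follows, up to the line containing only gnuplot_Commands_Done. This construct is a here document; Perl interpolates variables such as $output_ppm and $datafile into it just as it would in a double-quoted string. Only the commands travel through the pipe; gnuplot reads the data itself from $datafile when it executes the plot command.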
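The pipe technique works with any program that reads commands on standard input. In this sketch, the hypothetical run_with_input function opens a pipe with three-argument open, writes a block of text, and checks the exit status that close leaves in $?. Here cat stands in for gnuplot so the example runs on any Unix-like system.

```perl
use strict;
use warnings;

# Hypothetical helper: start $command with its standard input connected
# to a pipe, send it $text, and wait for it to finish. close() on a
# piped handle waits for the child and returns false on a nonzero exit.
sub run_with_input {
    my ($command, $text) = @_;
    open(my $pipe, '|-', $command) or die "cannot start $command: $!\n";
    print {$pipe} $text;
    unless (close($pipe)) {
        warn "$command exited with status ", $? >> 8, "\n";
        return 0;
    }
    return 1;
}

# Demonstration: cat copies the "commands" into a scratch file.
my $demo = "/tmp/pipe_demo_$$.txt";
run_with_input("cat > $demo", "set term dumb\n") or die "demo failed\n";
unlink $demo;
```

With gnuplot installed, passing the program's $gnuplot path and the command text instead would draw the plot.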
set term pbm color small
set output "$output_ppm"
set size $x, $y
set title "WWW Server Usage"
set xlabel "Time (Hours)"
set ylabel "No. of Requests"
set xrange [-1:24]
set xtics 0, 2, 23
set noxzeroaxis
set noyzeroaxis
set border
set nogrid
set nokey
plot "$datafile" w boxes $color
gnuplot_Commands_Done

    close (GNUPLOT);
Let's take a closer look at the commands that we send to gnuplot through the pipe. The set term command sets the format for the output file. In this case, the format is a color portable pixmap (PPM) file with a small font for titles. You can even instruct gnuplot to produce text graphs by setting the term to "dumb."
The output file is set to the filename stored in the variable $output_ppm. The size of the image is set with the set size command. The title of the graph and the labels for the x and y axes are specified with the title, xlabel, and ylabel commands, respectively. The range on the x axis is -1 to 24. Even though our data covers hours 0 through 23, the range is widened because each box is centered on its x value; with a range of 0 to 23, the boxes for the first and last hours would be cut in half at the edges of the plot. The tick marks on the x axis start at 0 and are placed every two hours up to 23, giving ticks at 0, 2, 4, and so on through 22. The line representing the y axis is removed by the noyzeroaxis command, which makes the graph appear neater. The noxzeroaxis command does the same for the x axis.
The graph is drawn with a border, but without a grid or a legend. Finally, the plot command graphs the data in the file specified by the $datafile variable with red boxes. Several different types of graphs are possible; instead of boxes, you can try "lines" or "points."
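Because the commands form one interpolated string, they can also be assembled separately and printed for debugging. The sketch below uses a hypothetical gnuplot_script function with hypothetical option names, and assumes Perl 5.10 or later for the // operator; the term option lets you swap in the "dumb" terminal mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: build the same gnuplot command list as
# send_data_to_gnuplot. Variables in the here document are interpolated.
sub gnuplot_script {
    my (%opt) = @_;
    my $term = $opt{term} // 'pbm color small';
    return <<"END_OF_COMMANDS";
set term $term
set output "$opt{output}"
set size $opt{x}, $opt{y}
set title "WWW Server Usage"
set xlabel "Time (Hours)"
set ylabel "No. of Requests"
set xrange [-1:24]
set xtics 0, 2, 23
set noxzeroaxis
set noyzeroaxis
set border
set nogrid
set nokey
plot "$opt{datafile}" w boxes $opt{color}
END_OF_COMMANDS
}

print gnuplot_script(output => "/tmp/123.ppm", datafile => "/tmp/123.txt",
                     x => 0.6, y => 0.6, color => 1);
```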
    &print_gif_file_and_cleanup();
}
The print_gif_file_and_cleanup subroutine sends the image to the browser and removes the temporary files.
sub print_gif_file_and_cleanup {
    $| = 1;
    print "Content-type: image/gif", "\n\n";

    system ("$ppmtogif $output_ppm 2> /dev/null");

    unlink $output_ppm, $datafile;
}
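The system command executes the ppmtogif utility to convert the PPM image to GIF. ppmtogif writes its output to standard output, which it inherits from our script, so the GIF data goes straight to the browser. Setting $| to 1 beforehand turns off output buffering on standard output, which ensures that the Content-type header is written before ppmtogif starts sending image data.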
You might wonder what the 2> signifies. Like most utilities, ppmtogif prints some diagnostic information to standard error when transforming the image. The 2> redirects standard error to the null device (/dev/null), basically throwing it away.
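To see what the redirection keeps and what it discards, this sketch runs a small shell command that writes to both streams. It uses backticks rather than system so the surviving standard output can be inspected; the original program uses system because the GIF data should flow straight to the browser. The name run_quietly is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: run a shell command, capture its standard output,
# and discard its standard error by sending file descriptor 2 to /dev/null.
sub run_quietly {
    my ($command) = @_;
    my $output = `$command 2> /dev/null`;
    return ($output, $? >> 8);
}

my ($out, $status) = run_quietly(q{sh -c 'echo converted; echo diagnostic >&2'});
print "stdout: $out";
print "exit status: $status\n";
```

Only "converted" reaches the captured output; the diagnostic line written to standard error disappears.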
Finally, we use the unlink command to remove the temporary files that we've created.
The image produced by this program is shown in Figure 6.5.