16.7 Tips on Generating Random NumbersRandom numbers play an important role in modern computer security. Many programs that use encryption need a good source of random numbers for producing session keys. For example, the PGP program uses random numbers for generating a random key that is used to encrypt the contents of electronic mail messages; the random key is then itself encrypted using the recipient's public key. Random numbers have other uses in computer security as well. A variety of authentication protocols require that the computer create a random number, encrypt it, and send it to the user. The user must then decrypt the number, perform a mathematical operation on it, re-encrypt the number, and send it back to the computer. A great deal is known about random numbers. Here are some general rules of thumb:
For security-related purposes, a further requirement for random numbers is unpredictability:
One of the best ways of generating a stream of random numbers is to make use of a random process, such as radioactive decay. Unfortunately, most Unix computers are not equipped with Geiger counters. Thus, they need to use something else. Often, they use pseudorandom functions as random number generators. A pseudorandom function is a function that yields a series of outputs that appears to be unpredictable. In practice, these functions maintain a large internal state from which the output is calculated. Each time a new number is generated, the internal state is changed. The function's initial state is referred to as its seed . If you need a series of random numbers that is repeatable, you need a pseudorandom generator that takes a seed and keeps an internal state. If you need a nonreproducible series of random numbers, you should avoid pseudorandom generators. Thus, successfully picking random numbers in the Unix environment depends on two things: picking the right random number generator, and then picking a different seed each time the program is run. 16.7.1 Unix Pseudorandom FunctionsThe standard Unix C library provides two random number generators: rand( ) and random( ). A third random number generator, drand48( ), is available on some versions of Unix. Although you won't want to use any of these routines to produce cryptographic random numbers, we'll briefly explain each. Then, if you need to use one of them for something else, you'll know something about its strengths and shortcomings. 16.7.1.1 rand( )The original Unix random number generator, rand( ), is not a very good random number generator. It uses a 32-bit seed and maintains a 32bit internal state. The output of the function is also 32 bits in length, making it a simple matter to determine the function's internal state by examining the output. As a result, rand( ) is not very random. Furthermore, the low-order bits of some implementations are not random at all, but flip back and forth between 0 and 1 according to a regular pattern. The rand( ) random number generator is seeded with the function srand( ). On some versions of Unix, a third function is provided, rand_r( ), for multithreaded applications. (The function rand( ) itself is not safe for multithreading, as it maintains internal state.) Do not use rand( ), even for "trivial" programs. 16.7.1.2 random( )The random( ) function is a more sophisticated random number generator that uses nonlinear feedback and an internal table that is 124 bytes (992 bits) long. The function returns random values that are 32 bits in length. All of the bits generated by random( ) are usable. The random( ) function is adequate for simulations and games, but should not be used for security-related applications such as picking cryptographic keys or simulating one-time pads. 16.7.1.3 drand48( ), lrand48( ), and mrand48( )The drand48( ) function is one of many functions that make up the System V random number generator. According to the Solaris documentation, the algorithm uses "the well-known linear congruential algorithm and 48-bit integer arithmetic." The function drand48( ) returns a double-precision number that is greater than or equal to 0.0 and less than 1.0, while the lrand48( ) and mrand48( ) functions return random numbers within a specified integer range. As with random( ), these functions provide excellent random numbers for simulations and games, but should not be used for security-related applications such as picking cryptographic keys or simulating one-time pads; linear congruential algorithms are too easy to break. 16.7.2 Picking a Random SeedUsing a good random number generator is easy. Picking a random seed, on the other hand, can be quite difficult. Conceptually, picking a random number should be easy: pick something that is always different.[21] But in practice, picking a random number—especially one that will be used as the basis of a cryptographic key—is quite difficult. The practice is difficult because many things that change all the time actually change in predictable ways.
A stunning example of a poorly chosen seed for a random number generator was revealed on the front page of the New York Times[22] in September 1995. The problem was in Netscape Navigator, a popular program for browsing the World Wide Web. Instead of using truly random information for seeding the random number generator, Netscape's programmers used a combination of the current time of day, the process ID (PID) of the running Netscape program, and the parent process ID (PPID). Researchers at the University of California at Berkeley discovered that they could, through a process of trial and error, discover the numbers that any copy of Netscape was using and crack the encrypted messages with relative ease.
Another example of a badly chosen seed generation routine was used in Kerberos Version 4. This routine was based on the time of day XORed with other information. The XOR effectively masked out the other information and resulted in a seed of only 20 bits of unpredictable value. This reduced the key space from more than 72 quadrillion possible keys to slightly more than 1 million, thus allowing keys to be guessed in a matter of seconds. When this weakness was discovered at Purdue's COAST Laboratory,[23] conversations with personnel at MIT revealed that they had known for years that this problem existed, but the patch had somehow never been released.
In the book Network Security, Private Communication in a Public World, Kaufman et al. identify three typical mistakes when picking random-number seeds:
How do you pick a good random send? Here are some ideas:
RFC 1705, by Donald Eastlake, Steve Crocker, and Jeffrey Schiller, makes many observations about picking seeds for random number generators. Among them are the following:
16.7.3 A Good Random Seed GeneratorAs we've mentioned, one way of generating a random seed is to use a source message digest algorithm such as MD5 or SHA-1. As input, give it as much data as you can based on temporary state. This data might include the output of ps -efl, the environment variables for the current process, its PID and PPID, the current time and date, the output of the random number generator given your seed, the seed itself, the state of network connections, and perhaps a directory listing of the current directory. The output of the function will be a string of bits that an attacker cannot likely duplicate, but that is likely to meet all the other conditions of randomness you might desire. The Perl program in Example 16-1 is an example of such a program. It uses several aspects of system state, network status, virtual memory statistics, and process state as input to MD5. These numbers change very quickly on most computers and cannot be anticipated, even by programs running as superuser on the same computer. The entropy (randomness) of these values is spread throughout the result by the hashing function of MD5, resulting in an output that should be sufficiently random for most uses. Example 16-1. Generating a random seed string#!/usr/bin/perl -w # # randbits -- Gene Spafford <spaf@purdue.edu> # Generate a random seed string based on state of system. # # Inspired by a program from Bennett Todd, derived # from original by Larry Wall # # Uses state of various kernel structures as random "seed" # Mashes them together and uses MD5 to spread around # # Usage: randbits [-n] [-h | -H ] [keylen] # In which # -n means to emit no trailing linefeed # -h means to give output in hex (default) # -H means hex output, but use uppercase letters # -s means to give output in base 64 # keylen is the number of bytes to the random key (default is 8) # If you run this on a different kind of system, you should adjust the # setting in the "noise" string to system-specific strings. Do it as another # case in the "if...else" and email me the modification so I can keep a # merged copy. (Hint: check in your manual for any programs with "stat" in # the name or description.) # # You will need to install the Digest::MD5 module from CPAN if it is not already present. use Digest::MD5; use Getopt::Std; # Augment the path to something that should contain all needed commands. $ENV{'PATH'} .= "/bin:/usr/bin:/usr/etc:/usr/ucb:/etc:"; # We start with the observation that most machines have either a BSD-ish # core command set, or a System V-ish command set. We'll build from those. $BSD = "ps -agxlww ; netstat -s ; vmstat -s ;"; $SYSV = "ps -eflj ; netstat -s ; nfsstat -nr ;"; if ( -e "/sdmach" ) { $_ = "NeXT"; } elsif ( -x "/usr/bin/uname" || -x "/bin/uname") { $_ = `uname -sr`; } elsif ( -x "/etc/version" ) { $_ = `/etc/version`; } else { die "How do I tell what OS this is?"; } /^AIX / && ( $noise = $BSD . 'pstat -fs')|| /^CYGWIN/ && ( $noise = "ps -alW ; netstat -a ; netstat -s")|| /^Darwin/ && ( $noise = "ps -agxlww ; netstat -s ; pstat -fsvt")|| /^FreeBSD/ && ( $noise = $BSD . 'vmstat -i')|| /^HP-UX 7/ && ( $noise = $SYSV)|| /^HP-UX A.09/ && ( $noise = $SYSV . "vmstat -s")|| /^IRIX(64)? [56]/ && ( $noise = $SYSV)|| /^Linux/ && ( $noise = "ps -agxlww ; netstat -i ; netstat -s; vmstat")|| /^NeXT/ && ( $noise = 'ps agxlww; netstat -s; vm_stat')|| /^OSF1/ && ( $noise = $SYSV . 'vmstat -i')|| /^SunOS 4/ && ( $noise = $BSD . 'pstat -afipSsT;vmstat -i')|| /^SunOS 5/ && ( $noise = $SYSV . 'vmstat -i;vmstat -s; nfsstat')|| /^ULTRIX 4/ && ( $noise = $BSD . 'vmstat -s')|| die "No 'noise' commands defined for this OS. Edit and retry!"; #### End of things you may need to modify ($prog = $0) =~ s|.*/||; $usage = "usage: $prog [-n] [-h | -H | -s] [keylength]\n"; getopt('nhHs', \%opts) || die $usage; defined($keylen = shift) || ($keylen = 8); die $usage if ($keylen =~ /\D/); die $usage if (($opts{s} and $opts{h} || $opts{H}) or ($opts{h} && $opts{H})); die "Maximum keylength is 16 bytes (32 hex digits)\n" if ($keylen > 16); # Run the noise command and include whatever other state we # can conveniently (portably) find. $hash = Digest::MD5->new; $noise .= ";ls -lai . /tmp"; -d "/dev" and $noise .= " /dev"; open(NOISE, "$noise |") || die("Couldn't run noise commands: $!"); $hash->add(<NOISE>); close(NOISE); $hash->add(times(), $$, getppid(), time, join('+', %ENV)); # Format the output and finish. $buf = $opts{s} ? $hash->b64digest : $hash->hexdigest; ($buf =~ y/a-f/A-F/) if $opts{H}; print substr($buf, 0, 2*$keylen); print "\n" unless $opts{n}; Note that the techniques used in this script are similar to the approaches used by some Unix systems to implement /dev/random; they are also similar to the techniques used by EGD. As these functions are not present on all systems, we have decided to include this script here. (It is also educational to see how such a script is written.) This script is also an excellent method for generating Xauthority keys (see Section 12.3.22.2) if you need them. Simply execute it with an argument of 14 (you need 28 hex characters of key) and use the result as your key. |