We look at three modules in this section: FreezeThaw, Data::Dumper, and Storable. All of them serialize Perl data structures to ASCII or binary strings; only Storable actually writes them to disk. The other two modules are important because they can be used in conjunction with other persistence mechanisms such as databases and DBM files. All three correctly account for blessed object references and self-referential data structures, but trip up (justifiably so) when it comes to typeglobs, tied variables, or scalars containing pointers to C data types. It is also impossible for these (or any) modules to understand implicit relationships. For example, if you use the ObjectTemplate approach described in Section 8.1, "Efficient Attribute Storage", the "object" is basically an array index, so the disk gets to see only a bunch of meaningless array indices without the data they refer to. Another subtle error occurs when you use references as hash keys: Perl converts them to strings (such as SCALAR(0xe3f434)). Such a string is not a real reference, so if you store the hash table to a file and recreate it, the implicit reference to the original structure is no longer valid.
Moral of the story: simple nests of Perl structures are handled easily; in all other cases, it is your responsibility to translate your application data into a structure containing ordinary Perl elements before sending it to disk.
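The hash-key problem is easy to see in a few lines. The following is a minimal sketch using only core Perl; the variable names and the workaround of keeping the real reference in the hash value are illustrative assumptions, not something the modules do for you.

```perl
use strict;
use warnings;

my $point = [10, 20];                 # the structure we want to use as a key
my %label = ($point => 'origin');     # Perl stringifies $point at this point

my ($key) = keys %label;              # the key is now a plain string
my $is_ref = ref($key) ? 1 : 0;       # 0: the reference is gone

# One possible workaround (an assumption about what suits your data):
# keep the real reference in the value, keyed by its string form.
my %by_addr = ("$point" => { ref => $point, label => 'origin' });
my $recovered = $by_addr{$key}{ref};  # a real reference again

print "key looks like: $key\n";
print "key is a reference? ", ($is_ref ? "yes" : "no"), "\n";
print "recovered element: $recovered->[1]\n";
```

After a store and reload, the string keys no longer match any live address, but the values still carry the real structure, so nothing is lost.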
FreezeThaw, written by Ilya Zakharevich, is a pure Perl module (no C extensions) and encodes complex data structures into printable ASCII strings. It does not deal directly with files and leaves it to you to send the encoded string to a normal file, a DBM file, or a database. Here's an example of the module's use:
    use FreezeThaw qw(freeze thaw);   # Import freeze() and thaw()
    # Create a complex data structure: a hash of arrays
    $c = { 'even' => [2, 4, 6, 8],
           'odd'  => [1, 3, 5, 7]};
    # Create sample object
    $obj = bless {'foo' => 'bar'}, 'Example';
    $msg = freeze($c, $obj);
    open (F, "> test") || die "Cannot open test: $!";
    syswrite (F, $msg, length($msg)); # can also use print()
    close (F);
The freeze() function takes a list of scalars to be encoded and returns one string. Arrays and hashes must be passed by reference. The thaw() function takes an encoded string and returns the same list of scalars:
($c, $obj) = thaw ($msg);
We will use FreezeThaw in Section 13.1, "Msg: Messaging Toolkit", to send data structures across a socket connection. Because the encoding is ASCII, we don't need to worry about machine-specific details such as byte order, or the length of integers and floating point numbers.
Data::Dumper, written by Gurusamy Sarathy, is similar in spirit to FreezeThaw, but takes a very different approach. It converts the list of scalars passed to its Dumper function into pretty-printed Perl code, which can be stored into a file and subsequently evaled. Consider:
    use Data::Dumper;
    # Create a complex data structure: a hash of arrays
    $c = { 'even' => [2, 4,],
           'odd'  => [1, 3,]};
    # Create sample object
    $obj = bless {'foo' => 'bar'}, 'Example';
    $msg = Dumper($c, $obj);
    print $msg;
This prints
    $VAR1 = {
              even => [
                        2,
                        4
                      ],
              odd => [
                       1,
                       3
                     ]
            };
    $VAR2 = bless( {
                     foo => 'bar'
                   }, 'Example' );
Data::Dumper assigns an arbitrary variable name to each scalar, which is not really useful if you want to eval it subsequently and recreate your original data. The module allows you to assign your own variable names by using the Dump method:
    $a = 100;
    @b = (2,3);
    print Data::Dumper->Dump([$a, \@b], ["foo", "*bar"]);
This prints
    $foo = 100;
    @bar = (
             2,
             3
           );
Dump takes two parameters: a reference to a list of scalars to be dumped and a reference to a list of corresponding names. If a "*" precedes a name, Dump outputs the appropriate type of the variable. That is, instead of assigning to $bar a reference to an anonymous array, it assigns a real list to @bar. You can substitute Dumpxs for Dump and take advantage of a C extension that implements the same functionality and gives you a speed increase of four to five times.
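To close the loop, here is a minimal sketch of recreating data from the generated code. The names restored_c and restored_b are made up for illustration. Because eval runs whatever the string contains, this is only reasonable for dumps that your own program produced.

```perl
use strict;
use warnings;
use Data::Dumper;

my $c = { even => [2, 4], odd => [1, 3] };
my @b = (2, 3);

# Name the dumped values "restored_c" and "restored_b"; the leading "*"
# asks Dump to emit an array assignment rather than a reference.
my $code = Data::Dumper->Dump([$c, \@b], ['restored_c', '*restored_b']);

# Declare the target variables, then evaluate the generated Perl.
# The string eval sees these lexicals, so this works under "use strict".
my ($restored_c, @restored_b);
eval $code;
die "Could not eval dumped data: $@" if $@;

print "odd numbers: @{ $restored_c->{odd} }\n";
print "array: @restored_b\n";
```

In practice the string in $code would be written to a file in one run and read back in another; the eval step stays the same.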
Data::Dumper gives you an opportunity to specify custom subroutines to serialize and deserialize data, which allows you to smooth the troublesome spots mentioned earlier. Please refer to the documentation for details.
Storable is a C extension module for serializing data directly to files and is the fastest of the three approaches. The store function takes a reference to a data structure (the root) and the name of a file. The retrieve function does the converse: given a filename, it returns the root:
    use Storable;
    $a = [100, 200, {'foo' => 'bar'}];
    # store() dies on serious errors and returns undef on I/O errors
    eval {
        store($a, 'test.dat') or die "store failed\n";
    };
    print "Error writing to file: $@" if $@;
    $a = retrieve('test.dat');
If you have more than one structure to stuff into a file, simply put all of them in an anonymous array and pass this array's reference to store.
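A minimal sketch of that idiom follows. The %config and @queue structures are invented for the example, and File::Temp is used only to get a scratch filename.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Two unrelated structures we want to keep in one file (sample data).
my %config = (host => 'localhost', port => 8080);
my @queue  = ('job1', 'job2');

# Scratch file for the example; a real program would use its own path.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

# Wrap both structures in one anonymous array and store its reference.
store([\%config, \@queue], $file) or die "store failed: $!";

# retrieve() returns the root; unpack it in the same order.
my ($config, $queue) = @{ retrieve($file) };

print "port: $config->{port}, first job: $queue->[0]\n";
```

The order of elements in the anonymous array is the only "schema", so the code that reads the file has to agree with the code that wrote it.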
You can pass an open filehandle to store_fd instead of giving a filename to store. The functions nstore and nstore_fd can be used for storing the data in "network" order, making the data machine-independent. When you use retrieve or fd_retrieve (spelled retrieve_fd in older releases of the module), the data is automatically converted back to the native machine format; while storing, the module records a flag indicating whether the data is in a machine-independent format.
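The filehandle variants can be sketched as follows. This assumes a reasonably current Storable, where the reading function is exported as fd_retrieve; the scratch file from File::Temp is only there to give the example a real filehandle.

```perl
use strict;
use warnings;
use Storable qw(nstore_fd fd_retrieve);
use File::Temp qw(tempfile);

my $root = [100, 200, { foo => 'bar' }];

# Hypothetical scratch file; any real filehandle would do.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;

# Write the structure in network order through the open handle.
nstore_fd($root, $out) or die "nstore_fd failed: $!";
close $out or die "close failed: $!";

# Read it back through another handle; the byte order is detected
# from the flag stored with the data.
open my $in, '<', $file or die "Cannot reopen $file: $!";
binmode $in;
my $copy = fd_retrieve($in);
close $in;

print "foo => $copy->[2]{foo}\n";
```

Nothing on the reading side says "network order": the same fd_retrieve call handles data written by store_fd or nstore_fd.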