DBM files are a storage management layer that allows programmers to store information in files as pairs of strings, a key, and a value. DBM files are binary files and the key and value strings can also hold binary data.
There are several forms of DBM files, each with its own strengths and weaknesses. Perl supports the ndbm , db , gdbm , sdbm , and odbm managers via the NDBM_File , DB_File , GDBM_File , SDBM_File , and ODBM_File extensions. There's also an AnyDBM_File module that will simply use the best available DBM. The documentation for the AnyDBM_File module includes a useful table comparing the different DBMs.
These extensions all associate a DBM file on disk with a Perl hash variable (or associative array ) in memory.[13] The simple look like a hash programming interface lets programmers store data in operating system files without having to consider how it's done. It just works.
[13]DBM files are implemented by library code that's linked into the Perl extensions. There's no separate server process involved.
Programmers store and fetch values into and out of the hash, and the underlying DBM storage management layer will look after getting them on and off the disk.
In this section, we shall discuss the most popular and sophisticated of these storage managers, the Berkeley Database Manager, also known as the Berkeley DB. This software is accessed from Perl via the DB_File and Berkeley DB extensions. On Windows systems, it can be installed via the Perl package manager, ppm. On Unix systems, it is built by default when Perl is built only if the Berkeley DB library has already been installed on your system. That's generally the case on Linux, but on most other systems you may need to fetch and build the Berkeley DB library first.[14]
[14]Version 1 of Berkeley DB is available from http://www.perl.com/CPAN/src/misc/db.1.86.tar.gz. The much improved Version 2 (e.g., db.2.14.tar.gz) is also available, but isn't needed for our examples and is only supported by recent Perl versions. Version 3 is due out soon. See www.sleepycat.com.
In addition to the standard DBM file features, Berkeley DB and the DB_File module also provide support for several different storage and retrieval algorithms that can be used in subtly different situations. In newer versions of the software, concurrent access to databases and locking are also supported.
Prior to manipulating data within a Berkeley database, either a new database must be created or an existing database must be opened for reading. This can be done by using one of the following function calls:
tie %hash, 'DB_File', $filename, $flags, $mode, $DB_HASH; tie %hash, 'DB_File', $filename, $flags, $mode, $DB_BTREE; tie @array, 'DB_File', $filename, $flags, $mode, $DB_RECNO;
The final parameter of this call is the interesting one, as it dictates the way in which the Berkeley DB will store the data in the database file. The behavior of these parameters is as follows:
DB_HASH is the default behavior for Berkeley DB databases. It stores the data according to a hash value computed from the string specified as the key itself. Hashtables are generally extremely fast, in that by simply applying the hash function to any given key value, the data associated with that key can be located in a single operation. This is much faster than sequential scanning. However, hashtables provide no useful ordering of the data by default, and hashtable performance can begin to degrade when several keys have identical hash key values. This results in several items of data being attached to the same hash key value, which results in slower access times.
With the DB_BTREE format, Berkeley DB files are stored in the form of a balanced binary tree. The B-tree storage technique will sort the keys that you insert into the Berkeley DB, the default being to sort them in lexical order. If you desire, you can override this behavior with your own sorting algorithms.
The DB_RECNO format allows you to store key/value pairs in both fixed-length and variable-length textual flat files. The key values in this case consist of a line number, i.e., the number of the record within the database.
When initializing a new or existing Berkeley DB database for use with Perl, use the tie mechanism defined within Perl to associate the actual Berkeley DB with either a hash or a standard scalar array. By doing this, we can simply manipulate the Perl variables, which will automatically perform the appropriate operations on the Berkeley DB files instead of us having to manually program the Berkeley DB API ourselves.
For example, to create a simple Berkeley DB, we could use the following Perl script:
#!/usr/bin/perl -w # # ch02/DBM/createdb: Creates a Berkeley DB use strict; use DB_File; my %database; tie %database, 'DB_File', "createdb.dat" or die "Can't initialize database: $!\n"; untie %database; exit;
If you now look in the directory in which you ran this script, you should hopefully find a new file called createdb.dat. This is the disk image of your Berkeley database, i.e., your data stored in the format implemented by the Berkeley DB storage manager. These files are commonly referred to as DBM files.
In the example above, we simply specified the name of the file in which the database is to be stored and then ignored the other arguments. This is a perfectly acceptable thing to do if the defaults are satisfactory. The additional arguments default to the values listed in Table 2-1.
Argument |
Default Value |
---|---|
$filename |
undef [15] |
$flags |
O_CREAT | O_RDWR |
$mode |
0666 |
$storage_type |
$DB_HASH |
[15]If the filename argument is specified as undef, the database will be created in-memory only. It still behaves as if written to file, although once the program exits, the database will no longer exist.
The $flags argument takes the values that are associated with the standard Perl sysopen() function, and the $mode argument takes the form of the octal value of the file permissions that you wish the DBM file to be created with. In the case of the default value, 0666, the corresponding Unix permissions will be:
-rw-rw-rw-
That is, the file is user, group, and world readable and writeable.[16] You may wish to specify more strict permissions on your DBM files to be sure that unauthorized users won't tamper with them.
[16] We are ignoring any modifications to the permissions that umask may make.
Other platforms such as Win32 differ, and do not necessarily use a permission system. On these platforms, the permission mode is simply ignored.
Given that creating a new database is a fairly major operation, it might be worthwhile to implement an exclusive locking mechanism that protects the database files while the database is initially created and loaded. As with flat-file databases, the Perl flock() call should be used to perform file-level locking, but there are some differences between locking standard files and DBM files.
The issues of safe access to databases that plagued flat-file databases still apply to Berkeley databases. Therefore, it is a good idea to implement a locking strategy that allows safe multi-user access to the databases, if this is required by your applications.
The way in which flock() is used regarding DBM files is slightly different than that of locking standard Perl filehandles, as there is no direct reference to the underlying filehandle when we create a DBM file within a Perl script.
Fortunately, the DB_File module defines a method that can be used to locate the underlying file descriptor for a DBM file, allowing us to use flock() on it. This can be achieved by invoking the fd() method on the object reference returned from the database initialization by tie(). For example:
### Create the new database ... $db = tie %database, 'DB_File', "megaliths.dat" or die "Can't initialize database: $!\n"; ### Acquire the file descriptor for the DBM file my $fd = $db->fd(); ### Do a careful open() of that descriptor to get a Perl filehandle open DATAFILE, "+<&=$fd" or die "Can't safely open file: $!\n"; ### And lock it before we start loading data ... print "Acquiring an exclusive lock..."; flock( DATAFILE, LOCK_EX ) or die "Unable to acquire exclusive lock: $!. Aborting"; print "Acquired lock. Ready to update database!\n\n";
This code looks a bit gruesome, especially with the additional call to open(). It is written in such a way that the original file descriptor being currently used by the DBM file when the database was created is not invalidated. What actually occurs is that the file descriptor is associated with the Perl filehandle in a nondestructive way. This then allows us to flock() the filehandle as per usual.
However,after having written this description and all the examples using this standard documented way to lock Berkeley DBM files, it has been discovered that there is a small risk of data corruption during concurrent access. To make a long story short, the DBM code reads some of the file when it first opens it, before you get a chance to lock it. That's the problem.
There is a quick fix if your system supports the O_EXLOCK flag, as FreeBSD does and probably most Linux versions do. Just add the O_EXLOCK flag to the tie :
use Fcntl; # import O_EXLOCK, if available $db = tie %database, 'DB_File', "megaliths.dat", O_EXLOCK;
For more information, and a more general workaround, see:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-09/msg00954.html
and the thread of messages that follows it.
Inserting data into a Berkeley DB using the Perl DB_File module is extremely simple as a result of using a tied hash or tied array . The association of a DBM file and a Perl data structure is created when the database is opened. This allows us to manipulate the contents of the database simply by altering the contents of the Perl data structures.
This system makes it very easy to store data within a DBM file and also abstracts the actual file-related operations for data manipulation away from our scripts. Thus, the Berkeley DB is a higher-level storage manager than the simple flat-file databases discussed earlier in this chapter.
The following script demonstrates the insertion and retrieval of data from a DBM file using a tied hash. This hash has the Perl characteristic of being a key/value pair. That is, values are stored within the hash table against a unique key. This affords extremely fast retrieval and an element of indexed data access as opposed to sequential access. For example:
#!/usr/bin/perl -w # # ch02/DBM/simpleinsert: Creates a Berkeley DB, inserts some test data # and dumps it out again use DB_File; use Fcntl ':flock'; ### Initialize the Berkeley DB my %database; my $db = tie %database, 'DB_File', "simpleinsert.dat", O_CREAT | O_RDWR, 0666 or die "Can't initialize database: $!\n"; my $fd = $db->fd(); open DATAFILE, "+<&=$fd" or die "Can't safely open file: $!\n"; print "Acquiring exclusive lock..."; flock( DATAFILE, LOCK_EX ) or die "Unable to acquire lock: $!. Aborting"; print "Acquired lock. Ready to update database!\n\n"; ### Insert some data rows $database{'Callanish I'} = "This site, commonly known as the "Stonehenge of the North" is in the form of a buckled Celtic cross."; $database{'Avebury'} = "Avebury is a vast, sprawling site that features, amongst other marvels, the largest stone circle in Britain. The henge itself is so large, it almost completely surrounds the village of Avebury."; $database{'Lundin Links'} = "Lundin Links is a megalithic curiosity, featuring 3 gnarled and immensely tall monoliths arranged possibly in a 4-poster design. Each monolith is over 5m tall."; ### Untie the database undef $db; untie %database; ### Close the file descriptor to release the lock close DATAFILE; ### Retie the database to ensure we're reading the stored data $db = tie %database, 'DB_File', "simpleinsert.dat", O_RDWR, 0444 or die "Can't initialize database: $!\n"; ### Only need to lock in shared mode this time because we're not updating ... $fd = $db->fd(); open DATAFILE, "+<&=$fd" or die "Can't safely open file: $!\n"; print "Acquiring shared lock..."; flock( DATAFILE, LOCK_SH ) or die "Unable to acquire lock: $!. Aborting"; print "Acquired lock. Ready to read database!\n\n"; ### Dump the database foreach my $key ( keys %database ) { print "$key\n", ( "=" x ( length( $key ) + 1 ) ), "\n\n"; print "$database{$key}\n\n"; } ### Close the Berkeley DB undef $db; untie %database; ### Close the file descriptor to release the lock close DATAFILE; exit;
When run, this script will generate the following output, indicating that it is indeed retrieving values from a database:
Acquiring exclusive lock...Acquired lock. Ready to update database! Acquiring shared lock...Acquired lock. Ready to read database! Callanish I ============ This site, commonly known as the "Stonehenge of the North" is in the form of a buckled Celtic cross. Avebury ======== Avebury is a vast, sprawling site that features, amongst other marvels, the largest stone circle in Britain. The henge itself is so large, it almost completely surrounds the village of Avebury. Lundin Links ============= Lundin Links is a megalithic curiosity, featuring 3 gnarled and immensely tall monoliths arranged possibly in a 4-poster design. Each monolith is over 5m tall.
You may have noticed that we cheated a little bit in the previous example. We stored only the descriptions of the sites instead of all the information such as the map reference and location. This is the inherent problem with key/value pair databases: you can store only a single value against a given key. You can circumvent this by simply concatenating values into a string and storing that string instead, just like we did using join(), pack(), Data::Dumper, and Storable earlier in this chapter.
This particular form of storage jiggery-pokery can be accomplished in at least two ways.[17] One is to hand-concatenate the data into a string and hand-split it when required. The other is slightly more sophisticated and uses a Perl object encapsulating a megalith to handle, and hide, the packing and unpacking.
[17]As with all Perl things, There's More Than One Way To Do It (a phrase so common with Perl you'll often see it written as TMTOWTDI). We're outlining these ideas here because they dawned on us first. You might come up with something far more outlandish and obscure, or painfully obvious. Such is Perl.
The first technique -- application handling of string joins and splits -- is certainly the most self-contained. This leads us into a small digression.
Self-containment can be beneficial, as it tends to concentrate the logic of a script internally, making things slightly more simple to understand. Unfortunately, this localization can also be a real pain. Take our megalithic database as a good example. In the previous section, we wrote four different Perl scripts to handle the four main data manipulation operations. With localized logic, you're essentially implementing the same storing and extraction code in four different places.
Furthermore, if you decide to change the format of the data, you need to keep four different scripts in sync. Given that it's also likely that you'll add more scripts to perform more specific functions (such as generating web pages) with the appropriate megalithic data from the database, that gives your database more points of potential failure and inevitable corruption.
Getting back to the point, we can fairly simply store complex data in a DBM file by using either join( ) , to create a delimited string, or pack( ) , to make a fixed-length record. join( ) can be used in the following way to produce the desired effect:
### Insert some data rows $database{'Callanish I'} = join( ':', 'Callanish I', 'Callanish, Western Isles', 'NB 213 330', 'Stone Circle', 'Description of Callanish I' ); $database{'Avebury'} = join( ':', 'Avebury', 'Wiltshire', 'SU 103 700', 'Stone Circle and Henge', 'Description of Avebury' ); $database{'Lundin Links'} = join( ':', 'Lundin Links', 'Fife', 'NO 404 027', 'Standing Stones', 'Description of Lundin Links' ); ### Dump the database foreach my $key ( keys %database ) { my ( $name, $location, $mapref, $type, $description ) = split( /:/, $database{$key} ); print "$name\n", ( "=" x length( $name ) ), "\n\n"; print "Location: $location\n"; print "Map Reference: $mapref\n"; print "Description: $description\n\n"; }
The storage of fixed-length records is equally straightforward, but does gobble up space within the database rather quickly. Furthermore, the main rationale for using fixed-length records is often access speed, but when stored within a DBM file, in-place queries and updates simply do not provide any major speed increase.
The code to insert and dump megalithic data using fixed-length records is shown in the following code segment:
### The pack and unpack template. $PACKFORMAT = 'A64 A64 A16 A32 A256'; ### Insert some data rows $database{'Callanish I'} = pack( $PACKFORMAT, 'Callanish I', 'Callanish, Western Isles', 'NB 213 330', 'Stone Circle', 'Description of Callanish I' ); $database{'Avebury'} = pack( $PACKFORMAT, 'Avebury', 'Wiltshire', 'SU 103 700', 'Stone Circle and Henge', 'Description of Avebury' ); $database{'Lundin Links'} = pack( $PACKFORMAT, 'Lundin Links', 'Fife', 'NO 404 027', 'Standing Stones', 'Description of Lundin Links' ); ### Dump the database foreach my $key ( keys %database ) { my ( $name, $location, $mapref, $type, $description ) = unpack( $PACKFORMAT, $database{$key} ); print "$name\n", ( "=" x length( $name ) ), "\n\n"; print "Location: $location\n"; print "Map Reference: $mapref\n"; print "Description: $description\n\n"; }
The actual code to express the storage and retrieval mechanism isn't really much more horrible than the delimited record version, but it does introduce a lot of gibberish in the form of the pack( ) template, which could easily be miskeyed or forgotten about. This also doesn't really solve the problem of localized program logic, and turns maintenance into the aforementioned nightmare.
One solution to both the localized code problem and the problem of storing multiple data values within a single hash key/value pair is to use a Perl object to encapsulate and hide some of the nasty bits.[18]
[18]This is where people tend to get a little confused about Perl. The use of objects, accessor methods, and data hiding are all very object-oriented. By this design, we get to mix the convenience of non-OO programming with the neat bits of OO programming. Traditional OO programmers have been known to make spluttering noises when Perl programmers discuss this sort of thing in public.
The following Perl code defines an object of class Megalith. We can then reuse this packaged object module in all of our programs without having to rewrite any of them, if we change the way the module works:
#!/usr/bin/perl -w # # ch02/DBM/Megalith.pm: A perl class encapsulating a megalith package Megalith; use strict; use Carp; ### Creates a new megalith object and initializes the member fields. sub new { my $class = shift; my ( $name, $location, $mapref, $type, $description ) = @_; my $self = {}; bless $self => $class; ### If we only have one argument, assume we have a string ### containing all the field values in $name and unpack it if ( @_ == 1 ) { $self->unpack( $name ); } else { $self->{name} = $name; $self->{location} = $location; $self->{mapref} = $mapref; $self->{type} = $type; $self->{description} = $description; } return $self; } ### Packs the current field values into a colon delimited record ### and returns it sub pack { my ( $self ) = @_; my $record = join( ':', $self->{name}, $self->{location}, $self->{mapref}, $self->{type}, $self->{description} ); ### Simple check that fields don't contain any colons croak "Record field contains ':' delimiter character" if $record =~ tr/:/:/ != 4; return $record; } ### Unpacks the given string into the member fields sub unpack { my ( $self, $packedString ) = @_; ### Naive split...Assumes no inter-field delimiters my ( $name, $location, $mapref, $type, $description ) = split( ':', $packedString, 5 ); $self->{name} = $name; $self->{location} = $location; $self->{mapref} = $mapref; $self->{type} = $type; $self->{description} = $description; } ### Displays the megalith data sub dump { my ( $self ) = @_; print "$self->{name} ( $self->{type} )\n", "=" x ( length( $self->{name} ) + length( $self->{type} ) + 5 ), "\n"; print "Location: $self->{location}\n"; print "Map Reference: $self->{mapref}\n"; print "Description: $self->{description}\n\n"; } 1;
The record format defined by the module contains the items of data pertaining to each megalithic site that can be queried and manipulated by programs. A new Megalith object can be created from Perl via the new operator, for example:
### Create a new object encapsulating Stonehenge $stonehenge = new Megalith( 'Stonehenge', 'Description of Stonehenge', 'Wiltshire', 'SU 123 400' ); ### Display the name of the site stored within the object ... print "Name: $stonehenge->{name}\n";
It would be extremely nice if these Megalith objects could be stored directly into a DBM file. Let's try a simple piece of code that simply stuffs the object into the hash:
### Create a new object encapsulating Stonehenge $stonehenge = new Megalith( 'Stonehenge', 'Description of Stonehenge', 'Wiltshire', 'SU 123 400' ); ### Store the object within the database hash $database{'Stonehenge'} = $stonehenge; ### Have a look at the entry within the database print "Key: $database{'Stonehenge'}\n";
This generates some slightly odd results, to say the least:
Key: Megalith=HASH(0x80e9aec)
What appears to have happened is that the string describing the reference to the Perl object has been inserted in the Berkeley DB instead of the object itself!
This result is perhaps not surprising, given that the DBM systems are really designed for storing single string values, and there is no innate understanding of how to compact complex objects into a single value. It simply converts all keys and values into strings.
Fortunately, the problem of storing a Perl object can be routed around by packing , or marshalling, all the values of all the Megalith object's fields into a single string, and then inserting that string into the database. Similarly, upon extracting the string from the database, a new Megalith can be allocated and populated by unpacking the string into the appropriate fields.
By using our conveniently defined Megalith class, we can write the following code to do this (note the calling of the pack() method):
$database{'Callanish I'} = new Megalith( 'Callanish I', 'Western Isles', 'NB 213 330', 'Stone Circle', 'Description of Callanish I' )->pack( ); ### Dump the database foreach $key ( keys %database ) { ### Unpack the record into a new megalith object my $megalith = new Megalith( $database{$key} ); ### And display the record $megalith->dump( ); }
The Megalith object has two methods declared within it called pack( ) and unpack( ) . These simply pack all the fields into a single delimited string, and unpack a single string into the appropriate fields of the object as needed. If a Megalith object is created with one of these strings as the sole argument, unpack( ) is called internally, shielding the programmer from the internal details of storage management.
Similarly, the actual way in which the data is packed and unpacked is hidden from the module user. This means that if any database structural changes need to be made, they can be made internally without any maintenance on the database manipulation scripts themselves.
If you read the section on putting complex data into flat files earlier in the chapter, then you'll know that there's more than one way to do it.
So although it's a little more work at the outset, it is actually quite straightforward to store Perl objects (and other complex forms of data) within DBM files.
A final gloss on the Megalith class would be to add accessor methods to allow controlled access to the values stored within each object. That is, the example code listed above contains code that explicitly accesses member variables within the object:
print "Megalith Name: $megalith->{name}\n";
This may cause problems if the internal structure of the Megalith object alters in some way. Also, if you write $megalith->{nme} by mistake, no errors or warnings will be generated. Defining an accessor method called getName( ), such as:
### Returns the name of the megalith sub getName { my ( $self ) = @_; return $self->{name}; }
makes the code arguably more readable:
print "Megalith Name: " . $megalith->getName( ) . "\n";
and also ensures the correctness of the application code, since the actual logic is migrated, once again, into the object.
Even with the functionality of being able to insert complex data into the Berkeley DB file (albeit in a slightly roundabout way), there is still a fundamental limitation of this database software: you can retrieve values via only one key. That is, if you wanted to search our megalithic database, the name, not the map reference or the location, must be used as the search term.
This might be a pretty big problem, given that you might wish to issue a query such as, ``tell me about all the sites in Wiltshire,'' without specifying an exact name. In this case, every record would be tested to see if any fit the bill. This would use a sequential search instead of the indexed access you have when querying against the key.
A solution to this problem is to create secondary referential hashes that have key values for the different fields you might wish to query on. The value stored for each key is actually a reference to the original hash and not to a separate value. This allows you to update the value in the original hash, and the new value is automatically mirrored within the reference hashes. The following snippet shows some code that could be used to create and dump out a referential hash keyed on the location of a megalithic site:
### Build a referential hash based on the location of each monument $locationDatabase{'Wiltshire'} = \$database{'Avebury'}; $locationDatabase{'Western Isles'} = \$database{'Callanish I'}; $locationDatabase{'Fife'} = \$database{'Lundin Links'}; ### Dump the location database foreach $key ( keys %locationDatabase ) { ### Unpack the record into a new megalith object my $megalith = new Megalith( ${ $locationDatabase{$key} } ); ### And display the record $megalith->dump( ); }
There are, of course, a few drawbacks to this particular solution. The most apparent is that any data deletion or insertion would require a mirror operation to be performed on each secondary reference hash.
The biggest problem with this approach is that your data might not have unique keys. If we wished to store records for Stonehenge and Avebury, both of those sites have a location of Wiltshire. In this case, the latest inserted record would always overwrite the earlier records inserted into the hash. To solve this general problem, we can use a feature of Berkeley DB files that allows value chaining.
One of the bigger problems when using a DBM file with the storage mechanism of DB_HASH is that the keys against which the data is stored must be unique. For example, if we stored two different values with the key of ``Wiltshire,'' say for Stonehenge and Avebury, generally the last value inserted into the hash would get stored in the database. This is a bit problematic, to say the least.
In a good database design, the primary key of any data structure generally should be unique in order to speed up searches. But quick and dirty databases, badly designed ones, or databases with a suboptimal data quality may not be able to enforce this uniqueness. Similarly, using referential hashtables to provide nonprimary key searching of the database also triggers this problem.
A Perl solution to this problem is to push the multiple values onto an array that is stored within the hash element. This technique works fine while the program is running, because the array references are still valid, but when the database is written out and reloaded, the data is invalid.
Therefore, to solve this problem, we need to look at using the different Berkeley DB storage management method of DB_BTREE , which orders its keys prior to insertion. With this mechanism, it is possible to have duplicate keys, because the underlying DBM file is in the form of an array rather than a hashtable. Fortunately, you still reference the DBM file via a Perl hashtable, so DB_BTREE is not any harder to use. The main downside to DB_BTREE storage is a penalty in performance, since a B-Tree is generally slightly slower than a hashtable for data retrieval.
The following short program creates a Berkeley DB using the DB_BTREE storage mechanism and also specifies a flag to indicate that duplicate keys are allowed. A number of rows are inserted with duplicate keys, and finally the database is dumped to show that the keys have been stored:
#!/usr/bin/perl -w # # ch02/DBM/dupkey1: Creates a Berkeley DB with the DB_BTREE mechanism and # allows for duplicate keys. We then insert some test # object data with duplicate keys and dump the final # database. use DB_File; use Fcntl ':flock'; use Megalith; ### Set Berkeley DB BTree mode to handle duplicate keys $DB_BTREE->{'flags'} = R_DUP; ### Remove any existing database files unlink 'dupkey2.dat'; ### Open the database up my %database; my $db = tie %database, 'DB_File', "dupkey2.dat", O_CREAT | O_RDWR, 0666, $DB_BTREE or die "Can't initialize database: $!\n"; ### Exclusively lock the database to ensure no one accesses it my $fd = $db->fd( ); open DATAFILE, "+<&=$fd" or die "Can't safely open file: $!\n"; print "Acquiring exclusive lock..."; flock( DATAFILE, LOCK_EX ) or die "Unable to acquire lock: $!. Aborting"; print "Acquired lock. Ready to update database!\n\n"; ### Create, pack and insert some rows with duplicate keys $database{'Wiltshire'} = new Megalith( 'Avebury', 'Wiltshire', 'SU 103 700', 'Stone Circle and Henge', 'Largest stone circle in Britain' )->pack( ); $database{'Wiltshire'} = new Megalith( 'Stonehenge', 'Wiltshire', 'SU 123 400', 'Stone Circle and Henge', 'The most popularly known stone circle in the world' )->pack( ); $database{'Wiltshire'} = new Megalith( 'The Sanctuary', 'Wiltshire', 'SU 118 680', 'Stone Circle ( destroyed )', 'No description available' )->pack( ); ### Dump the database foreach my $key ( keys %database ) { ### Unpack the record into a new megalith object my $megalith = new Megalith( $database{$key} ); ### And display the record $megalith->dump( ); } ### Close the database undef $db; untie %database; ### Close the filehandle to release the lock close DATAFILE; exit;
The output you get from running this program is not exactly what we'd hoped for:
Acquiring exclusive lock...Acquired lock. Ready to update database! The Sanctuary ( Stone Circle ( destroyed ) ) ============================================ Location: Wiltshire Map Reference: SU 118 680 Description: No description available The Sanctuary ( Stone Circle ( destroyed ) ) ============================================ Location: Wiltshire Map Reference: SU 118 680 Description: No description available The Sanctuary ( Stone Circle ( destroyed ) ) ============================================ Location: Wiltshire Map Reference: SU 118 680 Description: No description available
It seems that we've managed to successfully store three copies of the same record instead of three different records!
Fortunately, this isn't actually the case. We have correctly stored the three different records with the same key in the DBM file. The problem lies in the way we've tried to read these records back out of the DBM file. A basic dereference using the hash key obviously doesn't work, since Perl stores only a single value for each key, as we already know.
To get around this limitation, we can use the seq( ) method declared within the DB_File module, which is used to traverse chained records stored within a single hash element. Figure 2-1 illustrates the principle of chained record traversal within a hash element.
The corrected record dumping chunk is rewritten to use seq() in this way:
### Dump the database my ($key, $value, $status) = ('', '', 0); for ( $status = $db->seq( $key, $value, R_FIRST ) ; $status == 0 ; $status = $db->seq( $key, $value, R_NEXT ) ) { ### Unpack the record into a new megalith object my $megalith = new Megalith( $value ); ### And display the record $megalith->dump(); }
Running this corrected version produces the output we expected, i.e., records for three different megalithic sites.
The seq() method is quite simple to use and understand, and it works well when used in conjunction with a for loop, as shown above. The method takes three arguments: the hash key, the hash value, and a flag signifying which element within the chain should be returned. The first two arguments are actually populated with the hash key and the correct hash value, respectively, when seq() is called. Exactly which hash value is returned depends on the value of the third argument:
R_FIRST returns the first record within the chain of records.
R_LAST returns the last record stored within the chain of records.
R_NEXT returns the next record within the chain of records. This is used for forward sequential traversals of the chain.
R_PREV returns the previous record within the chain of records. This is used for backward sequential traversals of the chain.
R_CURSOR returns a record in which a partial match for the key has been located. This allows a certain element of "fuzzy matching" of keys. This feature is not necessarily accurate and may return the closest match to the desired key rather than an exact match. For example, if you searched for all sites within Wiltshire and asked for a partial match against ``wilt'', but no exact records matched, you may be returned the entries for the ``Western Isles,'' as these are the closest to the search term.
In the database dumping example shown above, we are simply starting at the beginning of the record chain and traversing through it in a forward direction. We could have performed a backward search by writing:
for ( $status = $db->seq( $key, $value, R_LAST ) ; $status == 0 ; $status = $db->seq( $key, $value, R_PREV ) ) { ... }
A quicker and easier utility method for querying duplicate values also exists: get_dup() . This method returns either the number of records with the given key or an array or hash containing the appropriate records. For example, given that we have three records in our database with the key of Wiltshire, we could verify that fact by writing:
### Displays the number of records inserted against ### the "Wiltshire" key my $numRecords = $db->get_dup( 'Wiltshire' ); print "Number of Wiltshire records: $numRecords\n";
Deleting values is the final operation that can be performed on DBM files. Updating is as simple as assigning different values to the appropriate key within the database, and deleting is equally simple. This operation is performed by using the standard Perl delete function on the appropriate key within the database. delete removes it from the hash that represents the database, and because the hash has been tied to the DBM file, it is purged from that also.
The following program inserts three records into a Berkeley DB, and then dumps the database to show that the records are there. Following that process, a single record is deleted and the database is redumped to illustrate the deletion. Here's the program:
#!/usr/bin/perl -w # # ch02/DBM/delete: Creates a Berkeley DB, inserts some test data then # deletes some of it use strict; use DB_File; ### Initialize the Berkeley DB my %database; tie %database, 'DB_File', "delete.dat" or die "Can't initialize database: $!\n"; ### Insert some data rows $database{'Callanish I'} = "Western Isles"; $database{'Avebury'} = "Wiltshire"; $database{'Lundin Links'} = "Fife"; ### Dump the database print "Dumping the entire database...\n"; foreach my $key ( keys %database ) { printf "%15s - %s\n", $key, $database{$key}; } print "\n"; ### Delete a row delete $database{'Avebury'}; ### Re-dump the database print "Dumping the database after deletions...\n"; foreach my $key ( keys %database ) { printf "%15s - %s\n", $key, $database{$key}; } ### Close the Berkeley DB untie %database; exit;
The output of this program is as expected:
Dumping the entire database... Callanish I - Western Isles Avebury - Wiltshire Lundin Links - Fife Dumping the database after deletions... Callanish I - Western Isles Lundin Links - Fife
That is, the specified row has been permanently removed from the database by deleting the related hash entry.
Copyright © 2001 O'Reilly & Associates. All rights reserved.