Book HomeProgramming the Perl DBISearch this book

2.7. DBM Files and the BerkeleyDatabase Manager

DBM files are a storage management layer that allows programmers to store information in files as pairs of strings, a key, and a value. DBM files are binary files and the key and value strings can also hold binary data.

There are several forms of DBM files, each with its own strengths and weaknesses. Perl supports the ndbm , db , gdbm , sdbm , and odbm managers via the NDBM_File , DB_File , GDBM_File , SDBM_File , and ODBM_File extensions. There's also an AnyDBM_File module that will simply use the best available DBM. The documentation for the AnyDBM_File module includes a useful table comparing the different DBMs.

These extensions all associate a DBM file on disk with a Perl hash variable (or associative array ) in memory.[13] The simple look like a hash programming interface lets programmers store data in operating system files without having to consider how it's done. It just works.

[13]DBM files are implemented by library code that's linked into the Perl extensions. There's no separate server process involved.

Programmers store and fetch values into and out of the hash, and the underlying DBM storage management layer will look after getting them on and off the disk.

In this section, we shall discuss the most popular and sophisticated of these storage managers, the Berkeley Database Manager, also known as the Berkeley DB. This software is accessed from Perl via the DB_File and Berkeley DB extensions. On Windows systems, it can be installed via the Perl package manager, ppm. On Unix systems, it is built by default when Perl is built only if the Berkeley DB library has already been installed on your system. That's generally the case on Linux, but on most other systems you may need to fetch and build the Berkeley DB library first.[14]

[14]Version 1 of Berkeley DB is available from http://www.perl.com/CPAN/src/misc/db.1.86.tar.gz. The much improved Version 2 (e.g., db.2.14.tar.gz) is also available, but isn't needed for our examples and is only supported by recent Perl versions. Version 3 is due out soon. See www.sleepycat.com.

In addition to the standard DBM file features, Berkeley DB and the DB_File module also provide support for several different storage and retrieval algorithms that can be used in subtly different situations. In newer versions of the software, concurrent access to databases and locking are also supported.

2.7.1. Creating a New Database

Prior to manipulating data within a Berkeley database, either a new database must be created or an existing database must be opened for reading. This can be done by using one of the following function calls:

tie %hash,  'DB_File', $filename, $flags, $mode, $DB_HASH;
tie %hash,  'DB_File', $filename, $flags, $mode, $DB_BTREE;
tie @array, 'DB_File', $filename, $flags, $mode, $DB_RECNO;

The final parameter of this call is the interesting one, as it dictates the way in which the Berkeley DB will store the data in the database file. The behavior of these parameters is as follows:

When initializing a new or existing Berkeley DB database for use with Perl, use the tie mechanism defined within Perl to associate the actual Berkeley DB with either a hash or a standard scalar array. By doing this, we can simply manipulate the Perl variables, which will automatically perform the appropriate operations on the Berkeley DB files instead of us having to manually program the Berkeley DB API ourselves.

For example, to create a simple Berkeley DB, we could use the following Perl script:

#!/usr/bin/perl -w
#
# ch02/DBM/createdb: Creates a Berkeley DB

use strict;

use DB_File;

my  %database;
tie %database, 'DB_File', "createdb.dat"
    or die "Can't initialize database: $!\n";

untie %database;

exit;

If you now look in the directory in which you ran this script, you should hopefully find a new file called createdb.dat. This is the disk image of your Berkeley database, i.e., your data stored in the format implemented by the Berkeley DB storage manager. These files are commonly referred to as DBM files.

In the example above, we simply specified the name of the file in which the database is to be stored and then ignored the other arguments. This is a perfectly acceptable thing to do if the defaults are satisfactory. The additional arguments default to the values listed in Table 2-1.

Table 2-1. The Default Argument Values of DB_File

Argument

Default Value

$filename

undef [15]

$flags

O_CREAT | O_RDWR

$mode

0666

$storage_type

$DB_HASH

[15]If the filename argument is specified as undef, the database will be created in-memory only. It still behaves as if written to file, although once the program exits, the database will no longer exist.

The $flags argument takes the values that are associated with the standard Perl sysopen() function, and the $mode argument takes the form of the octal value of the file permissions that you wish the DBM file to be created with. In the case of the default value, 0666, the corresponding Unix permissions will be:

-rw-rw-rw-

That is, the file is user, group, and world readable and writeable.[16] You may wish to specify more strict permissions on your DBM files to be sure that unauthorized users won't tamper with them.

[16] We are ignoring any modifications to the permissions that umask may make.

Other platforms such as Win32 differ, and do not necessarily use a permission system. On these platforms, the permission mode is simply ignored.

Given that creating a new database is a fairly major operation, it might be worthwhile to implement an exclusive locking mechanism that protects the database files while the database is initially created and loaded. As with flat-file databases, the Perl flock() call should be used to perform file-level locking, but there are some differences between locking standard files and DBM files.

2.7.2. Locking Strategies

The issues of safe access to databases that plagued flat-file databases still apply to Berkeley databases. Therefore, it is a good idea to implement a locking strategy that allows safe multi-user access to the databases, if this is required by your applications.

The way in which flock() is used regarding DBM files is slightly different than that of locking standard Perl filehandles, as there is no direct reference to the underlying filehandle when we create a DBM file within a Perl script.

Fortunately, the DB_File module defines a method that can be used to locate the underlying file descriptor for a DBM file, allowing us to use flock() on it. This can be achieved by invoking the fd() method on the object reference returned from the database initialization by tie(). For example:

### Create the new database ...
$db = tie %database, 'DB_File', "megaliths.dat"
    or die "Can't initialize database: $!\n";

### Acquire the file descriptor for the DBM file
my $fd = $db->fd();

### Do a careful open() of that descriptor to get a Perl filehandle
open DATAFILE, "+<&=$fd" or die "Can't safely open file: $!\n";

### And lock it before we start loading data ...
print "Acquiring an exclusive lock...";
flock( DATAFILE, LOCK_EX )
        or die "Unable to acquire exclusive lock: $!. Aborting";
print "Acquired lock. Ready to update database!\n\n";

This code looks a bit gruesome, especially with the additional call to open(). It is written in such a way that the original file descriptor being currently used by the DBM file when the database was created is not invalidated. What actually occurs is that the file descriptor is associated with the Perl filehandle in a nondestructive way. This then allows us to flock() the filehandle as per usual.

However,after having written this description and all the examples using this standard documented way to lock Berkeley DBM files, it has been discovered that there is a small risk of data corruption during concurrent access. To make a long story short, the DBM code reads some of the file when it first opens it, before you get a chance to lock it. That's the problem.

There is a quick fix if your system supports the O_EXLOCK flag, as FreeBSD does and probably most Linux versions do. Just add the O_EXLOCK flag to the tie :

use Fcntl;    # import O_EXLOCK, if available
$db = tie %database, 'DB_File', "megaliths.dat", O_EXLOCK;

For more information, and a more general workaround, see:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-09/msg00954.html

and the thread of messages that follows it.

2.7.3. Inserting and Retrieving Values

Inserting data into a Berkeley DB using the Perl DB_File module is extremely simple as a result of using a tied hash or tied array . The association of a DBM file and a Perl data structure is created when the database is opened. This allows us to manipulate the contents of the database simply by altering the contents of the Perl data structures.

This system makes it very easy to store data within a DBM file and also abstracts the actual file-related operations for data manipulation away from our scripts. Thus, the Berkeley DB is a higher-level storage manager than the simple flat-file databases discussed earlier in this chapter.

The following script demonstrates the insertion and retrieval of data from a DBM file using a tied hash. This hash has the Perl characteristic of being a key/value pair. That is, values are stored within the hash table against a unique key. This affords extremely fast retrieval and an element of indexed data access as opposed to sequential access. For example:

#!/usr/bin/perl -w
#
# ch02/DBM/simpleinsert: Creates a Berkeley DB, inserts some test data
#                        and dumps it out again

use DB_File;
use Fcntl ':flock';

### Initialize the Berkeley DB
my %database;
my $db = tie %database, 'DB_File', "simpleinsert.dat",
         O_CREAT | O_RDWR, 0666
    or die "Can't initialize database: $!\n";

my $fd = $db->fd();
open DATAFILE, "+<&=$fd"
    or die "Can't safely open file: $!\n";
print "Acquiring exclusive lock...";
flock( DATAFILE, LOCK_EX )
    or die "Unable to acquire lock: $!. Aborting";
print "Acquired lock. Ready to update database!\n\n";


### Insert some data rows
$database{'Callanish I'} = 
    "This site, commonly known as the "Stonehenge of the North" is in the
form of a buckled Celtic cross.";

$database{'Avebury'} =
    "Avebury is a vast, sprawling site that features, amongst other marvels,
the largest stone circle in Britain. The henge itself is so large,
it almost completely surrounds the village of Avebury.";
$database{'Lundin Links'} =
    "Lundin Links is a megalithic curiosity, featuring 3 gnarled and
immensely tall monoliths arranged possibly in a 4-poster design.
Each monolith is over 5m tall.";

### Untie the database
undef $db;
untie %database;

### Close the file descriptor to release the lock
close DATAFILE;


### Retie the database to ensure we're reading the stored data
$db = tie %database, 'DB_File', "simpleinsert.dat", O_RDWR, 0444
    or die "Can't initialize database: $!\n";

### Only need to lock in shared mode this time because we're not updating ...
$fd = $db->fd();
open DATAFILE, "+<&=$fd" or die "Can't safely open file: $!\n";
print "Acquiring shared lock...";
flock( DATAFILE, LOCK_SH )
    or die "Unable to acquire lock: $!. Aborting";
print "Acquired lock. Ready to read database!\n\n";

### Dump the database
foreach my $key ( keys %database ) {
    print "$key\n", ( "=" x ( length( $key ) + 1 ) ), "\n\n";
    print "$database{$key}\n\n";
}

### Close the Berkeley DB
undef $db;
untie %database;

### Close the file descriptor to release the lock
close DATAFILE;

exit;

When run, this script will generate the following output, indicating that it is indeed retrieving values from a database:

Acquiring exclusive lock...Acquired lock. Ready to update database!

Acquiring shared lock...Acquired lock. Ready to read database!

Callanish I
============

This site, commonly known as the "Stonehenge of the North" is in the
form of a buckled Celtic cross.

Avebury
========

Avebury is a vast, sprawling site that features, amongst other marvels,
the largest stone circle in Britain. The henge itself is so large,
it almost completely surrounds the village of Avebury.

Lundin Links
=============

Lundin Links is a megalithic curiosity, featuring 3 gnarled and
immensely tall monoliths arranged possibly in a 4-poster design.
Each monolith is over 5m tall.

You may have noticed that we cheated a little bit in the previous example. We stored only the descriptions of the sites instead of all the information such as the map reference and location. This is the inherent problem with key/value pair databases: you can store only a single value against a given key. You can circumvent this by simply concatenating values into a string and storing that string instead, just like we did using join(), pack(), Data::Dumper, and Storable earlier in this chapter.

This particular form of storage jiggery-pokery can be accomplished in at least two ways.[17] One is to hand-concatenate the data into a string and hand-split it when required. The other is slightly more sophisticated and uses a Perl object encapsulating a megalith to handle, and hide, the packing and unpacking.

[17]As with all Perl things, There's More Than One Way To Do It (a phrase so common with Perl you'll often see it written as TMTOWTDI). We're outlining these ideas here because they dawned on us first. You might come up with something far more outlandish and obscure, or painfully obvious. Such is Perl.

2.7.3.1. Localized storage and retrieval

The first technique -- application handling of string joins and splits -- is certainly the most self-contained. This leads us into a small digression.

Self-containment can be beneficial, as it tends to concentrate the logic of a script internally, making things slightly more simple to understand. Unfortunately, this localization can also be a real pain. Take our megalithic database as a good example. In the previous section, we wrote four different Perl scripts to handle the four main data manipulation operations. With localized logic, you're essentially implementing the same storing and extraction code in four different places.

Furthermore, if you decide to change the format of the data, you need to keep four different scripts in sync. Given that it's also likely that you'll add more scripts to perform more specific functions (such as generating web pages) with the appropriate megalithic data from the database, that gives your database more points of potential failure and inevitable corruption.

Getting back to the point, we can fairly simply store complex data in a DBM file by using either join( ) , to create a delimited string, or pack( ) , to make a fixed-length record. join( ) can be used in the following way to produce the desired effect:

### Insert some data rows
$database{'Callanish I'} =
    join( ':', 'Callanish I', 'Callanish, Western Isles', 'NB 213 330',
               'Stone Circle', 'Description of Callanish I' );
$database{'Avebury'} =
    join( ':', 'Avebury', 'Wiltshire', 'SU 103 700', 
               'Stone Circle and Henge',
               'Description of Avebury' );
$database{'Lundin Links'} =
    join( ':', 'Lundin Links', 'Fife', 'NO 404 027', 'Standing Stones',
               'Description of Lundin Links' );
### Dump the database
foreach my $key ( keys %database ) {
    my ( $name, $location, $mapref, $type, $description ) =
         split( /:/, $database{$key} );
    print "$name\n", ( "=" x length( $name ) ), "\n\n";
    print "Location:      $location\n";
    print "Map Reference: $mapref\n";
    print "Description:   $description\n\n";
}

The storage of fixed-length records is equally straightforward, but does gobble up space within the database rather quickly. Furthermore, the main rationale for using fixed-length records is often access speed, but when stored within a DBM file, in-place queries and updates simply do not provide any major speed increase.

The code to insert and dump megalithic data using fixed-length records is shown in the following code segment:

### The pack and unpack template.
$PACKFORMAT = 'A64 A64 A16 A32 A256';

### Insert some data rows
$database{'Callanish I'} =
    pack( $PACKFORMAT, 'Callanish I', 'Callanish, Western Isles', 
                     'NB 213 330', 'Stone Circle', 
                     'Description of Callanish I' );

$database{'Avebury'} =
    pack( $PACKFORMAT, 'Avebury', 'Wiltshire', 'SU 103 700', 
                     'Stone Circle and Henge', 'Description of Avebury' );

$database{'Lundin Links'} =
    pack( $PACKFORMAT, 'Lundin Links', 'Fife', 'NO 404 027', 
                     'Standing Stones', 
                     'Description of Lundin Links' );

### Dump the database
foreach my $key ( keys %database ) {
    my ( $name, $location, $mapref, $type, $description ) =
        unpack( $PACKFORMAT, $database{$key} );
    print "$name\n", ( "=" x length( $name ) ), "\n\n";
    print "Location:      $location\n";
    print "Map Reference: $mapref\n";
    print "Description:   $description\n\n";
}

The actual code to express the storage and retrieval mechanism isn't really much more horrible than the delimited record version, but it does introduce a lot of gibberish in the form of the pack( ) template, which could easily be miskeyed or forgotten about. This also doesn't really solve the problem of localized program logic, and turns maintenance into the aforementioned nightmare.

How can we improve on this?

2.7.3.2. Packing in Perl objects

One solution to both the localized code problem and the problem of storing multiple data values within a single hash key/value pair is to use a Perl object to encapsulate and hide some of the nasty bits.[18]

[18]This is where people tend to get a little confused about Perl. The use of objects, accessor methods, and data hiding are all very object-oriented. By this design, we get to mix the convenience of non-OO programming with the neat bits of OO programming. Traditional OO programmers have been known to make spluttering noises when Perl programmers discuss this sort of thing in public.

The following Perl code defines an object of class Megalith. We can then reuse this packaged object module in all of our programs without having to rewrite any of them, if we change the way the module works:

#!/usr/bin/perl -w
  #
  # ch02/DBM/Megalith.pm: A perl class encapsulating a megalith
  
  package Megalith;
  
  use strict;
  use Carp;
  
  ### Creates a new megalith object and initializes the member fields.
  sub new {
      my $class = shift;
      my ( $name, $location, $mapref, $type, $description ) = @_;
      my $self = {};
      bless $self => $class;
      
      ### If we only have one argument, assume we have a string
      ### containing all the field values in $name and unpack it
      if ( @_ == 1 ) {
          $self->unpack( $name );
      }
      else {
          $self->{name} = $name;
          $self->{location} = $location;
          $self->{mapref} = $mapref;
          $self->{type} = $type;
          $self->{description} = $description;
      }
      return $self;
  }
  
  ### Packs the current field values into a colon delimited record
  ### and returns it
  sub pack {
      my ( $self ) = @_;
      
      my $record = join( ':', $self->{name}, $self->{location},
                              $self->{mapref}, $self->{type},
                              $self->{description} );
      
      ### Simple check that fields don't contain any colons
      croak "Record field contains ':' delimiter character"
          if $record =~ tr/:/:/ != 4;
      
      return $record;
  }
  
  ### Unpacks the given string into the member fields
  sub unpack {
      my ( $self, $packedString ) = @_;
      
      ### Naive split...Assumes no inter-field delimiters
      my ( $name, $location, $mapref, $type, $description ) =
           split( ':', $packedString, 5 );
      
      $self->{name} = $name;
      $self->{location} = $location;
      $self->{mapref} = $mapref;
      $self->{type} = $type;
      $self->{description} = $description;
  }
  
  ### Displays the megalith data
  sub dump {
      my ( $self ) = @_;
      
      print "$self->{name} ( $self->{type} )\n", 
            "=" x ( length( $self->{name} ) + 
                    length( $self->{type} ) + 5 ), "\n";
      print "Location:      $self->{location}\n"; 
      print "Map Reference: $self->{mapref}\n";
      print "Description:   $self->{description}\n\n";
  }
  
  1;

The record format defined by the module contains the items of data pertaining to each megalithic site that can be queried and manipulated by programs. A new Megalith object can be created from Perl via the new operator, for example:

### Create a new object encapsulating Stonehenge
$stonehenge = 
    new Megalith( 'Stonehenge', 'Description of Stonehenge',
                  'Wiltshire', 'SU 123 400' );

### Display the name of the site stored within the object ...
print "Name: $stonehenge->{name}\n";

It would be extremely nice if these Megalith objects could be stored directly into a DBM file. Let's try a simple piece of code that simply stuffs the object into the hash:

### Create a new object encapsulating Stonehenge
$stonehenge = 
    new Megalith( 'Stonehenge', 'Description of Stonehenge',
                  'Wiltshire', 'SU 123 400' );

### Store the object within the database hash
$database{'Stonehenge'} = $stonehenge;

### Have a look at the entry within the database
print "Key: $database{'Stonehenge'}\n";

This generates some slightly odd results, to say the least:

Key: Megalith=HASH(0x80e9aec)

What appears to have happened is that the string describing the reference to the Perl object has been inserted in the Berkeley DB instead of the object itself!

This result is perhaps not surprising, given that the DBM systems are really designed for storing single string values, and there is no innate understanding of how to compact complex objects into a single value. It simply converts all keys and values into strings.

Fortunately, the problem of storing a Perl object can be routed around by packing , or marshalling, all the values of all the Megalith object's fields into a single string, and then inserting that string into the database. Similarly, upon extracting the string from the database, a new Megalith can be allocated and populated by unpacking the string into the appropriate fields.

By using our conveniently defined Megalith class, we can write the following code to do this (note the calling of the pack() method):

$database{'Callanish I'} =
    new Megalith( 'Callanish I', 
                  'Western Isles',
                  'NB 213 330',
                  'Stone Circle',
                  'Description of Callanish I' )->pack(  );

### Dump the database
foreach $key ( keys %database ) {

    ### Unpack the record into a new megalith object
    my $megalith = new Megalith( $database{$key} );

    ### And display the record
    $megalith->dump(  );
}

The Megalith object has two methods declared within it called pack( ) and unpack( ) . These simply pack all the fields into a single delimited string, and unpack a single string into the appropriate fields of the object as needed. If a Megalith object is created with one of these strings as the sole argument, unpack( ) is called internally, shielding the programmer from the internal details of storage management.

Similarly, the actual way in which the data is packed and unpacked is hidden from the module user. This means that if any database structural changes need to be made, they can be made internally without any maintenance on the database manipulation scripts themselves.

If you read the section on putting complex data into flat files earlier in the chapter, then you'll know that there's more than one way to do it.

So although it's a little more work at the outset, it is actually quite straightforward to store Perl objects (and other complex forms of data) within DBM files.

2.7.3.3. Object accessor methods

A final gloss on the Megalith class would be to add accessor methods to allow controlled access to the values stored within each object. That is, the example code listed above contains code that explicitly accesses member variables within the object:

print "Megalith Name: $megalith->{name}\n";

This may cause problems if the internal structure of the Megalith object alters in some way. Also, if you write $megalith->{nme} by mistake, no errors or warnings will be generated. Defining an accessor method called getName( ), such as:

### Returns the name of the megalith
sub getName {
    my ( $self ) = @_;
    return $self->{name};
}

makes the code arguably more readable:

print "Megalith Name: " . $megalith->getName(  ) . "\n";

and also ensures the correctness of the application code, since the actual logic is migrated, once again, into the object.

2.7.3.4. Querying limitations of DBM files and hashtables

Even with the functionality of being able to insert complex data into the Berkeley DB file (albeit in a slightly roundabout way), there is still a fundamental limitation of this database software: you can retrieve values via only one key. That is, if you wanted to search our megalithic database, the name, not the map reference or the location, must be used as the search term.

This might be a pretty big problem, given that you might wish to issue a query such as, ``tell me about all the sites in Wiltshire,'' without specifying an exact name. In this case, every record would be tested to see if any fit the bill. This would use a sequential search instead of the indexed access you have when querying against the key.

A solution to this problem is to create secondary referential hashes that have key values for the different fields you might wish to query on. The value stored for each key is actually a reference to the original hash and not to a separate value. This allows you to update the value in the original hash, and the new value is automatically mirrored within the reference hashes. The following snippet shows some code that could be used to create and dump out a referential hash keyed on the location of a megalithic site:

### Build a referential hash based on the location of each monument
$locationDatabase{'Wiltshire'}     = \$database{'Avebury'};
$locationDatabase{'Western Isles'} = \$database{'Callanish I'};
$locationDatabase{'Fife'}          = \$database{'Lundin Links'};

### Dump the location database
foreach $key ( keys %locationDatabase ) {

    ### Unpack the record into a new megalith object
    my $megalith = new Megalith( ${ $locationDatabase{$key} } );

    ### And display the record
    $megalith->dump(  );
}

There are, of course, a few drawbacks to this particular solution. The most apparent is that any data deletion or insertion would require a mirror operation to be performed on each secondary reference hash.

The biggest problem with this approach is that your data might not have unique keys. If we wished to store records for Stonehenge and Avebury, both of those sites have a location of Wiltshire. In this case, the latest inserted record would always overwrite the earlier records inserted into the hash. To solve this general problem, we can use a feature of Berkeley DB files that allows value chaining.

2.7.3.5. Chaining multiple values into a hash

One of the bigger problems when using a DBM file with the storage mechanism of DB_HASH is that the keys against which the data is stored must be unique. For example, if we stored two different values with the key of ``Wiltshire,'' say for Stonehenge and Avebury, generally the last value inserted into the hash would get stored in the database. This is a bit problematic, to say the least.

In a good database design, the primary key of any data structure generally should be unique in order to speed up searches. But quick and dirty databases, badly designed ones, or databases with a suboptimal data quality may not be able to enforce this uniqueness. Similarly, using referential hashtables to provide nonprimary key searching of the database also triggers this problem.

A Perl solution to this problem is to push the multiple values onto an array that is stored within the hash element. This technique works fine while the program is running, because the array references are still valid, but when the database is written out and reloaded, the data is invalid.

Therefore, to solve this problem, we need to look at using the different Berkeley DB storage management method of DB_BTREE , which orders its keys prior to insertion. With this mechanism, it is possible to have duplicate keys, because the underlying DBM file is in the form of an array rather than a hashtable. Fortunately, you still reference the DBM file via a Perl hashtable, so DB_BTREE is not any harder to use. The main downside to DB_BTREE storage is a penalty in performance, since a B-Tree is generally slightly slower than a hashtable for data retrieval.

The following short program creates a Berkeley DB using the DB_BTREE storage mechanism and also specifies a flag to indicate that duplicate keys are allowed. A number of rows are inserted with duplicate keys, and finally the database is dumped to show that the keys have been stored:

#!/usr/bin/perl -w
  #
  # ch02/DBM/dupkey1: Creates a Berkeley DB with the DB_BTREE mechanism and 
  #                   allows for duplicate keys. We then insert some test 
  #                   object data with duplicate keys and dump the final
  #                   database.
  
  use DB_File;
  use Fcntl ':flock';
  use Megalith;
  
  ### Set Berkeley DB BTree mode to handle duplicate keys
  $DB_BTREE->{'flags'} = R_DUP;
  
  ### Remove any existing database files
  unlink 'dupkey2.dat';
  
  ### Open the database up
  my %database;
  my $db = tie %database, 'DB_File', "dupkey2.dat", 
                 O_CREAT | O_RDWR, 0666, $DB_BTREE
      or die "Can't initialize database: $!\n";
  
  ### Exclusively lock the database to ensure no one accesses it
  my $fd = $db->fd(  );
  open DATAFILE, "+<&=$fd"
      or die "Can't safely open file: $!\n";
  print "Acquiring exclusive lock...";
  flock( DATAFILE, LOCK_EX )
      or die "Unable to acquire lock: $!. Aborting";
  print "Acquired lock. Ready to update database!\n\n";
  
  ### Create, pack and insert some rows with duplicate keys
  $database{'Wiltshire'} = 
    new Megalith( 'Avebury',
                  'Wiltshire',
                  'SU 103 700',
                  'Stone Circle and Henge',
                  'Largest stone circle in Britain' )->pack(  );
  
  $database{'Wiltshire'} =
    new Megalith( 'Stonehenge',
                  'Wiltshire',
                  'SU 123 400',
                  'Stone Circle and Henge',
                  'The most popularly known stone circle in the world' )->pack(  );
  
  $database{'Wiltshire'} =
    new Megalith( 'The Sanctuary',
                  'Wiltshire',
                  'SU 118 680',
                  'Stone Circle ( destroyed )',
                  'No description available' )->pack(  );
  
  ### Dump the database
  foreach my $key ( keys %database ) {
  
      ### Unpack the record into a new megalith object
      my $megalith = new Megalith( $database{$key} );
      
      ### And display the record
      $megalith->dump(  );
  }
  
  ### Close the database
  undef $db;
  untie %database;
  
  ### Close the filehandle to release the lock
  close DATAFILE;
  
  exit;

The output you get from running this program is not exactly what we'd hoped for:

Acquiring exclusive lock...Acquired lock. Ready to update database!
  
  The Sanctuary ( Stone Circle ( destroyed ) )
  ============================================
  Location:      Wiltshire
  Map Reference: SU 118 680
  Description:   No description available
  
  The Sanctuary ( Stone Circle ( destroyed ) )
  ============================================
  Location:      Wiltshire
  Map Reference: SU 118 680
  Description:   No description available
  
  The Sanctuary ( Stone Circle ( destroyed ) )
  ============================================
  Location:      Wiltshire
  Map Reference: SU 118 680
  Description:   No description available

It seems that we've managed to successfully store three copies of the same record instead of three different records!

Fortunately, this isn't actually the case. We have correctly stored the three different records with the same key in the DBM file. The problem lies in the way we've tried to read these records back out of the DBM file. A basic dereference using the hash key obviously doesn't work, since Perl stores only a single value for each key, as we already know.

To get around this limitation, we can use the seq( ) method declared within the DB_File module, which is used to traverse chained records stored within a single hash element. Figure 2-1 illustrates the principle of chained record traversal within a hash element.

Figure 2-1

Figure 2-1. Chained record traversal

The corrected record dumping chunk is rewritten to use seq() in this way:

### Dump the database
my ($key, $value, $status) = ('', '', 0);
for ( $status = $db->seq( $key, $value, R_FIRST ) ;
      $status == 0 ;
      $status = $db->seq( $key, $value, R_NEXT ) ) {

    ### Unpack the record into a new megalith object
    my $megalith = new Megalith( $value );

    ### And display the record
    $megalith->dump();
}

Running this corrected version produces the output we expected, i.e., records for three different megalithic sites.

The seq() method is quite simple to use and understand, and it works well when used in conjunction with a for loop, as shown above. The method takes three arguments: the hash key, the hash value, and a flag signifying which element within the chain should be returned. The first two arguments are actually populated with the hash key and the correct hash value, respectively, when seq() is called. Exactly which hash value is returned depends on the value of the third argument:

  • R_FIRST returns the first record within the chain of records.

  • R_LAST returns the last record stored within the chain of records.

  • R_NEXT returns the next record within the chain of records. This is used for forward sequential traversals of the chain.

  • R_PREV returns the previous record within the chain of records. This is used for backward sequential traversals of the chain.

  • R_CURSOR returns a record in which a partial match for the key has been located. This allows a certain element of "fuzzy matching" of keys. This feature is not necessarily accurate and may return the closest match to the desired key rather than an exact match. For example, if you searched for all sites within Wiltshire and asked for a partial match against ``wilt'', but no exact records matched, you may be returned the entries for the ``Western Isles,'' as these are the closest to the search term.

In the database dumping example shown above, we are simply starting at the beginning of the record chain and traversing through it in a forward direction. We could have performed a backward search by writing:

for ( $status = $db->seq( $key, $value, R_LAST ) ;
      $status == 0 ;
      $status = $db->seq( $key, $value, R_PREV ) ) {
    ...
}

A quicker and easier utility method for querying duplicate values also exists: get_dup() . This method returns either the number of records with the given key or an array or hash containing the appropriate records. For example, given that we have three records in our database with the key of Wiltshire, we could verify that fact by writing:

### Displays the number of records inserted against 
### the "Wiltshire" key
my $numRecords = $db->get_dup( 'Wiltshire' );
print "Number of Wiltshire records: $numRecords\n";

2.7.4. Deleting Values

Deleting values is the final operation that can be performed on DBM files. Updating is as simple as assigning different values to the appropriate key within the database, and deleting is equally simple. This operation is performed by using the standard Perl delete function on the appropriate key within the database. delete removes it from the hash that represents the database, and because the hash has been tied to the DBM file, it is purged from that also.

The following program inserts three records into a Berkeley DB, and then dumps the database to show that the records are there. Following that process, a single record is deleted and the database is redumped to illustrate the deletion. Here's the program:

#!/usr/bin/perl -w
#
# ch02/DBM/delete: Creates a Berkeley DB, inserts some test data then
#                  deletes some of it

use strict;

use DB_File;

### Initialize the Berkeley DB
my  %database;
tie %database, 'DB_File', "delete.dat"
    or die "Can't initialize database: $!\n";

### Insert some data rows
$database{'Callanish I'}  = "Western Isles";
$database{'Avebury'}      = "Wiltshire";
$database{'Lundin Links'} = "Fife";

### Dump the database
print "Dumping the entire database...\n";
foreach my $key ( keys %database ) {
    printf "%15s - %s\n", $key, $database{$key};
}
print "\n";

### Delete a row
delete $database{'Avebury'};

### Re-dump the database
print "Dumping the database after deletions...\n";
foreach my $key ( keys %database ) {
    printf "%15s - %s\n", $key, $database{$key};
}

### Close the Berkeley DB
untie %database;

exit;

The output of this program is as expected:

Dumping the entire database...
     Callanish I - Western Isles
         Avebury - Wiltshire
    Lundin Links - Fife
  
Dumping the database after deletions...
     Callanish I - Western Isles
    Lundin Links - Fife

That is, the specified row has been permanently removed from the database by deleting the related hash entry.



Library Navigation Links

Copyright © 2001 O'Reilly & Associates. All rights reserved.