[Chapter 7] 7.2.8 DB_File - Access to Berkeley DB

use DB_File;

# square brackets in the following code indicate optional arguments
[$X =] tie %hash,  "DB_File", $filename [, $flags, $mode, $DB_HASH];
[$X =] tie %hash,  "DB_File", $filename, $flags, $mode, $DB_BTREE;
[$X =] tie @array, "DB_File", $filename, $flags, $mode, $DB_RECNO;

$status = $X->del($key [, $flags]);
$status = $X->put($key, $value [, $flags]);
$status = $X->get($key, $value [, $flags]);
$status = $X->seq($key, $value [, $flags]);
$status = $X->sync([$flags]);
$fd = $X->fd;

untie %hash;
untie @array;

DB_File is the most flexible of the DBM-style tie modules. It allows Perl programs to make use of the facilities provided by Berkeley DB, which is not included with Perl. If you intend to use this module, you should have a copy of the Berkeley DB manual page at hand. The interface defined here mirrors the Berkeley DB interface closely.

Berkeley DB is a C library that provides a consistent interface to a number of database formats. DB_File provides an interface to all three of the database (file) types currently supported by Berkeley DB.

The file types are:

DB_HASH: Allows arbitrary key/data pairs to be stored in data files. This is equivalent to the functionality provided by other hashing packages like DBM, NDBM, ODBM, GDBM, and SDBM. Remember, though, the files created using DB_HASH are not binary compatible with any of the other packages mentioned. A default hashing algorithm that will be adequate for most applications is built into Berkeley DB. If you do need to use your own hashing algorithm, it's possible to write your own and have DB_File use it instead.
DB_BTREE: The btree format allows arbitrary key/data pairs to be stored in a sorted, balanced tree. It is possible to provide a user-defined Perl routine to perform the comparison of keys; by default, though, the keys are stored in lexical order. This is useful for providing an ordering for your hash keys, and may be used on hashes that are only in memory and never go to disk.
DB_RECNO: DB_RECNO allows both fixed-length and variable-length flat text files to be manipulated using the same key/value pair interface as in DB_HASH and DB_BTREE. In this case the key will consist of a record (line) number.

7.2.8.1 How does DB_File interface to Berkeley DB?

DB_File gives access to Berkeley DB files using Perl's tie function. This allows DB_File to access Berkeley DB files using either a hash (for DB_HASH and DB_BTREE file types) or an ordinary array (for the DB_RECNO file type).

In addition to the tie interface, it is also possible to use most of the functions provided in the Berkeley DB API.

7.2.8.2 Differences from Berkeley DB

Berkeley DB uses the function dbopen(3) to open or create a database. Below is the C prototype for dbopen(3).

DB *
dbopen (const char *file, int flags, int mode,
        DBTYPE type, const void *openinfo)

The type parameter is an enumeration selecting one of the three interface methods, DB_HASH, DB_BTREE or DB_RECNO. Depending on which of these is actually chosen, the final parameter, openinfo, points to a data structure that allows tailoring of the specific interface method.

This interface is handled slightly differently in DB_File. Here is an equivalent call using DB_File.

tie %hash, "DB_File", $filename, $flags, $mode, $DB_HASH;

The filename, flags, and mode parameters are the direct equivalent of their dbopen(3) counterparts. The final parameter $DB_HASH performs the function of both the type and openinfo parameters in dbopen(3).

In the example above $DB_HASH is actually a reference to a hash object. DB_File has three of these predefined references. Apart from $DB_HASH, there are also $DB_BTREE and $DB_RECNO.

The keys allowed in each of these predefined references are limited to the names used in the equivalent C structure. So, for example, the $DB_HASH reference will only allow keys called bsize, cachesize, ffactor, hash, lorder, and nelem.

To change one of these elements, just assign to it like this:

$DB_HASH->{cachesize} = 10_000;
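Assigning to $DB_HASH changes the defaults for every later tie in the program. As a minimal sketch, the example below instead builds a private parameter object with DB_File::HASHINFO->new (a constructor not described in this section, so treat its availability as an assumption about your DB_File version), sets two of the allowed fields, and shows that a field name outside the C structure is rejected. The field name no_such_field is made up for illustration, and an in-memory database is used so no file is created.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;

# A private copy of the hash parameters; the global $DB_HASH is left untouched.
my $info = DB_File::HASHINFO->new;
$info->{cachesize} = 10_000;   # allowed: a field of the C structure
$info->{nelem}     = 100;      # allowed: estimate of the final number of elements

# A name that is not a field of the C structure makes DB_File croak.
eval { $info->{no_such_field} = 1 };
my $err = $@;
print "rejected: $err" if length $err;

# undef as the filename requests an in-memory database.
my %h;
tie(%h, 'DB_File', undef, O_RDWR|O_CREAT, 0644, $info)
    or die "can't tie: $!";
$h{answer} = 42;
my $stored = $h{answer};
print "answer = $stored\n";
untie %h;
```

Using a private object this way means two ties in the same program can use different cache sizes without interfering with each other.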

7.2.8.3 Array offsets

In order to make RECNO more compatible with Perl, the array offset for all RECNO arrays begins at 0 rather than 1 as in Berkeley DB.
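The offset mapping can be seen on a real text file. This sketch assumes the core File::Temp module for a scratch file; the three-line contents are made up for illustration. The record that Berkeley DB numbers 1 comes back as element 0.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;
use File::Temp qw(tempfile);

# Scratch text file, removed automatically when the program exits.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "alpha\nbeta\ngamma\n";
close $fh or die "close: $!";

my @lines;
my $db = tie(@lines, 'DB_File', $path, O_RDWR, 0644, $DB_RECNO)
    or die "can't tie $path: $!";

# Berkeley DB calls the first record number 1; DB_File exposes it as index 0.
my $first = $lines[0];
my $third = $lines[2];
print "index 0: $first\n";    # alpha
print "index 2: $third\n";    # gamma
print "records: ", $db->length, "\n";

undef $db;                    # drop the extra reference before untie
untie @lines;
```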

7.2.8.4 In-memory databases

Berkeley DB allows the creation of in-memory databases by using NULL (that is, a (char *)0 in C) in place of the filename. DB_File uses undef instead of NULL to provide this functionality.

use strict;
use Fcntl;
use DB_File;

my ($k, $v, %hash);

tie(%hash, 'DB_File', undef, O_RDWR|O_CREAT, 0, $DB_BTREE)
    or die "can't tie DB_File: $!";

foreach $k (keys %ENV) {
    $hash{$k} = $ENV{$k};
}

# this will now come out in sorted lexical order 
# without the overhead of sorting the keys
while (($k, $v) = each %hash) {
    print "$k=$v\n";
}

7.2.8.5 Using the Berkeley DB interface directly

In addition to accessing Berkeley DB using a tied hash or array, you can also make direct use of most functions defined in the Berkeley DB documentation.

To do this, you need to keep the return value from tie, or use the tied function to retrieve it later on.

$db = tie %hash, "DB_File", "filename";

Once you have done that, you can access the Berkeley DB API functions directly.

$db->put($key, $value, R_NOOVERWRITE);  # invoke the DB "put" function

All the functions defined in the dbopen(3) manpage are available except for close() and dbopen() itself. The DB_File interface to these functions mirrors the way Berkeley DB works. In particular, note that all these functions return only a status value. Whenever a Berkeley DB function returns data via one of its parameters, the DB_File equivalent does exactly the same thing.

All the constants defined in the dbopen(3) manpage are also available.
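To make the status-value convention concrete, here is a minimal sketch against an in-memory DB_HASH database. It recovers the object with tied rather than keeping tie's return value, and relies on the Berkeley DB return codes: 0 for success, 1 when put with R_NOOVERWRITE finds the key already present or when get finds no such key. The keys and values are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;

my %h;
tie(%h, 'DB_File', undef, O_RDWR|O_CREAT, 0644, $DB_HASH)
    or die "can't tie: $!";
my $db = tied %h;                      # the same object tie returned

# put(): 0 on success; with R_NOOVERWRITE, 1 if the key already exists.
my $first  = $db->put('color', 'red',  R_NOOVERWRITE);
my $second = $db->put('color', 'blue', R_NOOVERWRITE);

# get(): the value comes back through the second argument;
# the return value is 0 if the key was found, 1 if it was not.
my $value;
my $found   = $db->get('color', $value);
my $got     = $value;
my $missing = $db->get('shape', $value);

print "put: $first $second  get: $found $missing  color=$got\n";

undef $db;                             # release the extra reference before untie
untie %h;
```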

Below is a list of the functions available. (The comments only tell you the differences from the C version.)

get: The $flags parameter is optional. The value associated with the key you request is returned in the $value parameter.
put: As usual, the $flags parameter is optional. If you use either the R_IAFTER or R_IBEFORE flag, the $key parameter will be set to the record number of the inserted key/value pair.
del: The $flags parameter is optional.
fd: No differences encountered.
seq: The $flags parameter is optional. Both the $key and $value parameters will be set.
sync: The $flags parameter is optional.
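The seq and del entries can be sketched together. This example, assuming an in-memory DB_BTREE database and made-up keys, walks every record by calling seq with R_FIRST and then R_NEXT until it stops returning 0, and then deletes one key.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;

my %h;
my $db = tie(%h, 'DB_File', undef, O_RDWR|O_CREAT, 0644, $DB_BTREE)
    or die "can't tie: $!";
@h{qw(pear apple fig)} = (3, 1, 2);

# R_FIRST positions the cursor on the first record and R_NEXT advances it;
# each call fills in both $key and $value. seq returns 1 past the last record.
my ($key, $value) = ('', '');
my @seen;
for (my $status = $db->seq($key, $value, R_FIRST);
     $status == 0;
     $status = $db->seq($key, $value, R_NEXT)) {
    print "$key -> $value\n";
    push @seen, $key;
}

# del() returns 0 when it removed the key.
my $deleted = $db->del('fig');

undef $db;
untie %h;
```

Because the database is a btree, the records come back in lexical key order.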

7.2.8.6 Examples

Here are a few examples. First, using $DB_HASH:

use DB_File;
use Fcntl;

tie(%h, "DB_File", "hashed", O_RDWR|O_CREAT, 0644, $DB_HASH)
    or die "can't tie hashed: $!";

# Add a key/value pair to the file
$h{apple} = "orange";

# Check for value of a key
print "No, we have some bananas.\n" if $h{banana};

# Delete
delete $h{"apple"};
untie %h;

Here is an example using $DB_BTREE. Just to make life more interesting, the default comparison function is not used. Instead, a Perl subroutine, Compare(), does a case-insensitive comparison.

use DB_File;
use Fcntl;

sub Compare {
    my ($key1, $key2) = @_;
    "\L$key1" cmp "\L$key2";
}

$DB_BTREE->{compare} = \&Compare;    # a code reference to the sub
tie(%h, 'DB_File', "tree", O_RDWR|O_CREAT, 0644, $DB_BTREE)
    or die "can't tie tree: $!";

# Add a key/value pair to the file
$h{Wall}  = 'Larry';
$h{Smith} = 'John';
$h{mouse} = 'mickey';
$h{duck}  = 'donald';

# Delete
delete $h{duck};

# Cycle through the keys printing them in order.
# Note it is not necessary to sort the keys as
# the btree will have kept them in order automatically.
while ($key = each %h) { print "$key\n" }

untie %h;

The preceding code yields this output:

mouse
Smith
Wall
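The compare routine is not limited to case folding. The sketch below is a hypothetical variation that orders keys numerically. It uses a private DB_File::BTREEINFO->new object (a constructor not shown in this section, so treat it as an assumption about your DB_File version) so that the global $DB_BTREE is left alone, and passes an anonymous subroutine as the code reference.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;

# Private btree parameters whose compare callback orders keys numerically.
my $numeric = DB_File::BTREEINFO->new;
$numeric->{compare} = sub { my ($k1, $k2) = @_; $k1 <=> $k2 };

my %h;
tie(%h, 'DB_File', undef, O_RDWR|O_CREAT, 0644, $numeric)
    or die "can't tie: $!";
$h{$_} = "item $_" for (10, 9, 100);

# keys() walks the btree, so the callback's ordering is what comes back.
my @order = keys %h;
print "@order\n";
untie %h;
```

With the default lexical comparison, the same keys would come back as 10, 100, 9; the numeric callback yields 9, 10, 100.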

Next, an example using $DB_RECNO. You can access a regular text file as an array of lines, where the first line of the file is element 0 of the array, the second line is element 1, and so on. This provides a clean way to seek to a particular line in a text file.

my(@line, $number);
$number = 10;
use Fcntl;
use DB_File;
tie(@line, "DB_File", "/tmp/text", O_RDWR|O_CREAT, 0644, $DB_RECNO)
    or die "can't tie file: $!";
$line[$number - 1] = "this is a new line $number";   # line 10 is index 9
untie @line;

Here's an example of updating a file in place:

use Fcntl;
use DB_File;
tie(@file, 'DB_File', "/tmp/sample", O_RDWR, 0644, $DB_RECNO)
    or die "can't update /tmp/sample: $!";
print "line #3 was ", $file[2], "\n";
chomp(my $date = `date`);   # drop date's newline: records are newline-delimited
$file[2] = $date;
untie @file;

Note that the tied array interface is incomplete, causing some operations on the resulting array to fail in strange ways. See the discussion of tied arrays in Chapter 5. Some object methods are provided to avoid this. Here's an example of reading a file backward:

use DB_File;
use Fcntl;
$H = tie(@h, "DB_File", $file, O_RDWR, 0640, $DB_RECNO)
        or die "Cannot open file $file: $!\n";
# print the records in reverse order
for ($i = $H->length - 1; $i >= 0; --$i) { 
    print "$i: $h[$i]\n";
}
untie @h;

7.2.8.7 Locking databases

Concurrent access to a read-write database by several parties requires that each one use some kind of locking. The following example uses the fd() method to get the database's file descriptor, then opens a Perl filehandle on that same descriptor (the +<&= mode) so that flock can be applied to it. Run it several times in the background to watch the locks being granted in the proper order. You have to call the sync() method so that writes reach the disk between accesses; otherwise the library may keep some of them in its own cache.

use Fcntl qw(:DEFAULT :flock);    # O_* and LOCK_* constants
use DB_File;

use strict;

my($oldval, $fd, $db_obj, %db_hash, $value, $key);

$key   = shift || 'default';
$value = shift || 'magic';

$value .= " $$";

$db_obj = tie(%db_hash, 'DB_File', '/tmp/foo.db', O_CREAT|O_RDWR, 0644)
                    or die "dbcreat /tmp/foo.db $!";
$fd = $db_obj->fd;
print "$$: db fd is $fd\n";
open(DB_FH, "+<&=$fd") or die "fdopen $!";

unless (flock (DB_FH, LOCK_SH | LOCK_NB)) {
    print "$$: CONTENTION; can't read during write update! Waiting for read lock ($!) ....\n";
    unless (flock (DB_FH, LOCK_SH)) { die "flock: $!" }
}
print "$$: Read lock granted\n";

$oldval = $db_hash{$key};
print "$$: Old value was $oldval\n";
flock(DB_FH, LOCK_UN);

unless (flock (DB_FH, LOCK_EX | LOCK_NB)) {
    print "$$: CONTENTION; must have exclusive lock! Waiting for write lock ($!) ....\n";
    unless (flock (DB_FH, LOCK_EX)) { die "flock: $!" }
}

print "$$: Write lock granted\n";
$db_hash{$key} = $value;
sleep 10;

$db_obj->sync();                   # to flush
flock(DB_FH, LOCK_UN);
undef $db_obj;                     # drop our extra reference so that untie
                                   # closes the DB, and with it DB_FH's descriptor
untie %db_hash;
print "$$: Updated db to $key=$value\n";

7.2.8.8 See also

Related manpages: dbopen(3), hash(3), recno(3), btree(3).

Berkeley DB is available from these locations:

