11.2 DBM Modules

A DBM-like file is a file that contains a set of pairs of strings (key,data), with support for fetching or storing the data given a key, known as keyed access. DBM-like files were originally supported on early Unix systems, with functionality roughly equivalent to that of access methods popular on the mainframes and minicomputers of the time, such as ISAM, the Indexed-Sequential Access Method. Today, several different libraries, available for many platforms, let programs written in many different languages create, update, and read DBM-like files.

Keyed access, while not as powerful as the data access functionality of relational databases, may often suffice for a program's needs. And if DBM-like files are sufficient, you may end up with a program that is smaller, faster, and more portable than one that uses an RDBMS.

The classic dbm library, whose first version introduced DBM-like files many years ago, has limited functionality, but tends to be available on most Unix platforms. The GNU version, gdbm, is richer and also widespread. The BSD version, dbhash, offers superior functionality. Python supplies modules that interface with each of these libraries if the relevant underlying library is installed on your system. Python also offers a minimal DBM module, dumbdbm (usable anywhere, as it does not rely on other installed libraries), and generic DBM modules, which are able to automatically identify, select, and wrap the appropriate DBM library to deal with an existing or new DBM file.

Depending on your platform, your Python distribution, and what dbm-like libraries you have installed on your computer, the default Python build may install some subset of these modules. In general, at a minimum, you can rely on having module dbm on Unix-like platforms, module dbhash on Windows, and dumbdbm on any platform.

11.2.1 The anydbm Module

The anydbm module is a generic interface to any other DBM module. anydbm supplies a single factory function.
open(filename, flag='r', mode=0666)

Opens or creates the DBM file named by filename (a string that can denote any path to a file, not just a name), and returns a suitable mapping object corresponding to the DBM file. When the DBM file already exists, open uses module whichdb to determine which DBM library can handle the file. When open creates a new DBM file, open chooses the first available DBM module in order of preference: dbhash, gdbm, dbm, and dumbdbm.

flag is a one-character string that tells open how to open the file and whether to create it, as shown in Table 11-1. mode is an integer that open uses as the file's permission bits if open creates the file, as covered in Section 10.2.2 in Chapter 10. Not all DBM modules use flag and mode, but for portability's sake you should always supply appropriate values for these arguments when you call anydbm.open.
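As a rough illustration of the call described above, the following sketch opens a DBM file with explicit flag and mode arguments, stores one item, and reads it back. The scratch directory, the file name example_db, and the key and value are hypothetical. The fallback import is an assumption for readers running Python 3, where anydbm was renamed dbm; there, fetched values come back as byte strings.

```python
import os
import tempfile

try:
    import anydbm                  # Python 2 name, as used in this chapter
except ImportError:
    import dbm as anydbm           # assumption: Python 3, where the module was renamed

# Octal 666, spelled so that both Python 2 and Python 3 can parse it.
MODE = int('666', 8)

# Hypothetical scratch location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'example_db')

# 'c' opens the file for reading and writing, creating it if it is missing.
m = anydbm.open(path, 'c', MODE)
try:
    m['spam'] = 'eggs'             # bind an item with dictionary-style indexing
    stored = m['spam']             # fetch it back ('eggs', or b'eggs' on Python 3)
finally:
    m.close()                      # always release the underlying file
```

Which underlying DBM module actually handles the file depends on what is installed, so the files created on disk may differ from one machine to another.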
anydbm.open returns a mapping object m that supplies a subset of the functionality of dictionaries (covered in Chapter 4). m only accepts strings as keys and values, and the only mapping methods m supplies are m.has_key and m.keys. However, you can bind, rebind, access, and unbind items in m with the same indexing syntax m[key] that you would use if m were a dictionary. If flag is 'r', open returns a mapping m that is read-only so that you can only access m's items, not bind, rebind, or unbind them.

One extra method that m supplies is m.close, with the same semantics as the close method of a built-in file object. You should ensure m.close() is called when you're done using m. The try/finally statement (covered in Chapter 6) is the best way to ensure finalization.

11.2.2 The dumbdbm Module

The dumbdbm module supplies minimal DBM functionality and mediocre performance. dumbdbm's only advantage is that you can use it anywhere, since dumbdbm does not rely on any library. You don't normally import dumbdbm; rather, import anydbm, and let anydbm supply your program with the best DBM module available, defaulting to dumbdbm if nothing better is available on the current Python installation. The only case in which you import dumbdbm directly is the rare one in which you need to create a DBM-like file that you can later read from any Python installation. Module dumbdbm supplies an open function and an exception class error that are polymorphic to those anydbm supplies.

11.2.3 The dbm, gdbm, and dbhash Modules

The dbm module exists only on Unix platforms, where it can wrap any of the dbm, ndbm, and gdbm libraries, since each supplies a dbm-compatibility interface. You never import dbm directly; rather, you import anydbm, and let anydbm supply your program with the best DBM module available, defaulting to dbm if appropriate. Module dbm supplies an open function and an exception class error that are polymorphic to those anydbm supplies.
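Because the specific DBM modules supply open functions polymorphic to the one in anydbm, code written against the small mapping interface does not need to know which module it was handed. The sketch below passes dumbdbm.open to a helper that only uses indexing, keys, and close. The helper name store_and_list and the file name portable_db are hypothetical, and the fallback import assumes Python 3, where dumbdbm is available as dbm.dumb and keys come back as byte strings.

```python
import os
import tempfile

try:
    import dumbdbm                     # Python 2 name, as used in this chapter
except ImportError:
    import dbm.dumb as dumbdbm         # assumption: Python 3 renamed the module

def store_and_list(open_func, path, items):
    """Store items through any DBM-style open function; return the sorted keys."""
    m = open_func(path, 'c')
    try:
        for key, value in items:
            m[key] = value             # only strings as keys and values
        return sorted(m.keys())
    finally:
        m.close()                      # runs even if storing an item fails

# Hypothetical scratch location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'portable_db')
keys = store_and_list(dumbdbm.open, path, [('one', '1'), ('two', '2')])
```

Passing anydbm.open instead of dumbdbm.open should work the same way, with anydbm picking whichever DBM module it prefers on the current installation.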
The gdbm module wraps the GNU DBM library, gdbm. The gdbm.open function accepts other values for the flag argument, and returns a mapping object m supplying a few extra methods. You may need to import gdbm directly if you need to access non-portable functionality. I do not cover gdbm specifics in this book, since the book is focused on cross-platform Python.

The dbhash module wraps the BSD DBM library in a DBM-compatible way. The dbhash.open function accepts other values for the flag argument, and returns a mapping object m supplying a few extra methods. You may choose to import dbhash directly if you need to access non-portable functionality. For full access to the BSD DB functionality, however, you can also import bsddb, covered in Section 11.3 later in this chapter.

11.2.4 The whichdb Module

The whichdb module attempts to guess which of the several DBM modules can handle a given DBM-like file. whichdb supplies a single function.
whichdb(filename)

Opens the file specified by filename and determines which DBM-like package created the file. whichdb returns None if the file does not exist or cannot be opened and read. whichdb returns '' if the file exists and can be opened and read, but it cannot be determined which DBM-like package created the file (i.e., the file is not a DBM file). whichdb returns a string naming a module, such as 'dbm', 'dumbdbm', or 'dbhash', if it can determine which module can read the DBM-like file named by filename.

11.2.5 Examples of DBM-Like File Use

Keyed access is quite suitable when your program needs to record, in a persistent way, the equivalent of a Python dictionary, with strings as both keys and values. For example, suppose you need to analyze several text files, whose names are given as your program's arguments, and record where each word appears in those files. In this case, the keys are words, and, therefore, intrinsically strings. The data you need to record for each word is a list of (filename, line-number) pairs. However, you can encode the data as a string in several ways, for example by exploiting the fact that the path separator string os.pathsep (covered in Chapter 10) does not normally appear in filenames. (Note that more solid, general, and reliable approaches to the general issue of encoding data as strings are covered in Section 11.1 earlier in this chapter.) With this simplification, the program that records word positions in files might be as follows:

    import fileinput, os, anydbm

    # Map each word to a list of 'filename<sep>lineno' strings
    wordPos = {}
    sep = os.pathsep
    for line in fileinput.input():
        pos = '%s%s%s' % (fileinput.filename(), sep, fileinput.filelineno())
        for word in line.split():
            wordPos.setdefault(word, []).append(pos)

    # 'n' always creates a new, empty DBM file
    dbmOut = anydbm.open('indexfile', 'n')
    # A doubled separator divides one position from the next
    sep2 = sep * 2
    for word in wordPos:
        dbmOut[word] = sep2.join(wordPos[word])
    dbmOut.close()

We can read back the data stored to the DBM-like file indexfile in several ways.
The following example accepts words as command-line arguments and prints the lines where the requested words appear:

    import sys, os, anydbm, linecache

    dbmIn = anydbm.open('indexfile')
    sep = os.pathsep
    sep2 = sep * 2
    for word in sys.argv[1:]:
        if not dbmIn.has_key(word):
            sys.stderr.write('Word %r not found in index file\n' % word)
            continue
        # Split the stored string into positions, then each position into its parts
        places = dbmIn[word].split(sep2)
        for place in places:
            fname, lineno = place.split(sep)
            print "Word %r occurs in line %s of file %s:" % (word, lineno, fname)
            # The trailing comma avoids doubling the line's own newline
            print linecache.getline(fname, int(lineno)),
    dbmIn.close()
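The string encoding that the two programs above share can be checked in isolation, without touching any DBM file. The following sketch packs a list of (filename, line-number) pairs into one string and unpacks it again. The function names encode_positions and decode_positions and the sample filenames are hypothetical, and the scheme still assumes that os.pathsep never appears in a filename.

```python
import os

SEP = os.pathsep        # separates a filename from its line number
SEP2 = SEP * 2          # separates one (filename, line) entry from the next

def encode_positions(positions):
    """Pack a list of (filename, line_number) pairs into a single string."""
    return SEP2.join(['%s%s%s' % (fname, SEP, lineno)
                      for fname, lineno in positions])

def decode_positions(data):
    """Unpack a string built by encode_positions back into a list of pairs."""
    result = []
    for place in data.split(SEP2):
        fname, lineno = place.split(SEP)
        result.append((fname, int(lineno)))
    return result

# Hypothetical sample data, used only for this illustration.
positions = [('chapter1.txt', 3), ('chapter2.txt', 17)]
encoded = encode_positions(positions)
decoded = decode_positions(encoded)
```

Note that decode_positions expects at least one entry, which holds for the index above since every recorded word has at least one position. For anything less constrained, the more general encodings covered in Section 11.1 are a safer choice.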