In older versions of Perl, a user could call dbmopen to tie a hash to a UNIX DBM file. Whenever the hash was accessed, the database file on disk (really just a hash, not a full relational database) would be magically[16] read from or written to. In modern versions of Perl, you can bind any ordinary variable (scalar, array, or hash) to an implementation class by using tie. (The class may or may not implement a DBM file.) You can break this association with untie.
[16] In this case, magically means "transparently doing something very complicated". You know the old saying - any technology sufficiently advanced is indistinguishable from a Perl script.
The tie function creates the association by creating an object internally to represent the variable to the class. If you have a tied variable, but want to get at the underlying object, there are two ways to do it. First, the tie function returns a reference to the object. But if you didn't bother to store that object reference anywhere, you could still retrieve it using the tied function.
$object = tieVARIABLE
,CLASSNAME
,LIST
untieVARIABLE
$object = tiedVARIABLE
The tie function binds the variable to the class package that
provides the methods for that variable. Once
this magic has been performed, accessing a tied variable automatically
triggers method calls in the proper class. All the complexity of the
class is hidden behind magic method calls. The method names are
predetermined, since they're called implicitly from within the innards
of Perl. These names are in ALL CAPS
, which is a convention in Perl
culture that indicates that the routines are called implicitly rather than
explicitly - just like BEGIN
, END
, and DESTROY
.
And AUTOLOAD
too, for that matter.
You can almost think of tie as a funny kind of bless, except
that it blesses a bare variable instead of a thingy reference, and
takes extra parameters, like a constructor. That's because it actually
does call a constructor internally. (That's one of the magic methods we
mentioned.) This constructor is passed the CLASSNAME
you specified, as
well as any additional arguments you supply in the LIST
. It is not
passed the VARIABLE
, however. The only way the constructor can
tell which kind of VARIABLE
is being tied is by knowing its
own method name. This is not the customary constructor name,
new
, but rather one of TIESCALAR
, TIEARRAY
, or
TIEHASH
. (You can likely figure out which name goes with which
variable type.) The constructor just returns an object reference in the
normal fashion, and doesn't worry about whether it was called from
tie - which it may not have been, since you can call these methods
directly if you like. (Indeed, if you've tied your variable to a class
that provides other methods not accessible through the variable,
you must call the other methods directly yourself, via the object
reference. These extra methods might provide services like file locking
or other forms of transaction protection.)
As in any constructor, these constructors must bless a reference to a thingy and return it as the implementation object. The thingy inside the implementation object doesn't have to be of the same type as the variable you're tying to. It does have to be a properly blessed object, though. See the example below on tied arrays, which uses a hash object to hold information about an array.
The tie function will not use or require a module for you - you must do that explicitly yourself. (On the other hand, the dbmopen emulator function will, for backward compatibility, attempt to use one or another DBM implementation. But you can preempt its selection with an explicit use, provided the module you use is one of the modules in dbmopen's list of modules to try. See the AnyDBM_File module in Chapter 7 for a fuller explanation.)
A class implementing a tied scalar must define the following methods:
TIESCALAR
, FETCH
, STORE
, and possibly
DESTROY
. These routines will be invoked implicitly when you
tie a variable (TIESCALAR
), read a tied variable
(FETCH
), or assign a value to a tied variable (STORE
). The
DESTROY
method is called (as always) when the last reference to
the object disappears. (This may or may not happen when you call
untie, which destroys the reference used by the tie, but doesn't
destroy any outstanding references you may have squirreled away
elsewhere.) The FETCH
and STORE
methods are triggered
when you access the variable that's been tied, not the object it's been
tied to. If you have a handle on the object (either returned by the
initial tie or retrieved later via tied), you can access the
underlying object yourself without automatically triggering its
FETCH
or STORE
methods.
Let's look at each of these methods in turn, using as our example an
imaginary class called Nice
.[17]
Variables tied to this class are scalars containing process priorities,
and each such variable is implicitly associated with an object that
contains a particular process ID, such the ID of the currently running
process or of the parent process. (Presumably you'd name your variables
to remind you which process you're referring to.) Variables are tied to
the class this way:
[17] UNIX priorities are associated with the word "nice" because they're inverted from what you'd expect. Higher priorities run slower, hence are "nicer" to other processes. A more portable module might prefer a less UNIX-centric name like
Priority
. But if we were writing this class for the Perl library, we'd probably call itTie::Priority
or some such, to fit the library's hierarchical naming scheme. Not everything can be a top-level class, or things will get rather confused. Not to mention people.
use Nice; # load the Nice.pm module tie $his_speed, 'Nice', getppid(); tie $my_speed, 'Nice', $$;
Once the variables have been tied, their previous contents are no longer accessible. The internally forged connection between the variable and the object takes precedence over ordinary variable semantics.
For example, let's say you copy a variable that's been tied:
$speed = $his_speed;
Instead of reading the value in the ordinary fashion from the
$his_speed
scalar variable, Perl implicitly calls the
FETCH
method on the associated underlying object. It's as
though you'd written this:
$speed = (tied $his_speed)->FETCH():
Or if you'd captured the object returned by the tie, you could simply use that reference instead of the tied function, as in the following sample code.
$myobj = tie $my_speed, 'Nice', $$; $speed = $my_speed; # through the implicit interface $speed = $myobj->FETCH(); # same thing, explicitly
You can use $myobj
to call methods other than the implicit ones, such
as those provided by the DB_File
class (see Chapter 7).
However, one normally minds one's own business and leaves the underlying
object alone, which is why you often see the return value from tie
ignored. You can still get at it if you need it later.
That's the external view of it. For our implementation, we'll use the
BSD::Resource
class (found in CPAN, but not included with
Perl) to access the PRIO_PROCESS
, PRIO_MIN
, and
PRIO_MAX
constants from your system. Here's the
preamble of our class, which we will put into a file named Nice.pm:
package Nice; use Carp; # Propagates error messages nicely. use BSD::Resource; # Use these hooks into the OS. use strict; # Enforce some discipline on ourselves, use vars '$DEBUG'; # but exempt $DEBUG from discipline.
The Carp module provides methods carp()
, croak()
,
and confess()
, which we'll use in various spots below. As
usual, see Chapter 7 for more about Carp.
The use strict
would ordinarily disallow the use of unqualified
package variables like $DEBUG
, but we then declared the global
with use vars
, so it's exempt. Otherwise we'd have to say
$Nice::DEBUG
everywhere. But it is a global, and other modules
can turn on debugging in our module by setting $Nice::DEBUG
to
some other value before using our module.
TIESCALAR
CLASSNAME
, LIST
The TIESCALAR
method of the class (that is, the
class package, but we're going to stop reminding you of that) is
implicitly invoked whenever tie is
called on a scalar variable. The LIST
contains
any optional parameters needed to properly initialize an object of the
given class. (In our example, there is only one parameter, the
process ID.) The method is expected to return an object, which may or
may not contain an anonymous scalar as its blessed thingy. In our
example, it does.
sub TIESCALAR { my $class = shift; my $pid = shift; $pid ||= $$; # arg of 0 defaults to my process if ($pid =~ /\D/) { carp "Nice::TIESCALAR got non-numeric pid $pid" if $^W; return undef; } unless (kill 0, $pid) { # EPERM or ERSCH, no doubt carp "Nice::TIESCALAR got bad pid $pid: $!" if $^W; return undef; } return bless \$pid, $class; }
Recall that the statement with the ||=
operator is just shorthand for
$pid = $pid || $$; # set if not set
We say the object contains an anonymous scalar, but it doesn't really
become anonymous until my $pid
goes out of scope, since that's the
variable we're generating a reference to when we bestow the blessing.
When returning a reference to an array or hash, one could use the same
approach by employing a lexically scoped array or hash variable, but
usually people just use the anonymous array or hash composers,
[]
and {}
. There is no similar composer for anonymous
scalars.
On the subject of subterfuge, the kill isn't really killing the process. On most UNIX systems, a signal 0 merely checks to see whether the process is there.
This particular tie class has chosen to return an error value rather than raise an exception if its constructor fails. Other classes may not wish to be so forgiving. (In any event, the tie itself will throw an exception when the constructor fails to return an object. But you get more error messages this way, which many folks seem to prefer.) This routine checks the global variable $^W (which reflects Perl's -w flag) to see whether to emit its extra bit of noise.
But for all that, it's an ordinary constructor, and doesn't know it's being called from tie. It just suspects it strongly.
FETCH
THIS
This method is triggered every time the tied variable is accessed
(that is, read). It takes no arguments beyond a reference to the object
that is tied to the variable. (The FETCH
methods for arrays and
hashes do, though.) Since in this case we're just using a scalar
thingy as the tied object, a simple scalar dereference, $$self
,
allows the method to get at the real value stored in its object. In the
example below, that real value is the process ID to which we've tied our
variable.
sub FETCH { my $self = shift; # ref to scalar confess "wrong type" unless ref $self; croak "too many arguments" if @_; my $nicety; local $! = 0; # preserve errno $nicety = getpriority(PRIO_PROCESS, $$self); if ($!) { croak "getpriority failed: $!" } return $nicety; }
This time we've decided to blow up (raise an exception) if the getpriority function fails - there's no place for us to return an error otherwise, and it's probably the right thing to do.
Note the absence of a $
on PRIO_PROCESS
. That's really a
subroutine call into BSD::Resource
that returns the appropriate constant
to feed back into getpriority. The PRIO_PROCESS
declaration was imported by the use declaration. And that's why
there's no $
on the front of it - it's not a variable. (If you
had put a $
, the use strict
would have caught it for you
as an unqualified global.)
STORE
THIS, VALUE
This method is triggered every time the tied variable is set
(assigned). The first argument, THIS
, is again a reference to
the object associated with the variable, and VALUE
is the value the
user is assigning to the variable.
sub STORE { my $self = shift; my $new_nicety = shift; confess "wrong type" unless ref $self; croak "too many arguments" if @_; if ($new_nicety < PRIO_MIN) { carp sprintf "WARNING: priority %d less than minimum system priority %d", $new_nicety, PRIO_MIN if $^W; $new_nicety = PRIO_MIN; } if ($new_nicety > PRIO_MAX) { carp sprintf "WARNING: priority %d greater than maximum system priority %d", $new_nicety, PRIO_MAX if $^W; $new_nicety = PRIO_MAX; } unless (defined setpriority(PRIO_PROCESS, $$self, $new_nicety)) { confess "setpriority failed: $!"; } return $new_nicety; }
There doesn't appear to be anything worth explaining there, except maybe that we return the new value because that's what an assignment returns.
DESTROY
THIS
This method is triggered when the object associated with the
tied variable needs to be destructed (usually only when it goes out of
scope). As with other object classes, such a method is seldom necessary,
since Perl deallocates the moribund object's memory for you automatically.
Here, we'll use a DESTROY
method for debugging purposes only.
sub DESTROY { my $self = shift; confess "wrong type" unless ref $self; carp "[ Nice::DESTROY pid $$self ]" if $DEBUG; }
That's about all there is to it. Actually, it's more than all there
is to it, since we've done a few nice things here for the sake
of completeness, robustness, and general aesthetics (or lack thereof).
Simpler TIESCALAR
classes are certainly possible.
A class implementing a tied ordinary array must define the following
methods: TIEARRAY
, FETCH
, STORE
, and perhaps
DESTROY
.
Tied arrays are incomplete. There are, as yet,
no defined methods to deal with $#ARRAY
access
(which is hard, since it's an lvalue), nor with the other obvious
array functions, like push, pop, shift,
unshift, and splice. This means that a tied array doesn't
behave like an untied one. You can't even determine the length of the
array. But if you use the tied arrays only for simple read and write
access you'll be OK. These restrictions will be removed in a future
release.
For the purpose of this discussion, we will implement an array whose indices are fixed at its creation. If you try to access anything beyond those bounds, you will cause an exception.
require Bounded_Array; tie @ary, 'Bounded_Array', 2; # maximum allowable subscript is 2 $| = 1; for $i (0 .. 10) { print "setting index $i: "; $ary[$i] = 10 * $i; # should raise exception on 3 print "value of element $i now $ary[$i]\n"; }
The preamble code for the class is as follows:
package Bounded_Array; use Carp; use strict;
TIEARRAY
CLASSNAME
, LIST
This is the constructor for the class. That means it is expected to return a blessed reference through which the new array (probably an anonymous array reference) will be accessed.
In our example, just to demonstrate that you don't really have to
use an array thingy, we'll choose a hash thingy to
represent our object. A hash works out well as a generic record type:
the {BOUND}
field will store the maximum bound allowed, and the
{ARRAY}
field will hold the true array reference. Anyone outside
the class who tries to dereference the object returned (doubtless thinking
it an array reference), will blow up. This just goes to show
that you should respect an object's privacy (unless you're well
acquainted and committed to maintaining a good relationship for the
rest of your life).
sub TIEARRAY { my $class = shift; my $bound = shift; confess "usage: tie(\@ary, 'Bounded_Array', max_subscript)" if @_ or $bound =~ /\D/; return bless { BOUND => $bound, ARRAY => [], }, $class; }
In this case we have used the anonymous hash composer rather than a lexically scoped variable that goes out of scope. We also used the array composer within the hash composer.
FETCH
THIS, INDEX
This method will be triggered every time an individual element in the tied array is accessed (read). It takes one argument beyond its self reference: the index we're trying to fetch. (The index is an integer, but just because the caller thinks of it as a mundane integer doesn't mean you have to do anything "linear" with it. You could use it to seed a random number generator, for instance, or process it with a hash function to do a random lookup in a hash table.)
Here we use list assignment rather than shift to process the method arguments. TMTOWTDI.
sub FETCH { my ($self, $idx) = @_; if ($idx > $self->{BOUND}) { confess "Array OOB: $idx > $self->{BOUND}"; } return $self->{ARRAY}[$idx]; }
As you may have noticed, the names of the FETCH
, STORE
, and
DESTROY
methods are
the same for all tied classes, even though the constructors differ in name
(TIESCALAR
versus TIEARRAY
). While in theory you could have the
same class servicing several tied types, in practice this becomes
cumbersome, and it's easiest to simply write them with one type per
class.
STORE
THIS, INDEX, VALUE
This method will be triggered every time an element in the tied array is set (written). It takes two arguments beyond its self reference: the index at which we're trying to store something and the value we're trying to put there. For example:
sub STORE { my ($self, $idx, $value) = @_; if ($idx > $self->{BOUND} ) { confess "Array OOB: $idx > $self->{BOUND}"; } return $self->{ARRAY}[$idx] = $value; }
DESTROY
THIS
This method will be triggered when the tied object needs to be deallocated. As with the scalar tie class, this is almost never needed in a language that does its own storage allocation, so this time we'll just leave it out.
The code we presented at the beginning of this section attempts several out-of-bounds accesses. It will therefore generate the following output:
setting index 0: value of element 0 now 0 setting index 1: value of element 1 now 10 setting index 2: value of element 2 now 20 setting index 3: Array OOB: 3 > 2 at Bounded_Array.pm line 39 Bounded_Array::FETCH called at testba line 12
For historical reasons, hashes have the most complete and useful tie
implementation. A class implementing a tied associative array must
define various methods. TIEHASH
is the constructor. FETCH
and
STORE
access the key/value pairs. EXISTS
reports
whether a key is present in the hash, and DELETE
deletes one.
CLEAR
empties the hash by deleting all the key/value pairs.
FIRSTKEY
and NEXTKEY
implement the keys and
each built-in functions to iterate over all the keys. And
DESTROY
(if defined) is called when the tied object is
deallocated.
If this seems like a lot, then feel free to inherit most of these methods from the standard Tie::Hash module, redefining only the interesting ones. See the Tie::Hash module documentation in Chapter 7 for details.
Remember that Perl distinguishes a key not existing in the hash from a key that exists with an undefined value. The two possibilities can be tested with the exists and defined functions, respectively.
Because functions like keys and values may return huge array values when used on large hashes (like tied DBM files), you may prefer to use the each function to iterate over such. For example:
# print out B-news history file offsets use NDBM_File; tie(%HIST, 'NDBM_File', '/usr/lib/news/history', 1, 0); while (($key,$val) = each %HIST) { print $key, ' = ', unpack('L',$val), "\n"; } untie(%HIST);
(But does anyone run B-news any more?)
Here's an example of a somewhat peculiar tied hash class: it gives you a hash representing a particular user's dotfiles (that is, files whose names begin with a period). You index into the hash with the name of the file (minus the period) and you get back that dotfile's contents. For example:
use DotFiles; tie %dot, "DotFiles"; if ( $dot{profile} =~ /MANPATH/ or $dot{login} =~ /MANPATH/ or $dot{cshrc} =~ /MANPATH/ ) { print "you've set your manpath\n"; }
Here's another way to use our tied class:
# third argument is name of user whose dot files we will tie to tie %him, 'DotFiles', 'daemon'; foreach $f ( keys %him ) { printf "daemon dot file %s is size %d\n", $f, length $him{$f}; }
In our DotFiles example we implement the object as a regular hash
containing several important fields, of which only the {CONTENTS}
field will be what the user thinks of as the real hash. Here are the
fields:
USER
Whose dot files this object represents
HOME
Where those dotfiles live
CLOBBER
Whether we are allowed to change or remove those dot files
CONTENTS
The hash of dotfile names and content mappings
Here's the start of DotFiles.pm:
package DotFiles; use Carp; sub whowasi { (caller(1))[3] . '()' } my $DEBUG = 0; sub debug { $DEBUG = @_ ? shift : 1 }
For our example, we want to be able to emit debugging information to help in
tracing during development. We also keep one convenience function around
internally to help print out warnings; whowasi()
returns the name
of the function that called the current function (whowasi()
's
"grandparent" function).
Here are the methods for the DotFiles tied hash.
TIEHASH
CLASSNAME
, LIST
This is the constructor for the class. That means it is expected to return a blessed reference through which the new object may be accessed. Again, the user of the tied class probably has little need of the object. It's Perl itself that needs the returned object so that it can magically call the right methods when the tied variable is accessed.
Here's the constructor:
sub TIEHASH { my $self = shift; my $user = shift || $>; my $dotdir = shift || ""; croak "usage: @{[&whowasi]} [USER [DOTDIR]]" if @_; $user = getpwuid($user) if $user =~ /^\d+$/; my $dir = (getpwnam($user))[7] or croak "@{[&whowasi]}: no user $user"; $dir .= "/$dotdir" if $dotdir; my $node = { USER => $user, HOME => $dir, CONTENTS => {}, CLOBBER => 0, }; opendir DIR, $dir or croak "@{[&whowasi]}: can't opendir $dir: $!"; foreach $dot ( grep /^\./ && -f "$dir/$_", readdir(DIR)) { $dot =~ s/^\.//; $node->{CONTENTS}{$dot} = undef; } closedir DIR; return bless $node, $self; }
It's probably worth mentioning that if you're going to filetest the return values returned by that readdir, you'd better prepend the directory in question (as we do). Otherwise, since no chdir was done, you'd test the wrong file.
FETCH
THIS, KEY
This method will be triggered every time an element in the tied hash is accessed (read). It takes one argument beyond its self reference: the key whose value we're trying to fetch. The key is a string, and you can do anything you like with it (consistent with its being a string).
Here's the fetch for our DotFiles example.
sub FETCH { carp &whowasi if $DEBUG; my $self = shift; my $dot = shift; my $dir = $self->{HOME}; my $file = "$dir/.$dot"; unless (exists $self->{CONTENTS}->{$dot} || -f $file) { carp "@{[&whowasi]}: no $dot file" if $DEBUG; return undef; } # Implement a cache. if (defined $self->{CONTENTS}->{$dot}) { return $self->{CONTENTS}->{$dot}; } else { return $self->{CONTENTS}->{$dot} = `cat $dir/.$dot`; } }
This function was easy to write by having it call the UNIX cat(1) command, but it would be more portable (and somewhat more efficient) to open the file ourselves. On the other hand, since dot files are a UNIXy concept, we're not that concerned.
STORE
THIS, KEY, VALUE
This method will be triggered every time an element in the tied hash is set (written). It takes two arguments beyond its self reference: the key under which we're storing the value and the value we're putting there.
Here in our DotFiles example we won't let users overwrite a file without
first calling the clobber()
method on the original object reference returned by tie.
sub STORE { carp &whowasi if $DEBUG; my $self = shift; my $dot = shift; my $value = shift; my $file = $self->{HOME} . "/.$dot"; croak "@{[&whowasi]}: $file not clobberable" unless $self->{CLOBBER}; open(F, "> $file") or croak "can't open $file: $!"; print F $value; close(F); }
If they want to clobber something, they can say:
$ob = tie %daemon_dots, 'daemon'; $ob->clobber(1); $daemon_dots{signature} = "A true daemon\n";
But there's also the tied function, so they could alternatively
set clobber
using:
tie %daemon_dots, 'daemon'; tied(%daemon_dots)->clobber(1);
The clobber
method is simply:
sub clobber { my $self = shift; $self->{CLOBBER} = @_ ? shift : 1; }
DELETE
THIS, KEY
This method is triggered when we remove an element from the hash, typically by using the delete function. Again, we'll be careful to check whether the user really wants to clobber files.
sub DELETE { carp &whowasi if $DEBUG; my $self = shift; my $dot = shift; my $file = $self->{HOME} . "/.$dot"; croak "@{[&whowasi]}: won't remove file $file" unless $self->{CLOBBER}; delete $self->{CONTENTS}->{$dot}; unlink $file or carp "@{[&whowasi]}: can't unlink $file: $!"; }
CLEAR
THIS
This method is triggered when the whole hash is to be cleared, usually by assigning the empty list to it.
In our example, that would remove all the user's dotfiles! It's such a
dangerous thing that we'll require CLOBBER
to be set higher than 1
before this can happen.
sub CLEAR { carp &whowasi if $DEBUG; my $self = shift; croak "@{[&whowasi]}: won't remove all dotfiles for $self->{USER}" unless $self->{CLOBBER} > 1; my $dot; foreach $dot ( keys %{$self->{CONTENTS}}) { $self->DELETE($dot); } }
EXISTS
THIS, KEY
This method is triggered when the user invokes the exists function
on a particular hash. In our example, we'll look at the {CONTENTS}
hash element to find the answer:
sub EXISTS { carp &whowasi if $DEBUG; my $self = shift; my $dot = shift; return exists $self->{CONTENTS}->{$dot}; }
FIRSTKEY
THIS
This method is triggered when the user begins to iterate through the hash, such as with a keys or each call. By calling keys in a scalar context, we reset its internal state to ensure that the next each used in the return statement will get the first key.
sub FIRSTKEY { carp &whowasi if $DEBUG; my $self = shift; my $a = keys %{$self->{CONTENTS}}; return scalar each %{$self->{CONTENTS}}; }
NEXTKEY
THIS, LASTKEY
This method is triggered during a keys
or each iteration. It has a second
argument which is the last key that has been accessed. This is useful
if the NEXTKEY
method needs to know its previous
state to calculate the next state.
For our example, we are using a real hash to represent the tied
hash's data, except that this hash is stored in the hash's
CONTENTS
field instead of in the hash itself. So we can just
rely on Perl's each iterator:
sub NEXTKEY { carp &whowasi if $DEBUG; my $self = shift; return scalar each %{ $self->{CONTENTS} } }
DESTROY
THIS
This method is triggered when a tied hash's object is about to be deallocated. You don't really need it except for debugging and extra cleanup. Here's a very simple function:
sub DESTROY { carp &whowasi if $DEBUG; }