First of all, you need to understand packages and modules as previously described in this chapter. You also need to know what references and referenced thingies are in Perl; see Chapter 4, References and Nested Data Structures, for that.
It's also helpful to understand a little about object-oriented programming (OOP), so in the next section we'll give you a little course on OOL (object-oriented lingo).
An object is a data structure with a collection of behaviors. We generally speak of behaviors as being performed by the object directly, sometimes to the point of anthropomorphizing the object. For example, we might say that a rectangle "knows" how to display itself on the screen, or "knows" how to compute its own area.
An object gets its behaviors by being an instance of a class. The class defines methods that apply to all objects belonging to that class, called instance methods.
The class will also likely include instance-independent methods, called class methods.[9] Some class methods create new objects of the classes, and are called constructor methods (such as "create a new rectangle with width 10 and height 5"). Other class methods might perform operations on many objects collectively ("display all rectangles"), or provide other necessary operations ("read a rectangle from this file").
[9] Or sometimes static methods.
A class may be defined so as to inherit both class and instance methods from parent classes, also known as base classes. This allows a new class to be created that is similar to an existing class, but with added behaviors. Any method invocation that is not found in a particular class will be searched for in the parent classes automatically. For example, a rectangle class might inherit some common behaviors from a generic polygon class.
While you might know the particular implementation of an object, generally you should treat the object as a black box. All access to the object should be obtained through the published interface via the provided methods. This allows the implementation to be revised, as long as the interface remains frozen (or at least, upward compatible). By published interface, we mean the written documentation describing how to use a particular class. (Perl does not have an explicit interface facility apart from this. You are expected to exercise common sense and common decency.)
Objects of different classes may be held in the same variable at different times. When a method is invoked on the contents of the variable, the proper method for the object's class gets selected automatically. If, for example, the draw() method is invoked on a variable that holds either a rectangle or a circle, the method actually used depends on the current nature of the object to which the variable refers. For this to work, however, the methods for drawing circles and rectangles must both be called draw().
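To make the draw() scenario concrete, here is a minimal sketch. The Rectangle and Circle classes are hypothetical, invented only for this illustration, and the constructor mechanics they rely on are explained later in this chapter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical classes for illustration; neither comes from a real module.
package Rectangle;
sub new {
    my ($class, $width, $height) = @_;
    return bless { w => $width, h => $height }, $class;
}
sub draw {
    my $self = shift;
    return "rectangle $self->{w}x$self->{h}";
}

package Circle;
sub new {
    my ($class, $radius) = @_;
    return bless { r => $radius }, $class;
}
sub draw {
    my $self = shift;
    return "circle of radius $self->{r}";
}

package main;

# The same variable holds objects of different classes at different times;
# each draw() call is dispatched on the class of the object currently held.
my @drawn;
for my $shape (Rectangle->new(10, 5), Circle->new(3)) {
    push @drawn, $shape->draw();
}
print "$_\n" for @drawn;
```

The loop never asks what kind of shape it has; it only relies on both classes agreeing on the method name.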
Admittedly, there's a lot more to objects than this, and a lot of ways to find out more. But that's not our purpose here. So, on we go.
Here are three simple definitions that you may find reassuring:
An object is simply a referenced thingy that happens to know which class it belongs to.
A class is simply a package that happens to provide methods to deal with objects.
A method is simply a subroutine that expects an object reference (or a package name, for class methods) as its first argument.
We'll cover these points in more depth now.
Perl doesn't provide any special syntax for constructors. A constructor is merely a subroutine that returns a reference to a thingy that it has blessed into a class, generally the class in which the subroutine is defined. The constructor does this using the built-in bless function, which marks a thingy as belonging to a particular class. It takes either one or two arguments: the first argument is a regular hard reference to any kind of thingy, and the second argument (if present) is the package that will own the thingy. If no second argument is supplied, the current package is assumed. Here is a typical constructor:
    package Critter;
    sub new { return bless {}; }
The {} composes a reference to an empty anonymous hash. The bless function takes that hash reference and tells the thingy it references that it's now a member of the class Critter, and returns the reference.
The same thing can be accomplished more explicitly this way:
    sub new {
        my $obref = {};    # ref to empty hash
        bless $obref;      # make it an object in this class
        return $obref;     # return it
    }
Once a reference has been blessed into a class, you can invoke the class's instance methods upon it. For example:
$circle->draw();
We'll discuss method invocation in more detail below.
Sometimes constructors call other methods in the class as part of the construction. Here we'll call an _initialize() method, which may be in the current package or in one of the classes (packages) that this class inherits from. The leading underscore is an oft-used convention indicating that the function is private, that is, to be used only by the class itself. This result can also be achieved by omitting the function from the published documentation for that class.
    sub new {
        my $self = {};
        bless $self;
        $self->_initialize();
        return $self;
    }
If you want your constructor method to be (usefully) inheritable, then you must use the two-argument form of bless. That's because, in Perl, methods execute in the context of the original base class rather than in the context of the derived class. For example, suppose you have a Polygon class that had a new() method as a constructor. This would work fine when called as Polygon->new(). But then you decide to also have a Square class, which inherits methods from the Polygon class. The only way for that constructor to build an object of the proper class when it is called as Square->new() is by using the two-argument form of bless, as in the following example:
    sub new {
        my $class = shift;
        my $self = {};
        bless $self, $class;    # bless $self into the designated class
        $self->_initialize();   # in case there's more work to do
        return $self;
    }
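The difference between the two forms of bless is easy to see side by side. The following sketch assumes a hypothetical second constructor, new_fixed(), that uses the one-argument form; that name is ours and is there only for contrast.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Polygon;
sub new {
    my $class = shift;
    my $self  = {};
    bless $self, $class;     # two-argument form: honors the invoking class
    return $self;
}
sub new_fixed {              # hypothetical name, for contrast only
    return bless {};         # one-argument form: always the current package
}

package Square;
our @ISA = ('Polygon');      # Square inherits both constructors

package main;

my $good = Square->new();
my $bad  = Square->new_fixed();
print ref($good), "\n";      # prints "Square"
print ref($bad),  "\n";      # prints "Polygon"
```

Both calls run code compiled in package Polygon, which is why the one-argument form cannot know it was reached through Square.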
Within the class package, methods will typically deal with the reference as an ordinary (unblessed) reference to a thingy. Outside the class package, the reference should generally be treated as an opaque value that may only be accessed through the class's methods. (Mutually consenting classes may of course do whatever they like with each other, but even that doesn't necessarily make it right.)
A constructor may re-bless a referenced object currently belonging to another class, but then the new class is responsible for all cleanup later. The previous blessing is forgotten, as an object may only belong to one class at a time. (Although of course it's free to inherit methods from many classes.)
A clarification: Perl objects are blessed. References are not. Thingies know which package they belong to. References do not. The bless operator simply uses the reference in order to find the thingy. Consider the following example:
    $a = {};            # generate reference to hash
    $b = $a;            # reference assignment (shallow)
    bless $b, Mountain;
    bless $a, Fourteener;
    print "\$b is a ", ref($b), "\n";
This reports $b as being a member of class Fourteener, not a member of class Mountain, because the second blessing operates on the underlying thingy that $a refers to, not on the reference itself. Thus is the first blessing forgotten.
Perl doesn't provide any special syntax for class definitions. You just use a package as a class by putting method definitions into the class.
Within each package a special array called @ISA tells Perl where else to look for a method if it can't find the method in that package. This is how Perl implements inheritance. Each element of the @ISA array is just the name of another package that happens to be used as a class. The packages are recursively searched (depth first) for missing methods, in the order that packages are mentioned in @ISA. This means that if you have two different packages (say, Mom and Dad) in a class's @ISA, Perl would first look for missing methods in Mom and all of her ancestor classes before going on to search through Dad and his ancestors. Classes accessible through @ISA are known as base classes of the current class, which is itself called the derived class.[10]
[10] Instead of "base class" and "derived class", some OOP literature uses superclass for the more generic classes and subclass for the more specific ones. Confusing the issue further, some literature uses "base class" to mean a "most super" superclass. That's not what we mean by it.
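Here is a small sketch of that depth-first search order. The Grandma and Kid packages and all the method names are hypothetical, added only so that there is an ancestor to search before Dad.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# All packages and methods here are hypothetical illustrations.
package Grandma;
sub recipe { return "Grandma's recipe" }

package Mom;
our @ISA = ('Grandma');

package Dad;
sub recipe { return "Dad's recipe" }
sub hobby  { return "Dad's hobby" }

package Kid;
our @ISA = ('Mom', 'Dad');   # Mom's whole ancestry is searched before Dad

package main;

my $recipe = Kid->recipe();  # Kid, then Mom, then Grandma: found there
my $hobby  = Kid->hobby();   # not in Mom or Grandma, so Dad supplies it
print "$recipe\n$hobby\n";
```

Dad's recipe() is never reached, because the search is satisfied in Mom's branch first.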
If a missing method is found in one of the base classes, Perl internally caches that location in the current class for efficiency, so the next time it has to find the method, it doesn't have to look so far. Changing @ISA or defining new subroutines invalidates this cache and causes Perl to do the lookup again.
If a method isn't found but an AUTOLOAD routine is found, then that routine is called on behalf of the missing method, with that package's $AUTOLOAD variable set to the fully qualified method name.
If neither a method nor an AUTOLOAD routine is found in @ISA, then one last, desperate try is made for the method (or an AUTOLOAD routine) in the special pre-defined class called UNIVERSAL.
In early Perl 5 releases this package did not initially contain any definitions; since Perl 5.004 it supplies a few built-in methods such as isa and can (and see CPAN for more). Either way, you may place your "last-ditch" methods there. Think of it as a global base class from which all other classes implicitly derive.
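As a rough illustration of a "last-ditch" method, the sketch below adds a hypothetical describe() method to UNIVERSAL and calls it through a class that knows nothing about it. Both describe() and this Critter class are assumptions made for the example. Adding methods to UNIVERSAL affects every class in the program, so treat this as a demonstration rather than a recommendation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A method placed in UNIVERSAL is reachable from every class.
# The name "describe" is hypothetical.
package UNIVERSAL;
sub describe {
    my $self = shift;
    return "an instance of " . (ref($self) || $self);
}

package Critter;
sub new {
    my $class = shift;
    return bless {}, $class;
}

package main;

# Critter defines no describe() and has an empty @ISA,
# so the search ends up in UNIVERSAL.
my $text = Critter->new->describe();
print "$text\n";
```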
If that method still doesn't work, Perl finally gives up and complains by raising an exception.
Perl classes do only method inheritance. Data inheritance is left up to the class itself. By and large, this is not a problem in Perl, because most classes model the attributes of their object using an anonymous hash. All the object's data fields (termed "instance variables" in some languages) are contained within this anonymous hash instead of being part of the language itself. This hash serves as its own little namespace to be carved up by the various classes that might want to do something with the object. For example, if you want an object called $user_info to have a data field named age, you can simply access $user_info->{age}. No declarations are necessary. See the section on "Instance Variables" under "Some Hints About Object Design" later in this chapter.
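The sketch below shows one way a base class and a derived class might carve up the same hash. The UserInfo and AdminInfo classes and their field names are hypothetical, and the SUPER:: call it uses is described later in this chapter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical classes; the field names are illustrative only.
package UserInfo;
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name}, age => $args{age} };
    return bless $self, $class;
}

package AdminInfo;
our @ISA = ('UserInfo');
sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);      # base class fills in its fields
    $self->{privileges} = $args{privileges};   # derived class adds its own
    return $self;
}

package main;

my $user_info = AdminInfo->new(name => 'Fred', age => 30, privileges => 'root');
my @fields = sort keys %$user_info;            # age, name, privileges
print "$_ => $user_info->{$_}\n" for @fields;
```

Nothing declares these fields anywhere; the only thing the two classes share is the convention of not stepping on each other's keys.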
Perl doesn't provide any special syntax for method definition. (It does provide a little syntax for method invocation, though. More on that later.) A method expects its first argument to indicate the object or package it is being invoked on.
A class method expects a class (package) name as its first argument. (The class name isn't blessed; it's just a string.) These methods provide functionality for the class as a whole, not for any individual object instance belonging to the class. Constructors are typically written as class methods. Many class methods simply ignore their first argument, since they already know what package they're in, and don't care what package they were invoked via. (These aren't necessarily the same, since class methods follow the inheritance tree just like ordinary instance methods.)
Another typical use for class methods might be to look up an object by some nickname in a global registry:
    sub find {
        my ($class, $nickname) = @_;
        return $objtable{$nickname};
    }
An instance method expects an object reference[11] as its first argument. Typically it shifts the first argument into a private variable (often called $self or $this depending on the cultural biases of the programmer), and then it uses the variable as an ordinary reference:
[11] By which we mean simply an ordinary hard reference that happens to point to an object thingy. Remember that the reference itself doesn't know or care whether its thingy is blessed.
    sub display {
        my $self = shift;
        my @keys;
        if (@_ == 0) {                  # no further arguments
            @keys = sort keys(%$self);
        }
        else {
            @keys = @_;                 # use the ones given
        }
        foreach my $key (@keys) {
            print "\t$key => $self->{$key}\n";
        }
    }
Although it may seem counterintuitive to object-oriented novices, it's a good idea not to check the type of object that caused the instance method to be invoked. If you do, it can get in the way of inheritance.
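To see how a type check interferes, consider this sketch. The Puppy class and the two accessor names are hypothetical, chosen only to contrast a method that checks ref() against one that doesn't.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Critter;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

# Overly strict: rejects anything not blessed directly into Critter.
sub name_strict {
    my $self = shift;
    die "not a Critter\n" unless ref($self) eq 'Critter';
    return $self->{name};
}

# Inheritance-friendly: trusts whatever object it was invoked on.
sub name {
    my $self = shift;
    return $self->{name};
}

package Puppy;                  # hypothetical derived class
our @ISA = ('Critter');

package main;

my $pup    = Puppy->new('Rex');
my $ok     = $pup->name();                    # works for the derived class
my $strict = eval { $pup->name_strict() };    # dies inside the eval
my $error  = $@;
print "name: $ok\n";
print "strict check failed: $error";
```

The Puppy object is a perfectly good Critter as far as its data goes, yet the strict version turns it away.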
Because there is no language-defined distinction between definitions of class methods and instance methods (nor arbitrary functions, for that matter), you could actually have the same method work for both purposes. It just has to check whether it was passed a reference or not. Suppose you want a constructor that can figure out its class from either a classname or an existing object. Here's an example of the two uses of such a method:
    $ob1 = StarKnight->new();
    $luke = $ob1->new();
Here's how such a method might be defined. We use the ref function to find out the type of the object the method was called on so our new object can be blessed into that class. If ref returns false, then our $self argument isn't an object, so it must be a class name.
    package StarKnight;
    sub new {
        my $self = shift;
        my $type = ref($self) || $self;
        return bless {}, $type;
    }
Perl supports two different syntactic forms for explicitly invoking class or instance methods.[12] Unlike normal function calls, method calls always receive, as their first parameter, the appropriate class name or object reference upon which they were invoked.
[12] Methods may also be called implicitly due to object destructors, tied variables, or operator overloading. Properly speaking, none of these is a function invocation. Rather, Perl uses the information presented via the syntax to determine which function to call. Operator overloading is implemented by the standard overload module as described separately in Chapter 7.
The first syntax form looks like this:
    METHOD CLASS_OR_INSTANCE LIST
Since this is similar to using the filehandle specification with print or printf, and also similar to English sentences like "Give the dog the bone," we'll call it the indirect object form. To look up an object with the class method find, and to print out some of its attributes with the instance method display, you could say this:
    $fred = find Critter "Fred";
    display $fred 'Height', 'Weight';
The indirect object form allows a BLOCK returning an object (or class) in the indirect object slot, so you can combine these into one statement:
display { find Critter "Fred" } 'Height', 'Weight';
The second syntax form looks like this:
    CLASS_OR_INSTANCE->METHOD(LIST)
This second syntax employs the -> notation. It is sometimes called the object-oriented syntax. The parentheses are required if there are any arguments, because this form can't be used as a list operator, although the first form can.
    $fred = Critter->find("Fred");
    $fred->display('Height', 'Weight');
Or, you can put the above in only one statement, like this:
Critter->find("Fred")->display('Height', 'Weight');
There are times when one syntax is more readable, and times when the other syntax is more readable. The indirect object syntax is less cluttered, but it has the same ambiguity as ordinary list operators. If there is an open parenthesis following the class or object, then the matching close parenthesis terminates the list of arguments. Thus, the parentheses of
new Critter ('Barney', 1.5, 70);
are assumed to surround all the arguments of the method call, regardless of what comes afterward. Therefore, saying
new Critter ('Bam' x 2), 1.4, 45;
would be equivalent to
Critter->new('Bam' x 2), 1.4, 45;
which is unlikely to do what you want since the 1.4 and 45 are not being passed to the new() routine.
There may be occasions when you need to specify which class's method to use. In that case, you could call your method as an ordinary subroutine call, being sure to pass the requisite first argument explicitly:
    $fred = MyCritter::find("Critter", "Fred");
    MyCritter::display($fred, 'Height', 'Weight');
However, this does not do any inheritance. If you merely want to specify that Perl should start looking for a method in a particular package, use an ordinary method call, but qualify the method name with the package like this:
    $fred = Critter->MyCritter::find("Fred");
    $fred->MyCritter::display('Height', 'Weight');
If you're trying to control where the method search begins and you're executing in the class package itself, then you may use the SUPER pseudoclass, which says to start looking in your base class's @ISA list without having to explicitly name it:
$self->SUPER::display('Height', 'Weight');
The SUPER construct is meaningful only when used inside the class methods; while writers of class modules can employ SUPER in their own code, people who merely use class objects cannot.
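A typical use of SUPER is an overriding method that wraps the base class's version. In this sketch, the bodies of Critter's new() and display() are our own assumptions, written to return a string so that the wrapping is visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Critter;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
# Assumed implementation: returns the requested fields as one string.
sub display {
    my $self = shift;
    return join ", ", map { "$_ => $self->{$_}" } @_;
}

package MyCritter;
our @ISA = ('Critter');
sub display {
    my $self = shift;
    # Start the search in MyCritter's @ISA rather than in MyCritter itself,
    # so this reaches Critter::display without naming Critter.
    return "[" . $self->SUPER::display(@_) . "]";
}

package main;

my $fred = MyCritter->new(Height => 1.5, Weight => 70);
my $text = $fred->display('Height', 'Weight');
print "$text\n";    # prints "[Height => 1.5, Weight => 70]"
```

If MyCritter is later re-parented onto some other class, the override keeps working without edits, which is the main attraction over writing Critter::display explicitly.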
Sometimes you want to call a method when you don't know the method name ahead of time. You can use the arrow form, replacing the method name with a simple scalar variable (not an expression or indexed aggregate) containing the method name:
    $method = $fast ? "findfirst" : "findbest";
    $fred->$method(@args);
We mentioned that the object-oriented notation is less syntactically ambiguous than the indirect object notation, even though the latter is less cluttered. Here's why:
An indirect object is limited to a name, a scalar variable, or a BLOCK.[13] (If you try to put anything more complicated in that slot, it will not be parsed as you expect.) The left side of -> is not so limited. This means that A and B below are equivalent to each other, and C and D are also equivalent, but A and B differ from C and D:
[13] Attentive readers will recall that this is precisely the same list of syntactic items that are allowed after a funny character to indicate a variable dereference - for example, @ary, @$aryref, or @{$aryref}.
    A: method $obref->{fieldname}
    B: (method $obref)->{fieldname}
    C: $obref->{fieldname}->method()
    D: method {$obref->{fieldname}}
In A and B, the method applies to $obref, which must yield a hash reference with "fieldname" as a key. In C and D the method applies to $obref->{fieldname}, which must evaluate to an object appropriate for the method.
When the last reference to an object goes away, the object is automatically destroyed. (This may even be after you exit, if you've stored references in global variables.) If you want to capture control just before the object is freed, you may define a DESTROY method in your class. It will automatically be called at the appropriate moment, and you can do any extra cleanup you desire. (Perl does the memory management cleanup for you automatically.)
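A minimal sketch of a destructor at work follows. The TempResource class and the @log array are hypothetical, used only to record the order in which things happen.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @log;    # records the order of events for this illustration

package TempResource;       # hypothetical class
sub new {
    my ($class, $name) = @_;
    push @log, "acquire $name";
    return bless { name => $name }, $class;
}
sub DESTROY {
    my $self = shift;
    push @log, "release $self->{name}";   # extra cleanup would go here
}

package main;

{
    my $res = TempResource->new('scratch');
    push @log, 'inside block';
}   # the last reference goes away here, so DESTROY runs now
push @log, 'after block';

print "$_\n" for @log;
```

The release entry lands between "inside block" and "after block", showing that destruction happens as soon as the reference count drops to zero rather than at some later collection pass.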
Perl does not do nested destruction for you. If your constructor re-blessed a reference from one of your base classes, your DESTROY method may need to call DESTROY for any base classes that need it. But this only applies to re-blessed objects; an object reference that is merely contained within the current object - as, for example, one value in a larger hash - will be freed and destroyed automatically. This is one of the reasons why containership via mere aggregation (sometimes called a "has-a" relationship) is often cleaner and clearer than inheritance (an "is-a" relationship). In other words, often you really only need to store one object inside another directly instead of employing inheritance, which can add unnecessary complexity.
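The "has-a" case can be sketched as follows, using hypothetical Car and Engine classes. The container's destructor does nothing about the contained object, yet both destructors run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @log;    # records destructor calls for this illustration

package Engine;             # hypothetical contained class
sub new     { my $class = shift; return bless {}, $class }
sub DESTROY { push @log, 'Engine destroyed' }

package Car;                # hypothetical container class
sub new {
    my $class = shift;
    my $self  = { engine => Engine->new };   # has-a: Car contains an Engine
    return bless $self, $class;
}
sub DESTROY { push @log, 'Car destroyed' }    # no mention of the Engine

package main;

{
    my $car = Car->new;
}   # Car is destroyed; freeing its hash drops the last Engine reference

print "$_\n" for @log;
```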
After Perl has vainly looked through an object's class package and the packages of its base classes to find a method, it also checks for an AUTOLOAD routine in each package before concluding that the method can't be found. One could use this property to provide an interface to the object's data fields (instance variables) without writing a separate function for each. Consider the following code:
    use Person;
    $him = new Person;
    $him->name("Jason");
    $him->age(23);
    $him->peers( ["Norbert", "Rhys", "Phineas"] );
    printf "%s is %d years old.\n", $him->name, $him->age;
    print "His peers are: ", join(", ", @{$him->peers}), ".\n";
The Person class implements a data structure containing three fields: name, age, and peers. Instead of accessing the objects' data fields directly, you use supplied methods to do so. To set one of these fields, call a method of that name with an argument of the value the field should be set to. To retrieve one of the fields without setting it, call the method without an argument. Here's the code that does that:
    package Person;
    use Carp;                   # see Carp.pm in Chapter 7

    my %fields = (
        name  => undef,
        age   => undef,
        peers => undef,
    );

    sub new {
        my $that  = shift;
        my $class = ref($that) || $that;
        my $self  = {
            %fields,
        };
        bless $self, $class;
        return $self;
    }

    sub AUTOLOAD {
        my $self = shift;
        my $type = ref($self) || croak "$self is not an object";
        my $name = $AUTOLOAD;
        $name =~ s/.*://;       # strip fully-qualified portion
        return if $name eq 'DESTROY';   # destruction also arrives here; it's not a field
        unless (exists $self->{$name} ) {
            croak "Can't access `$name' field in object of class $type";
        }
        if (@_) {
            return $self->{$name} = shift;
        } else {
            return $self->{$name};
        }
    }
As you see, there isn't really a method named name(), age(), or peers() to be found anywhere. The AUTOLOAD routine takes care of all of these. This class is a fairly generic implementation of something analogous to a C structure. A more complete implementation of this notion can be found in the Class::Template module contained on CPAN. The Alias module found there may also prove useful for simplifying member access.[14]
[14] CPAN is the Comprehensive Perl Archive Network, as described in the Preface.
High-level languages typically allow the programmers to dispense with worrying about deallocating memory when they're done using it. This automatic reclamation process is known as garbage collection. For most purposes, Perl uses a fast and simple, reference-based garbage collection system. One serious concern is that unreachable memory with a non-zero reference count will normally not get freed. Therefore, saying this is a bad idea:
    {   # make $a and $b point to each other
        my($a, $b);
        $a = \$b;
        $b = \$a;
    }
or more simply:
    {   # make $a point to itself
        my $a;
        $a = \$a;
    }
When a block is exited, its my variables are normally freed up. But their internal reference counts can never go to zero, because the variables point at each other or themselves. This is a circular reference. No one outside the block can reach them, which makes them useless. But even though they should go away, they can't. When building recursive data structures, you'll have to break the self-reference yourself explicitly if you don't care to cause a memory leak.
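Breaking a cycle by hand might look like the following sketch. The Tracked class, the peer field, and the $destroyed counter are all hypothetical; the counter is there only to make the effect observable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $destroyed = 0;          # counts DESTROY calls for this illustration

package Tracked;            # hypothetical class
sub new     { my $class = shift; return bless {}, $class }
sub DESTROY { $destroyed++ }

package main;

{
    my $left  = Tracked->new;
    my $right = Tracked->new;
    $left->{peer}  = $right;    # the two objects now point at each other
    $right->{peer} = $left;
}
my $after_cycle = $destroyed;   # still 0: neither count reached zero

{
    my $left  = Tracked->new;
    my $right = Tracked->new;
    $left->{peer}  = $right;
    $right->{peer} = $left;
    delete $left->{peer};       # break the cycle before leaving the block
}
my $after_break = $destroyed;   # 2: both objects were freed

print "after cycle: $after_cycle, after break: $after_break\n";
```

Removing just one link is enough, since once either object can be freed it releases its reference to the other.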
For example, here's a self-referential node such as one might use in a sophisticated tree structure:
    sub new_node {
        my $self  = shift;
        my $class = ref($self) || $self;
        my $node  = {};
        $node->{LEFT} = $node->{RIGHT} = $node;
        $node->{DATA} = [ @_ ];
        return bless $node, $class;
    }
If you create nodes like this, they (currently)[15] won't ever go away unless you break the circular references yourself.
[15] In other words, this behavior is not to be construed as a feature, and you shouldn't depend on it. Someday, Perl may have a full mark-and-sweep style garbage collection as in Lisp or Scheme. If that happens, it will properly clean up memory lost to unreachable circular data.
Well, almost never.
When an interpreter thread finally shuts down (usually when your program exits), then a complete pass of garbage collection is performed, and everything allocated by that thread gets destroyed. This is essential to support Perl as an embedded or a multithreadable language. When a thread shuts down, all its objects must be properly destructed, and all its memory has to be reclaimed. The following program demonstrates Perl's multi-phased garbage collection:
    #!/usr/bin/perl
    package Subtle;

    sub new {
        my $test;
        $test = \$test;  # Create a self-reference.
        warn "CREATING " . \$test;
        return bless \$test;
    }

    sub DESTROY {
        my $self = shift;
        warn "DESTROYING $self";
    }

    package main;

    warn "starting program";
    {
        my $a = Subtle->new;
        my $b = Subtle->new;
        $$a = 0;         # Break this self-reference, but not the other.
        warn "leaving block";
    }

    warn "just exited block";
    warn "time to die...";
    exit;
When run as /tmp/try, the following output is produced:
    starting program at /tmp/try line 18.
    CREATING SCALAR(0x8e5b8) at /tmp/try line 7.
    CREATING SCALAR(0x8e57c) at /tmp/try line 7.
    leaving block at /tmp/try line 23.
    DESTROYING Subtle=SCALAR(0x8e5b8) at /tmp/try line 13.
    just exited block at /tmp/try line 26.
    time to die... at /tmp/try line 27.
    DESTROYING Subtle=SCALAR(0x8e57c) during global destruction.
Notice that "global destruction" in the last line? That's the thread garbage collector reaching the unreachable.
Objects are always destructed even when regular references aren't, and in fact are destructed in a separate pass before ordinary references. This is an attempt to prevent object destructors from using references that have themselves been destructed. Plain references are (currently) only garbage collected if the "destruct level" is greater than 0, which is usually only true when Perl is invoked as an embedded interpreter. You can test the higher levels of global destruction in the regular Perl executable by setting the PERL_DESTRUCT_LEVEL environment variable (presuming the -DDEBUGGING option was enabled at Perl build time).