[Chapter 4] 4.7 Data Structure Code Examples

4.7 Data Structure Code Examples

Once you've mastered creating and using multi-dimensional arrays (lists of lists), you'll want to be able to make more complex data structures. If you're looking for C structures or Pascal records, you won't find any special reserved words in Perl to set these up for you. What you get instead is a more flexible system.[8] Perl has just two ways of organizing data: either as ordered lists stored in arrays and accessed by position, or as unordered key/value pairs stored in hashes and accessed by name.

[8] If your idea of a record structure is less flexible than this, or if you'd like to provide your users with something more opaque and rigid, then you can use the object-oriented features detailed in Chapter 5, Packages, Modules, and Object Classes.

The best way to represent a record in Perl is using a hash reference, but how you choose to organize such records will vary. You may wish to keep an ordered list of these records around that you can look up by number, in which case you'd use an array to store the records (hash references). But you might wish to look up the records by name, in which case you'd store them in another hash. You could even do both at once: the array and the hash could each hold references to the same records, which are after all just anonymous hash thingies, and each one can have as many references to it as you want, within reason.[9]

[9] Where reason is defined as 2**32 references, minus one. That's probably sufficient for most folks.

In the following sections you will find code examples detailing how to compose, generate, access, and print out each of five data structures. The first four examples are straightforward homogeneous combinations of arrays and hashes, while the last one demonstrates how to use a less regular data structure. These examples, presented with little comment, assume that you have already familiarized yourself with the earlier explanations set forth in this chapter.

Arrays of Arrays

Use an array of arrays when you want a basic two-dimensional matrix. One application might include making a list of all the hosts on your network, but each of these hosts would have several possible aliases. Another might be a list of daily menus, each of which would itself be a list of foods served in it. For our example, we'll keep several lists of famous television characters, all stored together in one large list of lists.

Composition of an array of arrays

@LoL = (
    [ "fred", "barney" ],
    [ "george", "jane", "elroy" ],
    [ "homer", "marge", "bart" ],
);

Generation of an array of arrays

# reading from a file
while ( <> ) {
    push @LoL, [ split ];
}
# calling a function
for $i ( 1 .. 10 ) {
    $LoL[$i] = [ somefunc($i) ];
}
# using temp vars
for $i ( 1 .. 10 ) {
    @tmp = somefunc($i);
    $LoL[$i] = [ @tmp ];
}
# add to an existing row
push @{ $LoL[0] }, "wilma", "betty";

Access and printing of an array of arrays

# one element
$LoL[0][0] = "Fred";
# another element
$LoL[1][1] =~ s/(\w)/\u$1/;
# print the whole thing with refs
for $array_ref ( @LoL ) {
    print "\t [ @$array_ref ],\n";
}
# print the whole thing with indices
for $i ( 0 .. $#LoL ) {
    print "\t [ @{$LoL[$i]} ],\n";
}
# print the whole thing one at a time
for $i ( 0 .. $#LoL ) {
    for $j ( 0 .. $#{$LoL[$i]} ) {
        print "element $i $j is $LoL[$i][$j]\n";
    }
}

Hashes of Arrays

Use a hash of arrays when you want to look up each array by a particular string rather than merely by an index number. In our example of television characters, rather than merely looking up the list of names by the zeroth show, the first show, and so on, we'll set it up so we can look up the cast list according to the name of the show.

Because our outer data structure is a hash, we've lost all ordering of its contents. That means when you print it out, you can't predict the order things will come out. You can call the sort function and print its result if you'd like a particular output order.

Composition of a hash of arrays

# we customarily omit quotes when keys are identifiers
%HoL = (
    flintstones    => [ "fred", "barney" ],
    jetsons        => [ "george", "jane", "elroy" ],
    simpsons       => [ "homer", "marge", "bart" ],
);

Generation of a hash of arrays

# reading from file with the following format:
# flintstones: fred barney wilma dino
while ( <> ) {
    next unless s/^(.*?):\s*//;
    $HoL{$1} = [ split ];
}
# reading from file; more temporary variables
# flintstones: fred barney wilma dino
while ( $line = <> ) {
    ($who, $rest) = split /:\s*/, $line, 2;
    @fields = split ' ', $rest;
    $HoL{$who} = [ @fields ];
}
# calling a function that returns an array
for $group ( "simpsons", "jetsons", "flintstones" ) {
    $HoL{$group} = [ get_family($group) ];
}
# likewise, but using temporary variables
for $group ( "simpsons", "jetsons", "flintstones" ) {
    @members = get_family($group);
    $HoL{$group} = [ @members ];
}
# append new members to an existing family
push @{ $HoL{flintstones} }, "wilma", "betty";

Access and printing of a hash of arrays

# one element
$HoL{flintstones}[0] = "Fred";
# another element
$HoL{simpsons}[1] =~ s/(\w)/\u$1/;
# print the whole thing
foreach $family ( keys %HoL ) {
    print "$family: @{ $HoL{$family} }\n";
}
# print the whole thing with indices
foreach $family ( keys %HoL ) {
    print "$family: ";
    foreach $i ( 0 .. $#{ $HoL{$family} } ) {
        print " $i = $HoL{$family}[$i]";
    }
}
print "\n";
# print the whole thing sorted by number of members
foreach $family ( sort { @{$HoL{$b}} <=> @{$HoL{$a}} } keys %HoL ) {
    print "$family: @{ $HoL{$family} }\n"
}
# print the whole thing sorted by number of members and name
foreach $family ( sort { @{$HoL{$b}} <=> @{$HoL{$a}} } keys %HoL ) {
    print "$family: ", join(", ", sort @{ $HoL{$family} }), "\n";
}

Arrays of Hashes

An array of hashes is called for when you have a bunch of records that you'd like to access sequentially, but each record itself contains key/value pairs. These arrays tend to be used less frequently than the other homogeneous data structures.

Composition of an array of hashes

@LoH = (
    {
       lead     => "fred",
       friend   => "barney",
    },
    {
       lead    => "george",
       wife    => "jane",
       son     => "elroy",
    },
    {
       lead    => "homer",
       wife    => "marge",
       son     => "bart",
    },
  );

Generation of an array of hashes

# reading from file
# format: lead=fred friend=barney
while ( <> ) {
    $rec = {};
    for $field ( split ) {
        ($key, $value) = split /=/, $field;
        $rec->{$key} = $value;
    }
    push @LoH, $rec;
}
# reading from file
# format: lead=fred friend=barney
# no temp
while ( <> ) {
    push @LoH, { split /[\s=]+/ };
}
# calling a function that returns a key,value array, like
# "lead","fred","daughter","pebbles"
while ( %fields = getnextpairset() ) {
    push @LoH, { %fields };
}
# likewise, but using no temp vars
while (<>) {
    push @LoH, { parsepairs($_) };
}
# add key/value to an element
$LoH[0]{pet} = "dino";
$LoH[2]{pet} = "santa's little helper";

Access and printing of an array of hashes

# one element
$LoH[0]{lead} = "fred";
# another element
$LoH[1]{lead} =~ s/(\w)/\u$1/;
# print the whole thing with refs
for $href ( @LoH ) {
    print "{ ";
    for $role ( keys %$href ) {
         print "$role=$href->{$role} ";
    }
    print "}\n";
}
# print the whole thing with indices
for $i ( 0 .. $#LoH ) {
    print "$i is { ";
    for $role ( keys %{ $LoH[$i] } ) {
         print "$role=$LoH[$i]{$role} ";
    }
    print "}\n";
}
# print the whole thing one at a time
for $i ( 0 .. $#LoH ) {
    for $role ( keys %{ $LoH[$i] } ) {
         print "element $i $role is $LoH[$i]{$role}\n";
    }
}

Hashes of Hashes

A multi-dimensional hash is the most flexible of Perl's homogeneous structures. It's like building up a record that itself contains other records. At each level you index into the hash with a string (quoted if it contains spaces). Remember, however, that the key/value pairs in the hash won't come out in any particular order. You must do your own sorting if the order matters.

Composition of a hash of hashes

%HoH = (
    flintstones => {
        lead      => "fred",
        pal       => "barney",
    },
    jetsons => {
        lead      => "george",
        wife      => "jane",
        "his boy" => "elroy",  # key quotes needed
    },
    simpsons => {
        lead      => "homer",
        wife      => "marge",
        kid       => "bart",
    },
);

Generation of a hash of hashes

# reading from file
# flintstones: lead=fred pal=barney wife=wilma pet=dino
while ( <> ) {
    next unless s/^(.*?):\s*//;
    $who = $1;
    for $field ( split ) {
        ($key, $value) = split /=/, $field;
        $HoH{$who}{$key} = $value;
    }
}
# reading from file; more temporary variables
while ( <> ) {
    next unless s/^(.*?):\s*//;
    $who = $1;
    $rec = {};
    $HoH{$who} = $rec;
    for $field ( split ) {
        ($key, $value) = split /=/, $field;
        $rec->{$key} = $value;
    }
}
# calling a function that returns a key,value for the inner hash
for $group ( "simpsons", "jetsons", "flintstones" ) {
    $HoH{$group} = { get_family($group) };
}
# likewise, but using temporary variables
for $group ( "simpsons", "jetsons", "flintstones" ) {
    %members = get_family($group);
    $HoH{$group} = { %members };
}
# calling a function that returns the outer hash, including
# references to the created inner hashes
sub hash_families {
    my @ret;
    foreach $group ( @_ ) {
        push @ret, $group, { get_family($group) };
    }
    @ret;
}
%HoH = hash_families( "simpsons", "jetsons", "flintstones" );
# append new members to an existing family
%new_folks = (
    wife => "wilma",
    pet  => "dino";
);
for $what (keys %new_folks) {
    $HoH{flintstones}{$what} = $new_folks{$what};
}

Access and printing of a hash of hashes

# one element
$HoH{flintstones}{wife} = "wilma";
# another element
$HoH{jetsons}{'his boy'} =~ s/(\w)/\u$1/;
# print the whole thing
foreach $family ( keys %HoH ) {
    print "$family: ";
    foreach $role ( keys %{ $HoH{$family} } ) {
         print "$role=$HoH{$family}{$role} ";
    }
    print "\n";
}
# print the whole thing, using temporaries
while ( ($family,$roles) = each %HoH ) {
    print "$family: ";
    while ( ($role,$person) = each %$roles ) {  # using each precludes sorting
        print "$role=$person ";
    }
    print "\n";
}
# print the whole thing somewhat sorted
foreach $family ( sort keys %HoH ) {
    print "$family: ";
    foreach $role ( sort keys %{ $HoH{$family} } ) {
         print "$role=$HoH{$family}{$role} ";
    }
    print "\n";
}
# print the whole thing sorted by number of members
foreach $family ( sort { keys %{$HoH{$a}} <=> keys %{$HoH{$b}} } keys %HoH ) {
    print "$family: ";
    foreach $role ( sort keys %{ $HoH{$family} } ) {
         print "$role=$HoH{$family}{$role} ";
    }
    print "\n";
}
# establish a sort order (rank) for each role
$i = 0;
for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }
# now print the whole thing sorted by number of members
foreach $family ( sort { keys %{$HoH{$a}} <=> keys %{$HoH{$b}} } keys %HoH ) {
    print "$family: ";
    # and print these according to rank order
    foreach $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {
        print "$role=$HoH{$family}{$role} ";
    }
    print "\n";
}

More Elaborate Records

Those were simple, two-level, homogeneous data structures: each element contains the same kind of thingy as all the other elements do. It certainly doesn't have to be that way. Any element can hold any kind of scalar, which means that it could be a string, a number, or a reference to anything at all, including more exotic things than just array or hash references, such as references to named or anonymous functions or opaque objects. The only thing you can't do is to put more than one kind of thingy into a given scalar simultaneously. If you find yourself trying to do that, it's a signal that you need to establish an array or hash at the next lower level to handle the different types of thingy you're trying to overlay.

Below you will find code examples designed to illustrate all the possible kinds of things you might want to keep in a record. For our base structure, we'll use a hash reference. The keys are uppercase strings, a convention sometimes employed when the hash is being used as a specific record type rather than as a more generic associative array.

Composition of more elaborate records

This shows how to create and use a record whose fields are of many sorts:

$rec = {
    TEXT      => $string,
    SEQUENCE  => [ @old_values ],
    LOOKUP    => { %some_table },
    THATCODE  => \&some_function,
    THISCODE  => sub { $_[0] ** $_[1] },
    HANDLE    => \*STDOUT,
};
print $rec->{TEXT};
print $rec->{SEQUENCE}[0];
$last = pop @{ $rec->{SEQUENCE} };
print $rec->{LOOKUP}{"key"};
($first_k, $first_v) = each %{ $rec->{LOOKUP} };
# no difference calling named or anonymous subs
$answer = &{ $rec->{THATCODE} }($arg);
$answer = &{ $rec->{THISCODE} }($arg1, $arg2);
# must have extra braces on indirect object slot
print { $rec->{HANDLE} } "a string\n";
use FileHandle;
$rec->{HANDLE}->autoflush(1);
$rec->{HANDLE}->print("a string\n");

Composition of more elaborate records

%TV = (
    flintstones => {
        series   => "flintstones",
        nights   => [ qw(monday thursday friday) ],
        members  => [
            { name => "fred",    role => "lead", age  => 36, },
            { name => "wilma",   role => "wife", age  => 31, },
            { name => "pebbles", role => "kid",  age  =>  4, },
        ],
    },
    jetsons     => {
        series   => "jetsons",
        nights   => [ qw(wednesday saturday) ],
        members  => [
            { name => "george",  role => "lead", age  => 41, },
            { name => "jane",    role => "wife", age  => 39, },
            { name => "elroy",   role => "kid",  age  =>  9, },
        ],
     },
    simpsons    => {
        series   => "simpsons",
        nights   => [ qw(monday) ],
        members  => [
            { name => "homer", role => "lead", age => 34, },
            { name => "marge", role => "wife", age => 37, },
            { name => "bart",  role => "kid",  age => 11, },
        ],
     },
  );

Generation of a hash of complex records

Because Perl is quite good at parsing complex data structures, you might just put your data declarations in a separate file as regular Perl code and then load them in with do or require. See Chapter 3, Functions, for details on those functions.

# here's a piece by piece build up
$rec = {};
$rec->{series} = "flintstones";
$rec->{nights} = [ find_days() ];
@members = ();
# assume this file is in field=value syntax
while (<>) {
     %fields = split /[\s=]+/;
     push @members, { %fields };
}
$rec->{members} = [ @members ];
# now remember the whole thing
$TV{ $rec->{series} } = $rec;

You can use extra pointer fields to avoid duplicate data. For example, you might want a "kids" field included in a person's record. This could be a reference to a list consisting of references to the kids' own records. That way you avoid the update problems that result from having the same data in two places.

foreach $family (keys %TV) {
    my $rec = $TV{$family}; # temp pointer
    @kids = ();
    for $person ( @{$rec->{members}} ) {
        if ($person->{role} =~ /kid|son|daughter/) {
            push @kids, $person;
        }
    }
    # REMEMBER: $rec and $TV{$family} point to same data!!
    $rec->{kids} = [ @kids ];
}
# you copied the array, but the array itself contains pointers to
# uncopied objects. this means that if you make bart get older via
$TV{simpsons}{kids}[0]{age}++;
# then this would also change here
print $TV{simpsons}{members}[2]{age};
# because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
# both point to the same underlying anonymous hash table
# print the whole thing
foreach $family ( keys %TV ) {
    print "the $family";
    print " is on during @{ $TV{$family}{nights} }\n";
    print "its members are:\n";
    for $who ( @{ $TV{$family}{members} } ) {
        print " $who->{name} ($who->{role}), age $who->{age}\n";
    }
    print "it turns out that $TV{$family}{'lead'} has ";
    print scalar ( @{ $TV{$family}{kids} } ), " kids named ";
    print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
    print "\n";
}


A Brief Tutorial: Manipulating Lists of Lists	Book Index	Packages, Modules, and Object Classes