Once you've mastered creating and using multi-dimensional arrays (lists of lists), you'll want to be able to make more complex data structures. If you're looking for C structures or Pascal records, you won't find any special reserved words in Perl to set these up for you. What you get instead is a more flexible system.[8] Perl has just two ways of organizing data: either as ordered lists stored in arrays and accessed by position, or as unordered key/value pairs stored in hashes and accessed by name.
[8] If your idea of a record structure is less flexible than this, or if you'd like to provide your users with something more opaque and rigid, then you can use the object-oriented features detailed in Chapter 5, Packages, Modules, and Object Classes.
The best way to represent a record in Perl is using a hash reference, but how you choose to organize such records will vary. You may wish to keep an ordered list of these records around that you can look up by number, in which case you'd use an array to store the records (hash references). But you might wish to look up the records by name, in which case you'd store them in another hash. You could even do both at once: the array and the hash could each hold references to the same records, which are after all just anonymous hash thingies, and each one can have as many references to it as you want, within reason.[9]
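As a minimal sketch of the "both at once" arrangement described above, the following keeps one set of records reachable both by position and by name. The variable names (@by_number, %by_name) and the record fields are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical records; the field names are for illustration only.
my @by_number = (
    { name => "fred",   show => "flintstones" },
    { name => "george", show => "jetsons"     },
    { name => "homer",  show => "simpsons"    },
);

# Index the very same anonymous hashes by name.
# No record is copied; only the references are.
my %by_name = map { ( $_->{name} => $_ ) } @by_number;

# An update made through one structure is visible through the other.
$by_name{homer}{show} = "the simpsons";
print "$by_number[2]{name} appears in $by_number[2]{show}\n";
```

Because both containers hold references to the same anonymous hashes, there is only one copy of each record to keep up to date.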
[9] Where reason is defined as 2**32 references, minus one. That's probably sufficient for most folks.
In the following sections you will find code examples detailing how to compose, generate, access, and print out each of five data structures. The first four examples are straightforward homogeneous combinations of arrays and hashes, while the last one demonstrates how to use a less regular data structure. These examples, presented with little comment, assume that you have already familiarized yourself with the earlier explanations set forth in this chapter.
Use an array of arrays when you want a basic two-dimensional matrix. One application might be a list of all the hosts on your network, where each host has several possible aliases. Another might be a list of daily menus, each of which is itself a list of the foods served that day. For our example, we'll keep several lists of famous television characters, all stored together in one large list of lists.
    @LoL = (
        [ "fred", "barney" ],
        [ "george", "jane", "elroy" ],
        [ "homer", "marge", "bart" ],
    );

    # reading from a file
    while ( <> ) {
        push @LoL, [ split ];
    }

    # calling a function
    for $i ( 1 .. 10 ) {
        $LoL[$i] = [ somefunc($i) ];
    }

    # using temp vars
    for $i ( 1 .. 10 ) {
        @tmp = somefunc($i);
        $LoL[$i] = [ @tmp ];
    }

    # add to an existing row
    push @{ $LoL[0] }, "wilma", "betty";

    # one element
    $LoL[0][0] = "Fred";

    # another element
    $LoL[1][1] =~ s/(\w)/\u$1/;

    # print the whole thing with refs
    for $array_ref ( @LoL ) {
        print "\t [ @$array_ref ],\n";
    }

    # print the whole thing with indices
    for $i ( 0 .. $#LoL ) {
        print "\t [ @{$LoL[$i]} ],\n";
    }

    # print the whole thing one at a time
    for $i ( 0 .. $#LoL ) {
        for $j ( 0 .. $#{$LoL[$i]} ) {
            print "element $i $j is $LoL[$i][$j]\n";
        }
    }
Use a hash of arrays when you want to look up each array by a particular string rather than merely by an index number. In our example of television characters, rather than merely looking up the list of names by the zeroth show, the first show, and so on, we'll set it up so we can look up the cast list according to the name of the show.
Because our outer data structure is a hash, we've lost all ordering of its contents. That means when you print it out, you can't predict the order in which things will come out. If you'd like a particular output order, call the sort function on the keys and iterate over its result.
    # we customarily omit quotes when keys are identifiers
    %HoL = (
        flintstones => [ "fred", "barney" ],
        jetsons     => [ "george", "jane", "elroy" ],
        simpsons    => [ "homer", "marge", "bart" ],
    );

    # reading from file with the following format:
    # flintstones: fred barney wilma dino
    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $HoL{$1} = [ split ];
    }

    # reading from file; more temporary variables
    # flintstones: fred barney wilma dino
    while ( $line = <> ) {
        ($who, $rest) = split /:\s*/, $line, 2;
        @fields = split ' ', $rest;
        $HoL{$who} = [ @fields ];
    }

    # calling a function that returns an array
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        $HoL{$group} = [ get_family($group) ];
    }

    # likewise, but using temporary variables
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        @members = get_family($group);
        $HoL{$group} = [ @members ];
    }

    # append new members to an existing family
    push @{ $HoL{flintstones} }, "wilma", "betty";
    # one element
    $HoL{flintstones}[0] = "Fred";

    # another element
    $HoL{simpsons}[1] =~ s/(\w)/\u$1/;

    # print the whole thing
    foreach $family ( keys %HoL ) {
        print "$family: @{ $HoL{$family} }\n";
    }

    # print the whole thing with indices
    foreach $family ( keys %HoL ) {
        print "$family: ";
        foreach $i ( 0 .. $#{ $HoL{$family} } ) {
            print " $i = $HoL{$family}[$i]";
        }
        print "\n";    # one newline per family, inside the outer loop
    }

    # print the whole thing sorted by number of members
    foreach $family ( sort { @{$HoL{$b}} <=> @{$HoL{$a}} } keys %HoL ) {
        print "$family: @{ $HoL{$family} }\n"
    }

    # print the whole thing sorted by number of members and name
    # (ties on member count are broken by family name)
    foreach $family ( sort { @{$HoL{$b}} <=> @{$HoL{$a}} || $a cmp $b } keys %HoL ) {
        print "$family: ", join(", ", sort @{ $HoL{$family} }), "\n";
    }
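Since the outer hash has no inherent order, the simplest way to get a repeatable printout is to sort the keys before iterating. The sketch below assumes a small stand-in copy of the cast lists and collects the lines in a hypothetical @report array so they can be inspected as well as printed.

```perl
use strict;
use warnings;

# A small stand-in for the cast lists, entered in no particular order.
my %HoL = (
    simpsons    => [ "homer", "marge", "bart" ],
    flintstones => [ "fred", "barney" ],
    jetsons     => [ "george", "jane", "elroy" ],
);

# keys returns the shows in no predictable order; sort fixes the order.
my @report;
for my $family ( sort keys %HoL ) {
    push @report, "$family: @{ $HoL{$family} }";
}
print "$_\n" for @report;
```

The inner arrays keep their own order, so only the outer level needs sorting.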
An array of hashes is called for when you have a bunch of records that you'd like to access sequentially, but each record itself contains key/value pairs. These arrays tend to be used less frequently than the other homogeneous data structures.
    @LoH = (
        {
            lead   => "fred",
            friend => "barney",
        },
        {
            lead => "george",
            wife => "jane",
            son  => "elroy",
        },
        {
            lead => "homer",
            wife => "marge",
            son  => "bart",
        },
    );

    # reading from file
    # format: lead=fred friend=barney
    while ( <> ) {
        $rec = {};
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $rec->{$key} = $value;
        }
        push @LoH, $rec;
    }

    # reading from file
    # format: lead=fred friend=barney
    # no temp
    while ( <> ) {
        push @LoH, { split /[\s=]+/ };
    }

    # calling a function that returns a key,value array, like
    # "lead","fred","daughter","pebbles"
    while ( %fields = getnextpairset() ) {
        push @LoH, { %fields };
    }

    # likewise, but using no temp vars
    while (<>) {
        push @LoH, { parsepairs($_) };
    }

    # add key/value to an element
    $LoH[0]{pet} = "dino";
    $LoH[2]{pet} = "santa's little helper";

    # one element
    $LoH[0]{lead} = "fred";

    # another element
    $LoH[1]{lead} =~ s/(\w)/\u$1/;

    # print the whole thing with refs
    for $href ( @LoH ) {
        print "{ ";
        for $role ( keys %$href ) {
            print "$role=$href->{$role} ";
        }
        print "}\n";
    }

    # print the whole thing with indices
    for $i ( 0 .. $#LoH ) {
        print "$i is { ";
        for $role ( keys %{ $LoH[$i] } ) {
            print "$role=$LoH[$i]{$role} ";
        }
        print "}\n";
    }

    # print the whole thing one at a time
    for $i ( 0 .. $#LoH ) {
        for $role ( keys %{ $LoH[$i] } ) {
            print "element $i $role is $LoH[$i]{$role}\n";
        }
    }
A multi-dimensional hash is the most flexible of Perl's homogeneous structures. It's like building up a record that itself contains other records. At each level you index into the hash with a string (quoted if it contains spaces). Remember, however, that the key/value pairs in the hash won't come out in any particular order. You must do your own sorting if the order matters.
    %HoH = (
        flintstones => {
            lead => "fred",
            pal  => "barney",
        },
        jetsons => {
            lead      => "george",
            wife      => "jane",
            "his boy" => "elroy",    # key quotes needed
        },
        simpsons => {
            lead => "homer",
            wife => "marge",
            kid  => "bart",
        },
    );
    # reading from file
    # flintstones: lead=fred pal=barney wife=wilma pet=dino
    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $who = $1;
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $HoH{$who}{$key} = $value;
        }
    }

    # reading from file; more temporary variables
    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $who = $1;
        $rec = {};
        $HoH{$who} = $rec;
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $rec->{$key} = $value;
        }
    }

    # calling a function that returns a key,value list for the inner hash
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        $HoH{$group} = { get_family($group) };
    }

    # likewise, but using temporary variables
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        %members = get_family($group);
        $HoH{$group} = { %members };
    }

    # calling a function that returns the outer hash, including
    # references to the created inner hashes
    sub hash_families {
        my @ret;
        foreach $group ( @_ ) {
            push @ret, $group, { get_family($group) };
        }
        @ret;
    }
    %HoH = hash_families( "simpsons", "jetsons", "flintstones" );

    # append new members to an existing family
    %new_folks = (
        wife => "wilma",
        pet  => "dino",
    );
    for $what (keys %new_folks) {
        $HoH{flintstones}{$what} = $new_folks{$what};
    }
    # one element
    $HoH{flintstones}{wife} = "wilma";

    # another element
    $HoH{jetsons}{'his boy'} =~ s/(\w)/\u$1/;

    # print the whole thing
    foreach $family ( keys %HoH ) {
        print "$family: ";
        foreach $role ( keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "\n";
    }

    # print the whole thing, using temporaries
    while ( ($family,$roles) = each %HoH ) {
        print "$family: ";
        while ( ($role,$person) = each %$roles ) {    # using each precludes sorting
            print "$role=$person ";
        }
        print "\n";
    }

    # print the whole thing somewhat sorted
    foreach $family ( sort keys %HoH ) {
        print "$family: ";
        foreach $role ( sort keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "\n";
    }

    # print the whole thing sorted by number of members
    foreach $family ( sort { keys %{$HoH{$a}} <=> keys %{$HoH{$b}} } keys %HoH ) {
        print "$family: ";
        foreach $role ( sort keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "\n";
    }

    # establish a sort order (rank) for each role
    $i = 0;
    for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }

    # now print the whole thing sorted by number of members
    foreach $family ( sort { keys %{$HoH{$a}} <=> keys %{$HoH{$b}} } keys %HoH ) {
        print "$family: ";
        # and print these according to rank order
        foreach $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "\n";
    }
Those were simple, two-level, homogeneous data structures: each element contains the same kind of thingy as all the other elements do. It certainly doesn't have to be that way. Any element can hold any kind of scalar, which means that it could be a string, a number, or a reference to anything at all, including more exotic things than just array or hash references, such as references to named or anonymous functions or opaque objects. The only thing you can't do is to put more than one kind of thingy into a given scalar simultaneously. If you find yourself trying to do that, it's a signal that you need to establish an array or hash at the next lower level to handle the different types of thingy you're trying to overlay.
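To make the last point concrete, here is a minimal sketch of a slot that needs two kinds of thingy at once: a piece of text and a callback. The record layout and the key names (greeting, text, format) are hypothetical. Rather than trying to overlay both in one scalar, the slot holds a reference to a small hash one level down.

```perl
use strict;
use warnings;

# A hypothetical record whose "greeting" slot needs both a string and
# a code reference. One scalar cannot hold both, so the slot holds a
# reference to a hash at the next lower level.
my $rec = {
    greeting => {
        text   => "hello",
        format => sub { ucfirst( $_[0] ) . "!" },
    },
};

# Fetch the callback and the text from the same slot, one level down.
my $shown = $rec->{greeting}{format}->( $rec->{greeting}{text} );
print "$shown\n";
```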
Below you will find code examples designed to illustrate all the possible kinds of things you might want to keep in a record. For our base structure, we'll use a hash reference. The keys are uppercase strings, a convention sometimes employed when the hash is being used as a specific record type rather than as a more generic associative array.
This shows how to create and use a record whose fields are of many sorts:
    $rec = {
        TEXT     => $string,
        SEQUENCE => [ @old_values ],
        LOOKUP   => { %some_table },
        THATCODE => \&some_function,
        THISCODE => sub { $_[0] ** $_[1] },
        HANDLE   => \*STDOUT,
    };

    print $rec->{TEXT};

    print $rec->{SEQUENCE}[0];
    $last = pop @{ $rec->{SEQUENCE} };

    print $rec->{LOOKUP}{"key"};
    ($first_k, $first_v) = each %{ $rec->{LOOKUP} };

    # no difference calling named or anonymous subs
    $answer = &{ $rec->{THATCODE} }($arg);
    $answer = &{ $rec->{THISCODE} }($arg1, $arg2);

    # must have extra braces on indirect object slot
    print { $rec->{HANDLE} } "a string\n";

    use FileHandle;
    $rec->{HANDLE}->autoflush(1);
    $rec->{HANDLE}->print("a string\n");

    %TV = (
        flintstones => {
            series  => "flintstones",
            nights  => [ qw(monday thursday friday) ],
            members => [
                { name => "fred",    role => "lead", age => 36, },
                { name => "wilma",   role => "wife", age => 31, },
                { name => "pebbles", role => "kid",  age => 4, },
            ],
        },
        jetsons => {
            series  => "jetsons",
            nights  => [ qw(wednesday saturday) ],
            members => [
                { name => "george", role => "lead", age => 41, },
                { name => "jane",   role => "wife", age => 39, },
                { name => "elroy",  role => "kid",  age => 9, },
            ],
        },
        simpsons => {
            series  => "simpsons",
            nights  => [ qw(monday) ],
            members => [
                { name => "homer", role => "lead", age => 34, },
                { name => "marge", role => "wife", age => 37, },
                { name => "bart",  role => "kid",  age => 11, },
            ],
        },
    );
Because Perl is quite good at parsing complex data structures, you might just put your data declarations in a separate file as regular Perl code and then load them in with do or require. See Chapter 3, Functions, for details on those functions.
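A minimal sketch of that approach follows. To stay self-contained it first writes a small data file with File::Temp; in real use the file would already exist. The file contents and the variable names are hypothetical. The leading + on the opening brace is there so that Perl treats the braces as an anonymous hash rather than a block.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small data file for the demonstration. The contents are
# ordinary Perl code whose last expression is a hash reference.
my ($fh, $filename) = tempfile( SUFFIX => ".pl", UNLINK => 1 );
print {$fh} <<'END_DATA';
+{
    flintstones => { nights => [ qw(monday thursday friday) ] },
    jetsons     => { nights => [ qw(wednesday saturday) ] },
};
END_DATA
close $fh or die "cannot close $filename: $!";

# do FILE returns the value of the last expression evaluated in the
# file, or undef on failure, so check both $@ and $!.
my $tv = do $filename;
die "cannot parse $filename: $@" if $@;
die "cannot read $filename: $!"  unless defined $tv;

print "jetsons air on @{ $tv->{jetsons}{nights} }\n";
```

Keep in mind that do and require execute whatever code the file contains, so this technique is only appropriate for data files you trust.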
    # here's a piece by piece build up
    $rec = {};
    $rec->{series} = "flintstones";
    $rec->{nights} = [ find_days() ];

    @members = ();
    # assume this file is in field=value syntax
    while (<>) {
        %fields = split /[\s=]+/;
        push @members, { %fields };
    }
    $rec->{members} = [ @members ];

    # now remember the whole thing
    $TV{ $rec->{series} } = $rec;
You can use extra pointer fields to avoid duplicate data. For example, you might want a "kids" field included in a person's record. This could be a reference to a list consisting of references to the kids' own records. That way you avoid the update problems that result from having the same data in two places.
    foreach $family (keys %TV) {
        my $rec = $TV{$family};    # temp pointer
        @kids = ();
        for $person ( @{$rec->{members}} ) {
            if ($person->{role} =~ /kid|son|daughter/) {
                push @kids, $person;
            }
        }
        # REMEMBER: $rec and $TV{$family} point to same data!!
        $rec->{kids} = [ @kids ];
    }

    # you copied the array, but the array itself contains pointers to
    # uncopied objects. this means that if you make bart get older via
    $TV{simpsons}{kids}[0]{age}++;

    # then this would also change here
    print $TV{simpsons}{members}[2]{age};

    # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
    # both point to the same underlying anonymous hash table

    # print the whole thing
    foreach $family ( keys %TV ) {
        print "the $family";
        print " is on during @{ $TV{$family}{nights} }\n";
        print "its members are:\n";
        for $who ( @{ $TV{$family}{members} } ) {
            print " $who->{name} ($who->{role}), age $who->{age}\n";
        }
        # the records have no top-level "lead" field, so find the
        # member whose role is "lead"
        ($lead) = grep { $_->{role} eq "lead" } @{ $TV{$family}{members} };
        print "it turns out that $lead->{name} has ";
        print scalar ( @{ $TV{$family}{kids} } ), " kids named ";
        print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
        print "\n";
    }