Recipe 9.7. Processing All Files in a Directory Recursively

Problem

You want to do something to each file and subdirectory in a particular directory.

Solution

Use the standard File::Find module.

use File::Find;
sub process_file {
    # do whatever;
}
find(\&process_file, @DIRLIST);

Discussion

File::Find provides a convenient way to process a directory recursively. It does the directory scans and recursion for you. All you do is pass find a code reference and a list of directories. For each file and subdirectory in those directories, at any depth, find calls your function.

Before calling your function, find changes to the directory being visited, whose path relative to the starting directory is stored in the $File::Find::dir variable. $_ is set to the basename of the file being visited, and the full path of that file can be found in $File::Find::name. Your code can set $File::Find::prune to true to tell find not to descend into the directory just seen.
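As a minimal sketch of pruning, the following collects plain files while refusing to descend into any directory named .git. The helper name files_outside_git and the choice of .git are ours, picked only for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the paths of plain files under @dirs,
# skipping every directory named .git (an arbitrary example name).
sub files_outside_git {
    my @dirs = @_;
    my @found;
    find(sub {
        # $_ holds the basename; find has already changed directory.
        if ($_ eq '.git' && -d $_) {
            $File::Find::prune = 1;    # do not descend into this directory
            return;
        }
        push @found, $File::Find::name if -f $_;
    }, @dirs);
    return @found;
}

@ARGV = ('.') unless @ARGV;
print "$_\n" for files_outside_git(@ARGV);
```

Setting $File::Find::prune only makes sense while visiting a directory, which is why the test on $_ is paired with -d.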

This simple example demonstrates File::Find. We give find an anonymous subroutine that prints the name of each file visited and adds a / to the names of directories:

@ARGV = qw(.) unless @ARGV;
use File::Find;
find sub { print $File::Find::name, -d && '/', "\n" }, @ARGV;

The -d file test operator returns the empty string '' when the test fails, so -d && '/' yields '/' for directories and nothing for everything else.

The following program prints the total size, in bytes, of everything in a directory. It gives find an anonymous subroutine that keeps a running sum of the size of each file it visits. That includes all inode types, such as directories and symbolic links, not just regular files. Once the find function returns, the accumulated sum is displayed.

use File::Find;
@ARGV = ('.') unless @ARGV;
my $sum = 0;
find sub { $sum += -s }, @ARGV;
print "@ARGV contains $sum bytes\n";
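If only regular files should count, one possible variation adds a file test before accumulating. This is a sketch rather than part of the original program; the helper name regular_file_bytes is ours, and we assume symbolic links should be skipped rather than followed.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: total size in bytes of the regular files under @dirs.
sub regular_file_bytes {
    my @dirs = @_;
    my $sum = 0;
    find(sub {
        return if -l $_;           # skip symbolic links themselves
        return unless -f $_;       # keep only regular files
        $sum += (-s _) || 0;       # reuse the stat buffer filled by -f
    }, @dirs);
    return $sum;
}

@ARGV = ('.') unless @ARGV;
print "@ARGV contains ", regular_file_bytes(@ARGV), " bytes in regular files\n";
```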

This code finds the largest single file within a set of directories:

use File::Find;
@ARGV = ('.') unless @ARGV;
my ($saved_size, $saved_name) = (-1, '');
sub biggest {
    return unless -f && -s _ > $saved_size;
    $saved_size = -s _;
    $saved_name = $File::Find::name;
}
find(\&biggest, @ARGV);
print "Biggest file $saved_name in @ARGV is $saved_size bytes long.\n";

We use $saved_size and $saved_name to keep track of the name and the size of the largest file visited. If we find a file bigger than the largest seen so far, we replace the saved name and size with the current ones. When the find is done running, the largest file and its size are printed out, rather verbosely. A more general tool would probably just print the filename, its size, or both. This time we used a named function rather than an anonymous one because the function was getting big.
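A more general form might return the name and size and leave the formatting to the caller. The following is one way to sketch that; the helper name largest_file and the tab-separated output are our assumptions, not part of the program above.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return (name, size) for the largest regular file
# under @dirs, or an empty list if no regular file was found.
sub largest_file {
    my @dirs = @_;
    my ($best_size, $best_name) = (-1, undef);
    find(sub {
        return unless -f $_;
        my $size = (-s _) || 0;    # -s is false for empty files
        return unless $size > $best_size;
        ($best_size, $best_name) = ($size, $File::Find::name);
    }, @dirs);
    return defined $best_name ? ($best_name, $best_size) : ();
}

@ARGV = ('.') unless @ARGV;
if (my ($name, $size) = largest_file(@ARGV)) {
    print "$size\t$name\n";
}
```

Keeping the state in lexical variables inside the helper, rather than in file-level variables, lets the function be called more than once without resetting anything.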

It's simple to change this to find the most recently changed file:

use File::Find;
@ARGV = ('.') unless @ARGV;
my ($age, $name);
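sub youngest {
    my $mtime = (stat($_))[9];       # modification time; stat runs on every call
    return unless defined $mtime;    # stat failed, e.g. a dangling symlink
    return if defined $age && $age > $mtime;
    $age = $mtime;
    $name = $File::Find::name;
}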
find(\&youngest, @ARGV);
print "$name " . scalar(localtime($age)) . "\n";

The File::Find module doesn't export its $name variable, so always refer to it by its fully qualified name. The program in Example 9.2 is more a demonstration of namespace munging than of recursive directory traversal, although it does find all the directories. It makes $name in our current package an alias for the one in File::Find, which is essentially how Exporter works. Then it declares its own version of find with a prototype that lets it be called like grep or map.

Example 9.2: fdirs

#!/usr/bin/perl -lw
# fdirs - find all directories
@ARGV = qw(.) unless @ARGV;
use File::Find ();
sub find(&@) { &File::Find::find }
*name = *File::Find::name;
find { print $name if -d } @ARGV;

Our find simply calls the find in File::Find, which we were careful not to import by specifying an empty list () in the use statement. Rather than write this:

find sub { print $File::Find::name if -d }, @ARGV;

we can write the more pleasant:

find { print $name if -d } @ARGV;
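Under use strict, an unqualified $name needs a declaration. One way to keep the same trick in a strict program, offered here only as a sketch, is to declare the variable with our and alias just its scalar slot. The helper name directories_under is ours.

```perl
use strict;
use warnings;
use File::Find ();

our $name;                           # declare the package variable for strict
*name = \$File::Find::name;          # alias only the scalar slot

sub find(&@) { &File::Find::find }   # pass @_ through unchanged

# Hypothetical helper built on the wrapper: list directories under @dirs.
sub directories_under {
    my @dirs = @_;
    my @found;
    find { push @found, $name if -d $_ } @dirs;
    return @found;
}

print "$_\n" for directories_under(@ARGV ? @ARGV : '.');
```

Assigning a scalar reference to the glob aliases $name alone, whereas the whole-glob assignment in Example 9.2 also aliases the array, hash, and subroutine of that name.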
