You want to let users enter their own patterns, but an invalid one would abort your program the first time you tried to use it.
Test the pattern in an eval {} construct first, matching against some dummy string. If $@ is not set, no exception occurred, so you know the pattern compiled successfully as a valid regular expression. Here is a loop that keeps prompting until the user supplies a valid pattern:
```perl
my $pat;
do {
    print "Pattern? ";
    chomp($pat = <>);
    eval { "" =~ /$pat/ };             # dies inside the eval if $pat won't compile
    warn "INVALID PATTERN $@" if $@;
} while $@;                            # $@ still set means ask again
```
Here's a standalone subroutine that verifies whether a pattern is valid:
```perl
sub is_valid_pattern {
    my $pat = shift;
    return eval { "" =~ /$pat/; 1 } || 0;
}
```
That one relies on the block returning 1 if it completes, which never happens if an exception is raised. In that case eval returns undef, and the || 0 turns that into 0.
There's no end to patterns that won't compile. The user could mistakenly enter "<I\s*[^>", "*** GET RICH ***", or "+5-i". If you blindly use the proffered pattern in your program, it will raise an exception, which is normally fatal.
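To see the problem concretely, the sketch below runs those three strings through the is_valid_pattern subroutine from above (repeated so the snippet stands alone), together with two patterns that do compile. The two valid patterns are illustrative additions, not part of the original recipe.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same check as before: compile against a dummy string inside eval {}.
# eval returns undef if compilation dies, so || 0 yields 0.
sub is_valid_pattern {
    my $pat = shift;
    return eval { "" =~ /$pat/; 1 } || 0;
}

my @bad  = ('<I\s*[^>', '*** GET RICH ***', '+5-i');
my @good = ('<I\s*[^>]*>', '(?i)stuff');    # illustrative valid patterns

for my $pat (@bad, @good) {
    printf "%-20s %s\n", $pat, is_valid_pattern($pat) ? "ok" : "INVALID";
}
```

The first is missing the closing ] of its character class; the other two start with a quantifier that has nothing to quantify.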
The tiny program in Example 6.9 shows how to check the pattern before using it.
```perl
#!/usr/bin/perl
# paragrep - trivial paragraph grepper
die "usage: $0 pat [files]\n" unless @ARGV;
$/ = '';                               # paragraph mode: records are separated by blank lines
$pat = shift;
eval { "" =~ /$pat/; 1 }
    or die "$0: Bad pattern $pat: $@\n";
while (<>) {
    print "$ARGV $.: $_" if /$pat/o;
}
```
That /o is a promise to Perl that the interpolated variable's contents are constant over the program's entire run. It's an efficiency hack: Perl compiles the pattern only once, so even if $pat changes later, Perl won't notice.
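One way to see what that promise means is to change the interpolated variable between matches. In this sketch (the strings are arbitrary), the loop with /o keeps using the pattern it compiled on its first pass, while the loop without /o picks up each new value of $pat.

```perl
use strict;
use warnings;

# With /o, this match op compiles /$pat/ the first time it runs
# and ignores later changes to $pat.
my @with_o;
for my $pat ('a', 'b') {
    push @with_o, ("b" =~ /$pat/o) ? 1 : 0;
}

# Without /o, the current value of $pat is used every time.
my @without_o;
for my $pat ('a', 'b') {
    push @without_o, ("b" =~ /$pat/) ? 1 : 0;
}

print "with /o: @with_o; without /o: @without_o\n";
# with /o: 0 0; without /o: 0 1
```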
You could encapsulate this in a function that returns 1 if the block completes and 0 if not, as the is_valid_pattern subroutine shown earlier does. Although eval "/$pat/" would also trap the exception, it has two problems. First, the string the user entered couldn't contain any slashes (or whatever your chosen pattern delimiter is). More importantly, it would open a drastic security hole that you almost certainly want to avoid. Strings like this could really ruin your day:
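```perl
# The literal text a malicious user might type. A string eval of "/$pat/"
# would interpolate @{[ ... ]} and run the system() call.
$pat = 'You lose @{[ system("rm -rf *") ]} big here';
```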
If you don't wish to provide the user with a real pattern, you can always metaquote the string first:
```perl
$safe_pat = quotemeta($pat);
something() if /$safe_pat/;
```
Or, even easier, use:
```perl
something() if /\Q$pat/;
```
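For instance, the "*** GET RICH ***" string that would not compile as a pattern is harmless once metaquoted. This sketch uses made-up sample text and shows that both forms find it as a literal substring.

```perl
use strict;
use warnings;

my $text = "Act now: *** GET RICH *** today";   # illustrative sample text
my $pat  = '*** GET RICH ***';                  # not a valid regex on its own

my $safe_pat = quotemeta($pat);    # backslash-escapes every non-word character
my $hit_quotemeta = ($text =~ /$safe_pat/) ? 1 : 0;
my $hit_Q         = ($text =~ /\Q$pat/)    ? 1 : 0;

print "quotemeta: $safe_pat\n";
print "matched: $hit_quotemeta $hit_Q\n";       # matched: 1 1
```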
But if you're going to do that, why are you using pattern matching at all? In that case, a simple use of index would be enough.
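A sketch of the index version, using the same illustrative text as a metaquoted match would: index does a plain substring search, so there are no metacharacters to escape and nothing that can fail to compile.

```perl
use strict;
use warnings;

my $text = "Act now: *** GET RICH *** today";   # illustrative sample text
my $pat  = '*** GET RICH ***';

# index() returns the offset of the first occurrence, or -1 if absent.
my $pos = index($text, $pat);
print "found at offset $pos\n" if $pos >= 0;    # found at offset 9
```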
By letting the user supply a real pattern, you give them the power to do interesting and useful things. This is a good thing. You just have to be slightly careful, that's all. Suppose they wanted to enter a case-insensitive pattern, but you didn't provide the program with an option like grep's -i option. By permitting full patterns, the user can enter an embedded /i modifier as (?i), as in /(?i)stuff/.
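As a small illustration (the input lines are invented), a user-supplied string carrying its own (?i) makes the match case-insensitive even though the program itself sets no /i flag:

```perl
use strict;
use warnings;

# Pretend this came from the user: the embedded (?i) turns on
# case-insensitive matching for the rest of the pattern.
my $pat = '(?i)stuff';

my @lines = ("Some STUFF here", "more Stuff", "nothing to see");
my @hits  = grep { /$pat/ } @lines;
print "$_\n" for @hits;     # prints the first two lines
```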
What happens if the interpolated pattern expands to nothing? If $pat is the empty string, what does /$pat/ match (that is, what does a blank // match)? It doesn't match the start of all possible strings. Surprisingly enough, matching the null pattern reuses the previous successfully matched pattern, a dubiously useful behavior. Only if no match has succeeded yet does it act as a genuine empty pattern that matches everything. In practice, this is hard to make good use of in Perl.
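The following sketch shows that reuse in action; the strings are arbitrary, and $empty stands in for a user who entered nothing. After /b/ succeeds, an empty interpolated pattern silently behaves like /b/.

```perl
use strict;
use warnings;

my $empty = '';    # stands in for an empty user-supplied pattern

"abc" =~ /b/ or die "setup match failed";   # /b/ is now the last successful pattern

# The empty pattern reuses /b/ rather than matching everything:
my $r1 = ("xyz" =~ /$empty/) ? 1 : 0;   # 0: "xyz" has no "b"
my $r2 = ("cab" =~ /$empty/) ? 1 : 0;   # 1: "cab" contains "b"

print "xyz: $r1, cab: $r2\n";
```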
Even if you use eval to check the pattern for validity, beware: matching certain patterns takes time exponential in the length of the string being matched. There is no good way to detect one of these in advance, and if the user sticks you with one, your program will appear to hang as it and the entropic heat death of the universe have a long race to see who finishes first. Setting a timer to jump out of a long-running command offers some hope of a way out, but (as of the 5.004 release) it still carries the possibility of a core dump if you interrupt Perl at an inopportune moment. Since Perl 5.8, signals are by default deferred until a safe point. This removes the core-dump risk, but it also means an alarm arriving during a single long-running pattern match is not seen until the match completes.
See Also: The eval function in perlfunc(1) and in Chapter 2 of Programming Perl; Recipe 10.12.
Copyright © 2001 O'Reilly & Associates. All rights reserved.