If we were looking for all lines of a file that contain the string abc, we might use the grep command:
grep abc somefile >results
In this case, abc is the regular expression that the grep command tests against each input line. Lines that match are sent to standard output, here ending up in the file results because of the command-line redirection.
In Perl, we can speak of the string abc as a regular expression by enclosing the string in slashes:
if (/abc/) { print $_; }
But what is being tested against the regular expression abc in this case? Why, it's our old friend, the $_ variable! When a regular expression is enclosed in slashes (as above), the $_ variable is tested against the regular expression. If the regular expression matches, the match operator returns true. Otherwise, it returns false.
For this example, the $_ variable is presumed to contain some text line and is printed if the line contains the characters abc in sequence anywhere within the line, much as with the grep command above. Unlike the grep command, which operates on all of the lines of a file, this Perl fragment looks at just one line. To work on all lines, add a loop, as in:
while (<>) { if (/abc/) { print $_; } }
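To see the true or false result of the match operator without reading a file, here is a minimal sketch that places a few sample strings into $_ one at a time. The sample strings and the @lines and @results arrays are made up for illustration; they are not part of the example above.

```perl
use strict;
use warnings;

# Hypothetical sample lines; any strings would do.
my @lines = ("xxabcxx", "a b c", "cab");
my @results;

for (@lines) {          # each element becomes $_ in turn
    if (/abc/) {        # the match is tested against $_
        push @results, "match: $_";
    } else {
        push @results, "no match: $_";
    }
}

print "$_\n" for @results;
```

Only the first string contains a, b, and c in sequence with nothing in between, so only that one should be reported as a match.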
What if we didn't know the number of b's between the a and the c? That is, what if we want to print the line if it contains an a followed by zero or more b's, followed by a c? With grep, we'd say:
grep "ab*c" somefile >results
(The argument containing the asterisk is in quotes because we don't want the shell expanding that argument as if it were a filename wildcard. It has to be passed as-is to grep to be effective.) In Perl, we can say exactly the same thing:
while (<>) { if (/ab*c/) { print $_; } }
Just like grep, this means an a followed by zero or more b's followed by a c.
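As a rough illustration of what "zero or more b's" admits, the following sketch tries the pattern against a handful of strings. The @candidates list is hypothetical, chosen only to cover the zero, one, and many cases plus two near misses.

```perl
use strict;
use warnings;

# Hypothetical test strings: zero, one, and many b's, plus two near misses.
my @candidates = ("ac", "abc", "abbbbc", "xacx", "adc", "ab");
my @matched;

for (@candidates) {
    if (/ab*c/) {               # a, then zero or more b's, then c
        push @matched, $_;
    }
}

print "matched: @matched\n";
```

Here adc fails because a d sits between the a and the c, and ab fails because there is no c at all; the other four strings each contain an a, optional b's, and a c in sequence.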
We'll visit more uses of pattern matching in Section 7.4, "More on the Matching Operator," later in the chapter, after we talk about all kinds of regular expressions.
Another simple regular expression operator is the substitute operator, which replaces the part of a string that matches the regular expression with another string. The substitute operator looks like the s command in the UNIX sed utility, consisting of the letter s, a slash, a regular expression, a slash, a replacement string, and a final slash, looking something like:
s/ab*c/def/;
The variable (in this case, $_) is matched against the regular expression (ab*c). If the match is successful, the part of the string that matched is discarded and replaced by the replacement string (def). If the match is unsuccessful, nothing happens.
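The effect is easiest to see with concrete values. This sketch, using made-up input strings and illustrative @before and @after arrays, applies the substitution to $_ and records the result each time.

```perl
use strict;
use warnings;

# Hypothetical inputs: two that contain a match and one that does not.
my @before = ("xxabbbcxx", "ac!", "nothing here");
my @after;

for my $line (@before) {
    $_ = $line;         # put the sample text into $_
    s/ab*c/def/;        # replace the matched part, if any, with def
    push @after, $_;
}

for my $i (0 .. $#before) {
    print "$before[$i] -> $after[$i]\n";
}
```

The first two strings have their matching part (abbbc and ac, respectively) replaced by def, while the third contains no match and is left unchanged.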
As with the match operator, we'll revisit the myriad options on the substitute operator later, in Section 7.5, "Substitutions."