
11.7 Pattern Matching

CVS uses two different forms of pattern matching, depending on which aspect of CVS is attempting to match the pattern. Most CVS functions use sh-style wildcards, but the scripting files in the CVSROOT directory use regular expressions.

This section is not a comprehensive study of regular expressions or wildcards. For a more complete discussion of regular expressions, I recommend Mastering Regular Expressions (O'Reilly) by Jeffrey E. F. Friedl.
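To see why the distinction matters, the sketch below selects C source files both ways. It uses Python's fnmatch and re modules only as stand-ins; they are not the libraries CVS uses, although they treat the symbols shown here (*, a literal dot, \., $ and .) the way this section describes. The helper names are hypothetical. The last helper shows that reusing wildcard text as a regular expression silently changes its meaning, because a . that is literal in a wildcard matches any character in a regular expression.

```python
import fnmatch
import re

NAMES = ["main.c", "main.h", "mainxc"]


def wildcard_selects(name: str) -> bool:
    # Wildcard "*.c": '*' matches any string, '.' is a literal dot,
    # and fnmatchcase requires the pattern to match the whole name.
    return fnmatch.fnmatchcase(name, "*.c")


def regex_selects(name: str) -> bool:
    # Regular expression "\.c$": '\.' is a literal dot, '$' anchors the end.
    # re.search looks for the pattern anywhere in the name.
    return re.search(r"\.c$", name) is not None


def naive_regex_selects(name: str) -> bool:
    # The wildcard's ".c" tail read as a regex: '.' now matches any character.
    return re.search(r".c$", name) is not None


if __name__ == "__main__":
    for name in NAMES:
        print(name, wildcard_selects(name), regex_selects(name), naive_regex_selects(name))
```

Both correct patterns accept main.c and reject main.h and mainxc; only the naive regular expression also accepts mainxc.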

11.7.1 Wildcards

Wildcards are used by most CVS functions, including wrappers and ignore files. The wildcards are evaluated by a version of the standard fnmatch function that is distributed with CVS.

The wildcards are sh-style, and the symbols used in CVS include:

?

Matches any single character.

\

Escapes the special symbols, so they can be used as literals.

*

Matches any string, including the empty string.

[ ]

Matches any one of the enclosed characters. Within the brackets, the following symbols are used:

! or ^

If either of these characters is the first character after the open bracket, the expression matches any single character that is not enclosed in the brackets.

char1-char2

Denotes the range of characters between char1 and char2.
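The symbols above can be illustrated with a small backtracking matcher. This is a minimal Python sketch, not the fnmatch code that ships with CVS, and the names wildcard_match, _match and _match_bracket are hypothetical. It assumes the pattern must match the whole name, that \ also escapes characters inside brackets, and that an unterminated [ is an ordinary character; the text above does not specify these details.

```python
def wildcard_match(pattern: str, name: str) -> bool:
    """Return True if the whole of `name` matches the sh-style `pattern`."""
    return _match(pattern, 0, name, 0)


def _match_bracket(p: str, i: int, ch: str):
    """Evaluate the bracket expression that starts at p[i] == '['.

    Returns (index just past the closing ']', whether `ch` matched),
    or (None, False) if the bracket is never closed.
    """
    i += 1  # skip '['
    negate = i < len(p) and p[i] in "!^"
    if negate:
        i += 1
    matched = False
    while i < len(p) and p[i] != "]":
        lo = p[i]
        if lo == "\\" and i + 1 < len(p):  # escaped character (assumption)
            i += 1
            lo = p[i]
        if i + 2 < len(p) and p[i + 1] == "-" and p[i + 2] != "]":
            # char1-char2 range
            if lo <= ch <= p[i + 2]:
                matched = True
            i += 3
        else:
            if ch == lo:
                matched = True
            i += 1
    if i >= len(p):
        return None, False
    return i + 1, matched != negate


def _match(p: str, i: int, s: str, j: int) -> bool:
    """Match pattern p[i:] against string s[j:], backtracking on '*'."""
    while i < len(p):
        c = p[i]
        if c == "*":
            # Collapse consecutive '*' and try every split point, so that
            # '*' can match any string, including the empty string.
            while i < len(p) and p[i] == "*":
                i += 1
            if i == len(p):
                return True
            return any(_match(p, i, s, k) for k in range(j, len(s) + 1))
        if j == len(s):
            return False  # pattern still needs a character; name is exhausted
        if c == "?":
            i += 1
        elif c == "[":
            end, ok = _match_bracket(p, i, s[j])
            if end is None:
                # Unterminated '[': treat it as a literal (assumption).
                if s[j] != "[":
                    return False
                i += 1
            elif not ok:
                return False
            else:
                i = end
        else:
            if c == "\\" and i + 1 < len(p):  # '\' makes the next symbol literal
                i += 1
                c = p[i]
            if s[j] != c:
                return False
            i += 1
        j += 1
    return j == len(s)


if __name__ == "__main__":
    for pattern, name in [("*.c", "main.c"), ("file?.txt", "file10.txt"),
                          ("[!0-9]*", "abc"), ("[a-c]x", "dx"), (r"\*.txt", "*.txt")]:
        print(f"{pattern!r} vs {name!r}: {wildcard_match(pattern, name)}")
```

Trying every split point for * is the simplest correct approach; it can be slow on patterns with many stars, which is one reason production implementations are written more carefully.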

11.7.2 Regular Expressions

CVS supports regular expressions in the scripting files in the CVSROOT directory. In CVS 1.11.5, the scripting files are the only files that support regular expressions; all other files and functions use wildcards. The scripting files are commitinfo, editinfo, loginfo, rcsinfo, taginfo, and verifymsg.

CVS regular expressions are based on the GNU Emacs regular-expression syntax, but they do not implement all of the Emacs operators. The regular expressions are parsed by a version of the regex standard function library distributed with CVS.

The CVS regular-expression operators include:

^

Matches the beginning of a line. Use ^foo to match foo at the beginning of a line.

$

Matches the end of a line. Use foo$ to match foo at the end of a line.

.

Matches any single character, except a newline character. The expression foo.bar matches foosbar and foo:bar, but not foodiebar.

*

Repeats the previous element zero or more times. The expression fo*bar matches fbar, fobar, foobar, fooobar, and any other string of that form.

The pattern matcher tries to match the longest string possible. If the next part of the expression fails to match, the pattern matcher rolls back one character at a time in an attempt to match the string. If it is parsing the regular expression fo*obar against the string fooobar, it initially matches the expression fo* against fooo. It then rolls back to match fo* against foo, thus freeing the obar in the string to match the obar in the expression.

+

Repeats the previous element one or more times. This operator is similar to the * operator and is processed the same way.

?

Repeats the previous element zero times or one time. The expression fo?bar matches fbar or fobar, but nothing else.

\

Escapes the next character (\^ is a literal caret). Also used as part of some operators.

\|

The OR operator. Matches either the expression on its left or the expression on its right. Each side extends outward until it reaches the start or end of the regular expression, an enclosing group, or another OR operator.

+? or *? or ??

The same as the +, *, and ? operators, except that they attempt to match the shortest string possible rather than the longest.

[...]

A character set. Any of the characters in the character set can be matched. The [fba] character set matches any one (and only one) of f, b, or a. With the * operator, the expression [fba]* matches any sequence composed of zero or more f, b, or a characters.

The only special characters or operators inside a character set are -, ], and ^. To include a ] in a character set, use it as the first character. To include a -, use it as the last character or just after a range. To use a ^, use it anywhere except as the first character.

A character range can be specified inside a character set. Create a range by using a hyphen between the start and end of the range. a-z and 0-9 are common ranges. Note that the behavior of a mixed-case range such as A-z is undefined; use [a-zA-Z] instead.

[^...]

A complemented character set. This operator matches everything except the characters or ranges inside the set.

A ] or - that immediately follows the ^ is treated as the literal character.

\(...\)

Groups expressions together so that an operator applies to the whole group. The expression ba\(na\)+ matches bana, banana, bananana, and similar sequences.
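The operators above can be tried out with Python's re module, used here only as a stand-in. It is not the regex library distributed with CVS, and it spells two operators differently: Python writes a group as (...) and the OR operator as |, where CVS uses \(...\) and \|, so the CVS expression ba\(na\)+ appears below as ba(na)+. For the other operators shown, Python behaves as this section describes. The helper names whole and anywhere are hypothetical; whole requires the pattern to match the entire string, which is what statements such as "matches fbar or fobar, but nothing else" assume. The last two lines show the greedy rollback described for the * operator and its shortest-match counterpart.

```python
import re


def whole(pattern: str, text: str) -> bool:
    """True if the Python-syntax `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None


def anywhere(pattern: str, text: str) -> bool:
    """True if `pattern` matches somewhere in `text` (used for ^ and $)."""
    return re.search(pattern, text) is not None


# (matcher, Python pattern, text, expected); CVS spelling noted where it differs.
CHECKS = [
    (anywhere, r"^foo", "foobar", True),
    (anywhere, r"^foo", "barfoo", False),
    (anywhere, r"foo$", "barfoo", True),
    (whole, r"foo.bar", "foo:bar", True),
    (whole, r"foo.bar", "foodiebar", False),
    (whole, r"fo*bar", "fbar", True),
    (whole, r"fo+bar", "fbar", False),
    (whole, r"fo?bar", "foobar", False),
    (whole, r"foo|bar", "bar", True),        # CVS: foo\|bar
    (whole, r"[fba]*", "abba", True),
    (whole, r"[]a]", "]", True),             # ']' first in the set is literal
    (whole, r"[a-]", "-", True),             # '-' last in the set is literal
    (whole, r"[^]a]", "]", False),           # ']' right after '^' is literal
    (whole, r"ba(na)+", "bananana", True),   # CVS: ba\(na\)+
]

for matcher, pattern, text, expected in CHECKS:
    result = matcher(pattern, text)
    assert result == expected, (pattern, text)
    print(f"{matcher.__name__:8} {pattern!r:12} {text!r:12} {result}")

# Greedy matching with rollback: fo* first takes "fooo", then gives back
# one "o" so that the literal "obar" can still match.
print(re.fullmatch(r"(fo*)obar", "fooobar").group(1))  # foo

# Shortest-match form: fo*? takes as few characters as possible.
print(re.match(r"fo*?", "fooo").group(0))  # f
```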

