There are fixed limits within any awk implementation. The only trouble is that the documentation seldom reports them. Table 10.1 lists the limitations as described in The AWK Programming Language. These limitations are implementation-specific, but they are good ballpark figures for most systems.
Item | Limit |
---|---|
Number of fields per record | 100 |
Characters per input record | 3000 |
Characters per output record | 3000 |
Characters per field | 1024 |
Characters per printf string | 3000 |
Characters in literal string | 400 |
Characters in character class | 400 |
Files open | 15 |
Pipes open | 1 |
NOTE: Despite the number in Table 10.1, experience has shown that most awks allow you to have more than one open pipe.
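To see whether a data file comes close to the field limits in Table 10.1, a quick scan can help. The following is only a sketch: the function name check_fields and the variables maxnf and maxlen are made up for illustration, the thresholds are simply the table's figures, and the scan is assumed to run under an awk (such as gawk or mawk) whose own limits are larger than the ones being checked.

```shell
# Hypothetical helper: list records that exceed the field limits in Table 10.1.
# Usage: check_fields file...
check_fields() {
    awk -v maxnf=100 -v maxlen=1024 '
    # Too many fields in this record
    NF > maxnf {
        printf "%s:%d: %d fields\n", FILENAME, FNR, NF
    }
    # Any single field that is too long
    {
        for (i = 1; i <= NF; i++)
            if (length($i) > maxlen)
                printf "%s:%d: field %d has %d characters\n", FILENAME, FNR, i, length($i)
    }' "$@"
}
```

Running check_fields on a file prints one line per offending record or field and prints nothing if the file stays within both limits.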
In terms of numeric values, awk uses double-precision, floating-point numbers that are limited in size by the machine's architecture.
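One practical consequence of double-precision arithmetic is that very large integers stop being exact. The sketch below assumes an awk whose numbers are IEEE 754 doubles, which is the common case but not guaranteed; the function name precision_check is hypothetical.

```shell
# Hypothetical helper: reports whether this awk can tell 2^53 from 2^53 + 1.
precision_check() {
    awk 'BEGIN {
        x = 2 ^ 53      # beyond this value a double cannot represent every integer
        y = x + 1       # not representable in a double, so the result is rounded
        if (y == x)
            print "2^53 + 1 is indistinguishable from 2^53"
        else
            print "2^53 + 1 is distinct from 2^53"
    }'
}
```

On a system with IEEE 754 doubles, calling precision_check should report that the two values are indistinguishable, which is a reminder not to rely on awk for exact arithmetic on very large counters or identifiers.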
Running into these limits can cause unanticipated problems with scripts. In developing examples for the first edition of this book, Dale thought he'd write a search program that could look for a word or sequence of words in a single paragraph. The idea was to read a document as a series of multiline records and, if any of the fields contained the search term, print the record, which was a paragraph. It could be used to search through mail files where blank lines delimit paragraphs. The resulting program worked for small test files. However, when tried on larger files, the program dumped core because it encountered a paragraph that was longer than the maximum input record size, which is 3000 characters. (Actually, the file contained an included mail message where blank lines within the message were prefixed by ">", so the whole message was read as one paragraph.) Thus, when reading multiple lines as a single record, you had better be sure that no record will be longer than 3000 characters. By the way, there is no particular error message that alerts you to the fact that the problem is the size of the current record.
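Because nothing warns you when a record is too long, it can be worth measuring paragraphs before reading them as multiline records. The following sketch reads the input one line at a time, so it never builds a long record itself. The function name long_paragraphs, the variable names, and the default of 3000 (taken from Table 10.1) are assumptions for illustration; it also relies on user-defined functions, which the oldest awks lack.

```shell
# Hypothetical helper: report paragraphs longer than a given number of characters.
# Usage: long_paragraphs [limit] < file
long_paragraphs() {
    awk -v limit="${1:-3000}" '
    # Print a warning if the paragraph just finished is too long, then reset.
    function report() {
        if (len > limit)
            printf "paragraph starting at line %d: %d characters\n", start, len
        len = 0
    }
    /^$/ { report(); next }      # an empty line ends the current paragraph
    {
        if (len == 0)
            start = NR           # remember where the paragraph begins
        else
            len++                # count the newline that joins this line to the last
        len += length($0)
    }
    END { report() }             # the last paragraph may lack a trailing blank line
    '
}
```

A line containing only ">" is not empty, so a quoted mail message like the one described here is counted as one long paragraph, just as it would be when read as a multiline record.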
Fortunately, gawk and mawk (see Chapter 11, A Flock of awks) don't have such small limits; for example, the number of fields in a record is limited in gawk to the maximum value that can be held in a C long, and certainly records can be longer than 3000 characters. These versions allow you to have more open files and pipes.
Recent versions of the Bell Labs awk have two options, -mf N and -mr N, that allow you to set the maximum number of fields and the maximum record size on the command line, as an emergency way to get around the default limits.
(Sed implementations also have their own limits, which aren't documented. Experience has shown that most UNIX versions of sed have a limit of 99 or 100 substitute (s) commands.)