Reading the Code - GNU libavl 2.0.2

This book contains all the source code to libavl. Conversely, much of the source code presented in this book is part of libavl.

libavl is written in ANSI/ISO C89 using TexiWEB, a literate programming system. Literate programming is a philosophy that regards software as a kind of literature. The ideas behind literate programming have been around for a long time, but the term itself was coined in 1984 by computer scientist Donald Knuth, who wrote two of his most famous programs (TeX and METAFONT) with a literate programming system of his own design. That system, called WEB, inspired the form and much of the syntax of TexiWEB.

A TexiWEB document is a C program that has been cut into sections, rearranged, and annotated, with the goal of making the program as a whole as comprehensible as possible to a reader who starts at the beginning and reads the entire program in order. Of course, understanding large, complex programs cannot be trivial, but TexiWEB tries to make it as easy as possible.

Each section of a TexiWEB program is assigned both a number and a name. Section numbers are assigned sequentially, starting from 1 with the first section, and they are used for cross-references between sections. Section names are words or phrases assigned by the TexiWEB program's author to describe the role of the section's code.

Here's a sample TexiWEB section:

19. <Clear hash table entries 19> =
for (i = 0; i < hash->m; i++)
  hash->entry[i] = NULL;

This code is included in 15.

The first line of a section, as shown here, gives the section's name and its number within angle brackets. The section number is also given at the left margin to make individual sections easy to find.

In TexiWEB, C's reserved words are shown like this: int, struct, while.... Types defined with typedef or with struct, union, and enum tags are shown the same way. Identifiers in all capital letters (often names of macros) are shown like this: BUFSIZ, EOF, ERANGE.... Other identifiers are shown like this: getc, argv, strlen....

Code segments often contain references to other code segments, shown as a section name and number within angle brackets. These act something like macros, in that they stand for the corresponding replacement text. For instance, consider the following segment:

15. <Initialize hash table 15> =
hash->m = 13;
<Clear hash table entries 19>

See also 16.

This means that the code for `Clear hash table entries' should be inserted as part of `Initialize hash table'. Because the name of a section explains what it does, it's often unnecessary to know anything more. If you do want more detail, the section number 19 in <Clear hash table entries 19> can easily be used to find the full text and annotations for `Clear hash table entries'. You can also view the fully expanded code in a code segment by following the link from the segment name or number (our example does not include this feature). At the bottom of section 19 you will find a note reading `This code is included in 15.', which makes it easy to move back to section 15, the section that includes it.

There's also a note following the code in the section above: `See also 16.'. This demonstrates how TexiWEB handles multiple sections that have the same name. When a name that corresponds to multiple sections is referenced, code from all the sections with that name is substituted, in order of appearance. The first section with the name ends with a note listing the numbers of all other same-named sections. Later sections show their own numbers in the left margin, but the number of the first section within angle brackets, to make the first section easy to find. For example, here's another line of code for <Initialize hash table 15>:

16. <Initialize hash table 15> +=
hash->n = 0;
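
As a rough illustration, the three sample sections (15, 16, and 19) would expand to something like the plain C below. The struct hash_table definition, the HASH_MAX constant, and the hash_init wrapper function are assumptions invented for this sketch so that the fragment compiles; they are not part of libavl.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot count, chosen to match `hash->m = 13'. */
#define HASH_MAX 13

/* Hypothetical table type, invented for this sketch. */
struct hash_table
  {
    size_t m;                 /* Number of slots in use. */
    size_t n;                 /* Number of entries stored. */
    void *entry[HASH_MAX];    /* Slot contents. */
  };

/* Hypothetical wrapper showing <Initialize hash table 15> fully expanded. */
void
hash_init (struct hash_table *hash)
{
  size_t i;

  assert (hash != NULL);

  /* From section 15. */
  hash->m = 13;

  /* From section 19, substituted for <Clear hash table entries 19>. */
  for (i = 0; i < hash->m; i++)
    hash->entry[i] = NULL;

  /* From section 16, appended because it shares its name with section 15. */
  hash->n = 0;
}
```

The code from section 16 lands after the code from section 15 because same-named sections are substituted in order of appearance.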

Code segment references have one more feature: the ability to do special macro replacements within the referenced code. These replacements apply to all words in the referenced code segment and, recursively, in any code segments that it references. Word prefixes as well as full words are replaced, even within comments in the referenced code. Replacements take place regardless of case, and the case of the replacement mirrors the case of the replaced text. This odd feature is useful for adapting a section of code written for one library, with a particular identifier prefix, for use in a different library with another identifier prefix. For instance, the reference `<BST types; bst => avl>' inserts the contents of the segment named `BST types', replacing `bst' by `avl' wherever the former appears at the beginning of a word.
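
The replacement rule can be sketched in C roughly as follows. This is not TexiWEB's implementation: the function name replace_prefix, the choice to treat C identifier characters as word characters, and the character-by-character way of mirroring case are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns nonzero if C can be part of a word.  Treating C identifier
   characters as word characters is an assumption of this sketch. */
static int
is_word_char (int c)
{
  return isalnum (c) || c == '_';
}

/* Returns nonzero if S begins with PREFIX, ignoring case. */
static int
starts_with_nocase (const char *s, const char *prefix)
{
  for (; *prefix != '\0'; s++, prefix++)
    if (tolower ((unsigned char) *s) != tolower ((unsigned char) *prefix))
      return 0;
  return 1;
}

/* Copies SRC to DST, replacing FROM by TO wherever FROM begins a word.
   DST has room for DST_SIZE bytes.  Returns 0 on success, -1 if the
   result would not fit.  (Hypothetical helper, not a TexiWEB function.) */
int
replace_prefix (const char *src, const char *from, const char *to,
                char *dst, size_t dst_size)
{
  size_t from_len, to_len, out = 0, i = 0, j;

  assert (src != NULL && from != NULL && to != NULL && dst != NULL);
  from_len = strlen (from);
  to_len = strlen (to);
  assert (from_len > 0);

  while (src[i] != '\0')
    {
      int at_word_start = i == 0 || !is_word_char ((unsigned char) src[i - 1]);

      if (at_word_start && starts_with_nocase (src + i, from))
        {
          if (out + to_len >= dst_size)
            return -1;
          for (j = 0; j < to_len; j++)
            {
              /* Mirror the case of the corresponding replaced character,
                 or of the last one if TO is longer than FROM. */
              size_t k = j < from_len ? j : from_len - 1;
              unsigned char model = (unsigned char) src[i + k];
              unsigned char c = (unsigned char) to[j];

              dst[out++] = (char) (isupper (model) ? toupper (c) : tolower (c));
            }
          i += from_len;
        }
      else
        {
          if (out + 1 >= dst_size)
            return -1;
          dst[out++] = src[i++];
        }
    }

  if (out >= dst_size)
    return -1;
  dst[out] = '\0';
  return 0;
}
```

Under these assumptions, replacing `bst' by `avl' turns bst_node into avl_node and BST_ROOT into AVL_ROOT, including inside comments, while a word such as rbst is left alone because `bst' does not begin it.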

When a TexiWEB program is converted to C, conversion conceptually begins from sections named for files; e.g., <foo.c 37>. Within these sections, all section references are expanded, then references within those sections are expanded, and so on. When expansion is complete, the specified files are written out.
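
The expansion process can be pictured with a small C sketch. Everything here is hypothetical: struct section, expand_file, and the convention of writing a reference as a bare `<name>' without a section number are simplifications invented for this example, and the sketch assumes that sections never reference themselves, directly or indirectly. It is not how TexiWEB itself is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of a section: its name and its code. */
struct section
  {
    const char *name;
    const char *code;
  };

static int expand_name (const struct section *, size_t, const char *, size_t,
                        char *, size_t);

/* Appends the N bytes at S to the string in BUF, which has room for
   SIZE bytes.  Returns 0 on success, -1 if they would not fit. */
static int
append (char *buf, size_t size, const char *s, size_t n)
{
  size_t len = strlen (buf);

  if (len + n >= size)
    return -1;
  memcpy (buf + len, s, n);
  buf[len + n] = '\0';
  return 0;
}

/* Appends CODE to BUF, replacing each <name> that names a known
   section by that section's expanded code.  Any other `<' is copied
   literally, so C operators such as `<' and `->' pass through. */
static int
expand_code (const struct section *secs, size_t n_secs, const char *code,
             char *buf, size_t size)
{
  while (*code != '\0')
    {
      const char *end = *code == '<' ? strchr (code, '>') : NULL;

      if (end != NULL)
        {
          int r = expand_name (secs, n_secs, code + 1,
                               (size_t) (end - code - 1), buf, size);
          if (r < 0)
            return -1;
          if (r > 0)
            {
              code = end + 1;
              continue;
            }
        }
      if (append (buf, size, code, 1) != 0)
        return -1;
      code++;
    }
  return 0;
}

/* Expands every section whose name is the NAME_LEN bytes at NAME, in
   order of appearance.  Returns the number of sections expanded, or
   -1 if BUF is too small. */
static int
expand_name (const struct section *secs, size_t n_secs, const char *name,
             size_t name_len, char *buf, size_t size)
{
  size_t i;
  int found = 0;

  for (i = 0; i < n_secs; i++)
    if (strlen (secs[i].name) == name_len
        && memcmp (secs[i].name, name, name_len) == 0)
      {
        if (expand_code (secs, n_secs, secs[i].code, buf, size) != 0)
          return -1;
        found++;
      }
  return found;
}

/* Expands the section named for the file FILE into BUF.  Returns 0 on
   success, -1 if no such section exists or BUF is too small. */
int
expand_file (const struct section *secs, size_t n_secs, const char *file,
             char *buf, size_t size)
{
  assert (secs != NULL && file != NULL && buf != NULL);
  if (size == 0)
    return -1;
  buf[0] = '\0';
  return expand_name (secs, n_secs, file, strlen (file), buf, size) > 0
         ? 0 : -1;
}
```

Starting from a section named for a file, the sketch walks references depth-first and concatenates same-named sections in order of appearance, which mirrors the conceptual description above. A real tool would also have to write the result out to the named file.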

A final resource in reading a TexiWEB program is the index, which contains an entry for the point of declaration of every section name, function, type, structure, union, global variable, and macro. Declarations within functions are not indexed.

See also: [Knuth 1992], “How to read a WEB”.