17. Template-Driven Code Generation

Contents:
On Code Generation
Jeeves Example
Jeeves Overview
Jeeves Implementation
Sample Specification Parser
Resources

I'd rather write programs to write programs than write programs.

- Programming Pearls, Communications of the ACM, Sept. 1985

This chapter builds a template-driven code generator, an indispensable tool in a C, C++, or Java programmer's toolbox. The chapter has two objectives: to make the case for code generation as a method of code reuse and to present a small but nontrivial problem that can exercise all the Perl concepts you learned in the first half of the book: complex data structures, modules, objects, and eval. Enjoy!

17.1 On Code Generation

Programmers create and use tiny specification languages all the time. Database schemas, resources (rc files in Unix such as .mwmrc and .openwinrc), user interface specifications (Motif UIL files), network interface specifications (RPC or CORBA IDL files), and so on are all examples of such languages. These languages enable you to state your requirements in a high-level, compact, and declarative format; for example, in Motif's UIL (User Interface Language), you can simply state that you want two buttons inside a form and spare yourself the effort of writing 20 or so statements in C to achieve the same effect.

The semantic gap between these specification languages and conventional systems-programming languages such as C or C++ can be bridged in one of two ways. The first approach is for the C application to treat the specification as meta-data; that is, the application embeds the specification parser and exchanges information with it using C data structures and an internal API. The second approach is to have a standalone compiler to translate the specification to C, which in turn is linked to the application. RPC systems and CASE tools prefer this approach.

In the following pages, we will study the second alternative and build ourselves a configurable code generation framework called Jeeves.[1]

[1] Jeeves is the efficient butler in P.G. Wodehouse's novels, who does all the work for his bumbling master with at most a twitch of an eyebrow.

The code generators we mentioned previously are clearly domain-specific. In practice, I have also found most of them to be needlessly specific in their output capabilities. Consider the following examples:

RPC

NOTE: The Remote Procedure Call facility allows you to call a procedure in a different address space, possibly on a different machine. You specify a list of procedures that you wish to export in an Interface Definition Language (IDL) and feed it to an IDL compiler, which produces some C code for the client and server ends. Link these pieces of code to your application, and voilà, you have network transparency.
Most commercial IDL compilers are remarkably inflexible about changing their output code. They make it hard for you to insert probes for monitoring network performance or auditing data flowing across the network. If you want to transparently encrypt the data before it is put "on the wire," you are often out of luck. Sure, you can change the C code output by the IDL compiler, but your changes will get overwritten the next time you run the IDL compiler.

CASE

Many CASE tools generate C code from object model specifications. The following sample specification lists entity classes and their attributes and specifies the degree and cardinality of relationships between these classes:

    Employee {
        int         emp_id   key
        string[40]  name     
        Department  dept_id  
        double      salary   
    }
    Department {
        int         dept_id  key
        string[20]  name
    }
    Relationship Department(1) contains Employee (n)

Given this tiny specification language, we can, for example, automatically generate C and embedded SQL code to maintain database tables, as shown below:

    int create_employee_table {
        exec sql create table employee_table (
             employee_id integer,
             name varchar, salary float);
        return check_db_error();
    }
    int create_employee (employee *e) {
        if (!check_dept(e->dept)) 
            return 0;
        e->employee_id = ++g_employee_id;
        exec sql insert into table employee_table (
                 employee_id, name, salary) 
                 values (:*e);
        return check_db_error();
    }

The specification also provides enough information to generate code for creating C++ classes for each entity and for managing referential integrity constraints ("cannot delete a department object if it contains one or more employees").

Most CASE tools suffer from the restriction that they can generate only a fixed pattern of code. Buy an object-oriented database tomorrow, and the output code shown earlier doesn't help much. If this pattern is hardcoded, you are left with a mere diagramming tool (a mighty expensive one too).

POD, Javadoc

The entire Perl documentation is written in a format called POD (plain old documentation). It provides simple, high-level primitives for specifying paragraph styles (=head1, =item) and character styles (B<foo> prints the word in boldface, for example). The distribution comes with tools such as pod2text, pod2html, pod2man, and so on. POD documents can be embedded in code, and extracted by these tools (the Perl interpreter ignores these directives). This facility reduces the possibility of mismatches between code and documentation since they are all in one place.

Similarly, all Java libraries are documented using a format known as Javadoc. The documentation is extracted and converted to HTML by a tool called javadoc.

Both sets of tools are limited to specific outputs (ASCII, HTML, and so on). For example, if you want to write a pod2rtf translator (Rich Text Format, used on Microsoft Windows systems), you have to start from scratch, because the POD parser is not available as a separate package. The better option would have been to centralize the POD parser and allow several different plug-and-play back ends.

SWIG, XS

In Chapter 18, Extending Perl:A First Course, we will have occasion to study two tools called SWIG and XS. Given an interface specification, they generate code to bind Perl and custom C extensions together. In fact, SWIG is a classic example of the type of code generators we would like to build: From one specification language, this tool is capable of producing a variety of output code, because its back end is template-driven.

In most of these cases, the demand for different types of output typically exceeds the number of changes made to the input specification format. We can make two observations as a consequence. First, parsing the input and producing the final output are related but separate tasks. Second, the output needs to be configurable. This can be arranged either by having one parameterizable output generator or by having a number of output generators that can be used interchangeably with the input parser. In my experience, the first option is not often practical. For example, it is pointless to write one output generator in the POD case, which can output HTML or ASCII or RTF just by tweaking a few parameters; they are very different sets of outputs.

The Jeeves framework goes for the second option. It helps you write a configurable translator by supplying a template-driven code-generating back end. This module allows you to write configurable templates with loops, if/then conditions, variables, and bits of Perl code, so it is no ordinary cookie-cutter cookie-cutter (otherwise, it might have been called yacccc).

An example serves to better explain this framework.