LET'S BUILD A COMPILER! By Jack W. Crenshaw, Ph.D. 16 April 1989 Part IX: A TOP VIEW ***************************************************************** * * * COPYRIGHT NOTICE * * * * Copyright (C) 1989 Jack W. Crenshaw. All rights reserved. * * * ***************************************************************** INTRODUCTION In the previous installments, we have learned many of the techniques required to build a full-blown compiler. We've done both assignment statements (with Boolean and arithmetic expressions), relational operators, and control constructs. We still haven't addressed procedure or function calls, but even so we could conceivably construct a mini-language without them. I've always thought it would be fun to see just how small a language one could build that would still be useful. We're ALMOST in a position to do that now. The problem is: though we know how to parse and translate the constructs, we still don't know quite how to put them all together into a language. In those earlier installments, the development of our programs had a decidedly bottom-up flavor. In the case of expression parsing, for example, we began with the very lowest level constructs, the individual constants and variables, and worked our way up to more complex expressions. Most people regard the top-down design approach as being better than the bottom-up one. I do too, but the way we did it certainly seemed natural enough for the kinds of things we were parsing. You mustn't get the idea, though, that the incremental approach that we've been using in all these tutorials is inherently bottom-up. In this installment I'd like to show you that the approach can work just as well when applied from the top down ... maybe better. We'll consider languages such as C and Pascal, and see how complete compilers can be built starting from the top. In the next installment, we'll apply the same technique to build a complete translator for a subset of the KISS language, which I'll be calling TINY. But one of my goals for this series is that you will not only be able to see how a compiler for TINY or KISS works, but that you will also be able to design and build compilers for your own languages. The C and Pascal examples will help. One thing I'd like you to see is that the natural structure of the compiler depends very much on the language being translated, so the simplicity and ease of construction of the compiler depends very much on letting the language set the program structure. It's a bit much to produce a full C or Pascal compiler here, and we won't try. But we can flesh out the top levels far enough so that you can see how it goes. Let's get started. THE TOP LEVEL One of the biggest mistakes people make in a top-down design is failing to start at the true top. They think they know what the overall structure of the design should be, so they go ahead and write it down. Whenever I start a new design, I always like to do it at the absolute beginning. In program design language (PDL), this top level looks something like: begin solve the problem end OK, I grant you that this doesn't give much of a hint as to what the next level is, but I like to write it down anyway, just to give me that warm feeling that I am indeed starting at the top. For our problem, the overall function of a compiler is to compile a complete program. Any definition of the language, written in BNF, begins here. What does the top level BNF look like? Well, that depends quite a bit on the language to be translated. Let's take a look at Pascal. THE STRUCTURE OF PASCAL Most texts for Pascal include a BNF or "railroad-track" definition of the language. Here are the first few lines of one: ::= '.' ::= PROGRAM ::= We can write recognizers to deal with each of these elements, just as we've done before. For each one, we'll use our familiar single-character tokens to represent the input, then flesh things out a little at a time. Let's begin with the first recognizer: the program itself. To translate this, we'll start with a fresh copy of the Cradle. Since we're back to single-character names, we'll just use a 'p' to stand for 'PROGRAM.' To a fresh copy of the cradle, add the following code, and insert a call to it from the main program: {--------------------------------------------------------------} { Parse and Translate A Program } procedure Prog; var Name: char; begin Match('p'); { Handles program header part } Name := GetName; Prolog(Name); Match('.'); Epilog(Name); end; {--------------------------------------------------------------} The procedures Prolog and Epilog perform whatever is required to let the program interface with the operating system, so that it can execute as a program. Needless to say, this part will be VERY OS-dependent. Remember, I've been emitting code for a 68000 running under the OS I use, which is SK*DOS. I realize most of you are using PC's and would rather see something else, but I'm in this thing too deep to change now! Anyhow, SK*DOS is a particularly easy OS to interface to. Here is the code for Prolog and Epilog: {--------------------------------------------------------------} { Write the Prolog } procedure Prolog; begin EmitLn('WARMST EQU $A01E'); end; {--------------------------------------------------------------} { Write the Epilog } procedure Epilog(Name: char); begin EmitLn('DC WARMST'); EmitLn('END ' + Name); end; {--------------------------------------------------------------} As usual, add this code and try out the "compiler." At this point, there is only one legal input: px. (where x is any single letter, the program name) Well, as usual our first effort is rather unimpressive, but by now I'm sure you know that things will get more interesting. There is one important thing to note: THE OUTPUT IS A WORKING, COMPLETE, AND EXECUTABLE PROGRAM (at least after it's assembled). This is very important. The nice feature of the top-down approach is that at any stage you can compile a subset of the complete language and get a program that will run on the target machine. From here on, then, we need only add features by fleshing out the language constructs. It's all very similar to what we've been doing all along, except that we're approaching it from the other end. FLESHING IT OUT To flesh out the compiler, we only have to deal with language features one by one. I like to start with a stub procedure that does nothing, then add detail in incremental fashion. Let's begin by processing a block, in accordance with its PDL above. We can do this in two stages. First, add the null procedure: {--------------------------------------------------------------} { Parse and Translate a Pascal Block } procedure DoBlock(Name: char); begin end; {--------------------------------------------------------------} and modify Prog to read: {--------------------------------------------------------------} { Parse and Translate A Program } procedure Prog; var Name: char; begin Match('p'); Name := GetName; Prolog; DoBlock(Name); Match('.'); Epilog(Name); end; {--------------------------------------------------------------} That certainly shouldn't change the behavior of the program, and it doesn't. But now the definition of Prog is complete, and we can proceed to flesh out DoBlock. That's done right from its BNF definition: {--------------------------------------------------------------} { Parse and Translate a Pascal Block } procedure DoBlock(Name: char); begin Declarations; PostLabel(Name); Statements; end; {--------------------------------------------------------------} The procedure PostLabel was defined in the installment on branches. Copy it into your cradle. I probably need to explain the reason for inserting the label where I have. It has to do with the operation of SK*DOS. Unlike some OS's, SK*DOS allows the entry point to the main program to be anywhere in the program. All you have to do is to give that point a name. The call to PostLabel puts that name just before the first executable statement in the main program. How does SK*DOS know which of the many labels is the entry point, you ask? It's the one that matches the END statement at the end of the program. OK, now we need stubs for the procedures Declarations and Statements. Make them null procedures as we did before. Does the program still run the same? Then we can move on to the next stage. DECLARATIONS The BNF for Pascal declarations is: ::= (