Attribute Grammar Rules for Intermediate Code Generation

This document is intended as a backup for the textbook and the lectures on IC generation. It presents an attribute grammar for generating basic IC components, such as while loops and conditionals, for lazy evaluation of the boolean operators && and ||, and includes the backpatching strategy for resolving forward references (instead of using two passes, as is typically done in an assembler).

Intermediate Code

Intermediate code (IC) is a language very close to assembly language in that it typically performs only one operation per instruction, however, it is different in that it uses variables instead of registers and we do not worry about whether operands are constants or variables (both issues are important in assembly languages). A real compiler would usually generate the actual machine code from the parse stack, but it is useful for us to separate out the various tasks, such as translating high-level language constructs into low-level statements, choosing appropriate instructions and machine idioms, optimization, and register allocation. Thus, our first task is to define an IC language and develop a translation schemes for C-language features such a control structures and boolean expressions.

Our intermediate code programs will consist of an array, called code[], of "quadruples" of strings (there are various implementations considered in your textbook, but this is the simplest):


   struct {

        char * op, * arg1, * arg2, * arg3;

   } code[SIZE]; 



Each row in the array is a single instruction. The instructions in our intermediate code language will be the following quadruples, which are designed to be as close to MIPS assembly language as possible:
Quaduple                         Meaning
--------                         -------
add arg1, arg2, arg3             arg1 = arg2 + arg3
add-fp arg1, arg2, arg3          arg1 = arg2 + arg3
sub arg1, arg2, arg3             arg1 = arg2 - arg3
sub-fp arg1, arg2, arg3          arg1 = arg2 - arg3
mul arg1, arg2, arg3             arg1 = arg2 * arg3
mul-fp arg1, arg2, arg3          arg1 = arg2 * arg3
div arg1, arg2, arg3             arg1 = arg2 / arg3
div-fp arg1, arg2, arg3          arg1 = arg2 / arg3
mod arg1, arg2, arg3             arg1 = arg2 % arg3
uminus arg1, -, arg3              arg1 = arg3 * -1
uminus-fp arg1, -, arg3           arg1 = arg3 * -1

and arg1, arg2, arg3             arg1 = arg2 && arg3
or  arg1, arg2, arg3             arg1 = arg2 || arg3
not arg1, -, arg3                arg1 = ! arg3
          
move arg1, -, arg3                arg1 = arg3
move-fp arg1, -, arg3             arg1 = arg3

int-to-fp arg1, -, arg3          coerce int arg3 to floating point arg1
fp-to-int arg1, -, arg3          coerce floating point arg3 to int arg1 by truncation

jump -, -, arg3                  Jump to code[arg3]
beq arg1, arg2, arg3             if ( arg1 == arg2 )
                                    Jump to code[arg3]
blt arg1, arg2, arg3             if ( arg1 < arg2 )
                                    Jump to code[arg3]

halt                             (end of program)

The arguments used as source operands (e.g., arg2 and arg3 in arithmetic statements, or arg1 and arg2 in branches) may be constants or variables. Arg3 in branches and jumps is the instruction address in the code array; for the present we shall assume this is a constant.

This selection is a minimal set of operations necessary to translate expressions and basic control structures; later we shall add instructions to access memory using base-offset addressing and instructions related to subroutine calls.

Attributes and Global Subroutines

In addition to the global array code[], we assume we have a function NewTemp() which returns a new temporary variable (i.e., one not occurring before), say from the list T0, T1, ... There is also a global procedure AddQuad(op, arg1, arg2, arg3) which inserts a quadruple (op, arg1, arg2, arg3) into the location code[nextQuad], and then increments the nextQuad pointer:
int nextQuad = 0;   

void AddQuad(char *operation, char * operand1, char * operand2, char * operand3 ) 
{
  code[nextQuad].op = operation;
  code[nextQuad].arg1 = operand1;
  code[nextQuad].arg2 = operand2;
  code[nextQuad].arg3 = operand3;
  ++nextQuad;
}
Our attribute grammar uses the following attributes. The meaning of these will be explained in detail below.
     E.place -- This stores the name of the variable in which the value
                of the expression E is stored;

     id.lexval, intnum.lexval, floatnum.lexval -- This is the lexeme for the given token

     S.breaklist, L.breaklist -- These record locations of
                jump quads used to implement  break statements; their
                target locations are not yet filled in. 

     E.truelist, E.falselist -- These are used to record backpatching
                locations of jumps out of boolean expressions on true and false
                (only used for lazy evaluation of booleans).
We will generate code on the stack of a shift-reduce parser which makes (as usual) one pass over the source file. Mostly this is a simple process, involving the generation of a particular template of IC code for each source language construct. The only complication has to do with forward references, i.e., jumps to locations that have not been determined yet. For example, when the condition in a loop evaluates to false, we must jump to the first statement after the loop, but we do not even know the size of the loop yet. Similarly, when executing a break statement, we must jump to the first statement following the loop. Most examples of forward references (e.g., our first example) are easy to solve by generating jumps without targets filled in, storing the location of such incomplete jumps on the stack, and then filling in the information later; since all the information is present at the level of a single rule, the only trick involved is to store the address in a dummy non-terminal. However, in some situations (e.g., the break), the rule generating the incomplete jump and the rule generating the jump target location are far apart, and we must pass lists of incomplete jumps around the tree and fill them in later on. This is called backpatching, since we go back and patch in the missing information. (The former situation, where the forward reference occurs in a single rule, we will call "local backpatching.") Another situation where backpatching is necessary is the lazy evaluation of boolean expressions. Although it sounds complicated, in fact it is only slightly more complex than the two-pass strategy used to solve the same problem in assemblers.

The process thus involves collecting jump addresses in linked lists and doing the actual backpatching. We use the following data structure and functions:

     struct ListNode {
            int addr;
            struct ListNode * next;
     }; 


     ListNode * Cons(int n, ListNode * L) = Return a new list with n added to front of L;
	 

     ListNode * Append(ListNode * list1, ListNode *list2) = Return a new list in which list2 has been added to the
                              end of list1. 
							  

     void BackPatch( ListNode * L, int k );
     {
        for( ; L != NULL; L = L->next )
           code[ L->addr ].arg3 = k;     // Fill in missing target in jump 
     }

Lists are NULL-terminated sequences of ListNodes; to create a list containing the single address 34, you would use: Cons(34, NULL).

IC Code Generation with Backpatching

We will rearrange the grammar in order to explain the attribute rules in the best order, and so the start rule (with start symbol Prog) will not occur until the end of this section. We will start with expressions, then proceed to assignments and flow of control statements.

Expressions

The grammar we use for arithmetic and boolean expressions will be ambiguous for simplicity, but will illustrate the principles involved in the translation. The translation of binary operators is achieved simply by translating each component expression in turn, and then applying the appropriate operator to the temporary variables holding the intermediate values:


Global variable: tempPlace // Holds variable name

E --> id                  E.place = id.lexval

E --> intnum              E.place = intnum.lexval   

E --> floatnum            E.place = floatnum.lexval 

// Two examples of translation of arithmetic operators, including coercions

E --> - E                 E.place = NewTemp()
                          if(IsInt(E1.type))
                            AddQuad("uminus", E.place, -, E1.place)
                          else
                            AddQuad("uminus-fp", E.place, -, E1.place)

E --> E + E               E.place = NewTemp()
                          if(IsInt(E1.type) && IsInt(E2.type))
                            AddQuad("plus", E.place, E1.place, E2.place)
                          else if (IsInt(E1.type) && IsFloat(E2.type)) {
                            tempPlace = NewTemp(); 
                            AddQuad("int-to-fp", tempPlace, -, E1.place)                            
                            AddQuad("plus-fp", E.place, tempPlace, E2.place)
                          }
                          else if (IsFloat(E1.type) && IsInt(E2.type)) {
                            tempPlace = NewTemp(); 
                            AddQuad("int-to-fp", tempPlace, -, E2.place)                            
                            AddQuad("plus-fp", E.place, E1.place, tempPlace)
                          } 
                          else 
                            AddQuad("plus-fp", E.place, E1.place, E2.place)

// Translation of Boolean expressions (non-lazy, see below for lazy evaluation) 

E -->   E && E            E.place = NewTemp()
                          AddQuad("and", E.place, E1.place, E2.place);

E -->   E == E            E.place = NewTemp()
                          AddQuad("beq", E1.place, E2.place, nextQuad + 3)   
                          AddQuad("move", E.place, -, 0)   
                          AddQuad("jump", -, -, nextQuad + 2)   
                          AddQuad("move", E.place, -, 1)   

E -->   E != E            E.place = NewTemp()
                          AddQuad("beq", E1.place, E2.place, nextQuad + 3)   
                          AddQuad("move", E.place, -, 1)   
                          AddQuad("jump", -, -, nextQuad + 2)   
                          AddQuad("move", E.place, -, 0)   

E -->   E < E             E.place = NewTemp()
                          AddQuad("blt", E1.place, E2.place, nextQuad + 3)   
                          AddQuad("move", E.place, -, 0)   
                          AddQuad("jump", -, -, nextQuad + 2)   
                          AddQuad("move", E.place, -, 1)   

E -->   E >= E            E.place = NewTemp()
                          AddQuad("blt", E1.place, E2.place, nextQuad + 3)   
                          AddQuad("move", E.place, -, 1)   
                          AddQuad("jump", -, -, nextQuad + 2)   
                          AddQuad("move", E.place, -, 0)   

E --> ( E )               E.place = E1.place

Note that boolean operators here are translated just as arithmetic operators are; later we shall consider another, more efficient implementation of booleans using only flow of control, and no assignment of actual values for the boolean expressions.

Statements

The primitive statements we allow are of two kinds: assignment statements and control statements (e.g., while). For assignment statements, since the expression on its RHS has already been translated during the reductions that produced E, we simply have to make an assignment from the variable where the value of E resides to the given identifier, keeping track of whether we need to coerce a floating point value to an integer:


S --> id = E              tempType = Lookup(id.lexval)
                          if(tempType == NULL)
                             Error("Undeclared identifier")
                          else if (IsInt(tempType) && IsFloat(E.type)) 
                            AddQuad("fp-to-int", id.lexval, -, E.place)                            
                          else if (IsFloat(tempType) && IsInt(E.type)) {
                            AddQuad("int-to-fp", id.lexval, -, E.place)                            
                          else if (IsInt(tempType))
                            AddQuad("move", id.lexval, -, E.place)
                          else 
                            AddQuad("move-fp", id.lexval, -, E.place)
                          S.breaklist = NULL 

Now we may translate the flow of control statements. In each of these, we can assume that the translation of the nonterminals on the right side has already taken place, and all we need to do is to connect up the pieces with the appropriate conditional and unconditional jumps. We also need to use local variables to store the location of unfilled jumps to do a little ``local backpatching''; as mentioned above, these may always be implemented as attributes of dummy non-terminals. We shall freely use dummy non-terminals, and the sequencing tricks mentioned above.


S --> if ( E ) Dum1          Dum1.temp = nextQuad;             // Location of next, unfilled jump 
                             AddQuad("beq", E.place, 0, -)

      S                      code[Dum1.temp].arg3 = nextQuad      // local backpatch! 
                             S.breaklist = S1.breaklist

Dum1 --> epsilon

S --> if ( E ) Dum2           Dum2.temp = nextQuad;                // location of next jump 
                              AddQuad(beq, E.place, 0, -)

      S                       code[Dum2.temp].arg3 = nextQuad+1           // local backpatch!  
                              Dum2.temp = nextQuad
                              AddQuad("jump", -, -, -)

      else S                  code[Dum2.temp].arg3 = nextQuad           // local backpatch!  
                              S.breaklist = Append(S1.breaklist, S2.breaklist)

S --> break                   S.breaklist = Cons(nextQuad, NULL)
                              AddQuad(jump, -, -, -)            // Target left blank! 
The only other flow of control statement we will consider is the while loop. This is very similar to the if-then, except that it contains a jump at the end back up to the starting location. Hence we need to store the starting location in a local variable.
S --> while Dum3       Dum3.temp = nextQuad;      // Location of start of E 
      ( E ) Dum4       Dum4.temp = nextQuad       // Location of unfilled jump 
                       AddQuad("beq", E.place, "0", -)
      S                AddQuad(jump, -,  -, Dum3.temp)
                       code[Dum4.temp].arg3 = nextQuad       // local backpatch! 
                       BackPatch(S1.breaklist, nextQuad)   // backpatch breaklists 
                       S.breaklist = NULL

Dum3 --> epsilon

Dum4 --> epsilon

Note that the breaklist is set to empty, since all the jumps have been filled in. This will cause the break to properly jump out of the closest enclosing while. A similar construction could be used for other kinds of loops, and to implement a continue (i.e., a jump out to the beginning of the loop) we would proceed almost identically to the above, but jump instead to the location where the entire translation of the loop started (i.e., Dum1.temp). The details are easy to fill in.

The rules for lists of statements just need to come in the right order for generating code for each statement, which they do naturally; delimiters such as left and right braces (and left and right parenthesis in expressions) actually do nothing during the translation process. Nothing actually needs to be done in this part of the grammar, except to pass the breaklists (lists of unfilled jumps) up the tree:

S --> { L }            S.breaklist = L.breaklist

L --> S                L.breaklist = S.breaklist

L -->  S ; L           L.breaklist = Append(S.breaklist, L1.breaklist)

The start production (coming as the last reduction) does nothing except add the last statement of the IC program:
P -->  main( ) S         AddQuad( halt, -, -, - )

The rules for declarations (`` D'') generate no code, and so we shall leave them out, as they have been dealt with in another set of notes.

Lazy Evaluation of Boolean Expressions

We will now consider an alternate method for translating boolean expressions by manipulating the flow of control through a region of code; this allows us to implement the lazy evaluation of boolean expression as found in C and related languages. The basis for this technique is the observation that most boolean expressions are used to affect the flow of control through a loop or conditional, by jumping to one location when the expression is true, and another when it is false. But then, if we find out before finishing execution of the expression that it must be true, we could be lazy and jump immediately to the "true location" (say, the "then" part of a conditional), and similarly for an evaluation of false. The recursive application of this idea throughout a boolean expression results in an implementation where boolean expressions are implemented almost entirely by jumps, and we never evaluate more of an expression than is strictly necessary to determine its final value--although values per se might never be actually calculated! This technique requires the constant use of backpatching, since we do not know ahead of time where to jump on true and false evaluations.

One complication is that expressions may be "mixed mode," that is, boolean and arithmetic operators can be mixed arbitrarily in an expression, for example:

    A  =  ( B < C+1 || D ) * 3; 

Unfortunately, C allows mixed mode expressions AND specifies that boolean operators MUST be evaluated lazily, so we have to find a way to combine arithmetic evaluation (computing values) with lazy evaluation (altering flow of control).

Let us defer this complication for a moment, and first first explain the general technique for implementing lazy evaluation of booleans under the assumption that a boolean operator never occurs beneath an arithmetic operator, and never on the right side of an assignment (e.g., they could occur in conditions of loops, or in conditionals).

Attribute Grammar for Lazy Evaluation of Simple Boolean Expressions


 E -->   E == E         E.truelist = Cons(nextQuad, NULL)
                        AddQuad("beq", E1.place, E2.place, -)
                        E.falselist = Cons(nextQuad, NULL)
                        AddQuad("jump", -, -, -)

 E -->   E != E         E.falselist = Cons(nextQuad, NULL)
                        AddQuad("beq", E1.place, E2.place, -)
                        E.truelist = Cons(nextQuad, NULL)
                        AddQuad("jump", -, -, -)

 E -->   E < E          E.truelist = Cons(nextQuad, NULL)
                        AddQuad("blt", E1.place, E2.place, -)
                        E.falselist = Cons(nextQuad, NULL)
                        AddQuad("jump", -, -, -)

 E -->   E > E          E.truelist = Cons(nextQuad, NULL)
                        AddQuad("blt", E2.place, E1.place, -)
                        E.falselist = Cons(nextQuad, NULL)
                        AddQuad("jump", -, -, -)

 E -->   E <= E         E.falselist = Cons(nextQuad, NULL)
                        AddQuad("blt", E2.place, E1.place, -)
                        E.truelist = Cons(nextQuad, NULL)
                        AddQuad("jump", -, -, -)

 E -->   E >= E         E.falselist = Cons(nextQuad, NULL)
                        AddQuad("blt", E1.place, E2.place, -)
                        E.truelist = Cons(nextQuad, NULL)
                        AddQuad("jump", -, -, -)


 E -->  ! E             E.truelist = E1.falselist
                        E.falselist = E1.truelist


 E --> E                BackPatch(E1.truelist, nextQuad)
       && E             E.truelist = E2.truelist
                        E.falselist = append(E1.falselist, E2.falselist)

 E --> E                BackPatch(E1.falselist, nextQuad)
       || E             E.falselist = E2.falselist
                        E.truelist = append(E1.truelist, E2.truelist)

We now consider how simple boolean expressions may be used in conditionals and the while loop. The rules for the attribute breaklist are unchanged. At the end of the translation of a boolean expression E, the attribute E.truelist contains a list of all jumps to the location where the control should branch to if E is true; E.falselist contains a list of jumps out upon false evaluation. We now see how these locations are filled in.

S --> if ( E )         BackPatch(E.truelist, nextQuad)
      S                BackPatch(E.falselist, nextQuad)
                       S.breaklist = S1.breaklist


S --> if ( E )         BackPatch(E.truelist, nextQuad)
      S Dum            Dum.temp = nextQuad
                       AddQuad("jump", -, -, -)
                       BackPatch(E.falselist, nextQuad)
      else S           code[Dum.temp].arg3 = nextQuad           '' local backpatch!  
                       S.breaklist = Append(S1.breaklist, S2.breaklist)

Dum --> epsilon

Again, the while loop is similar to the if-then, with a jump back to the beginning:

S --> while Dum        Dum.temp = nextQuad;      
      ( E )            BackPatch(E.truelist, nextQuad)
      S                AddQuad("jump", -,  -, Dum.temp)
                       BackPatch(E.falselist, nextQuad)   
                       BackPatch(S1.breaklist, nextQuad)   
                       S.breaklist = NULL

Dum --> epsilon
Finally, we must consider how to integrate these techniques into mixed-mode expressions. The basic idea is to use the E.truelist attribute as a flag to indicate what kind of mode is being used (i.e., E.truelist is always non-NULL for boolean expressions, and we can always set it to NULL for arithmetic expressions). Then we can extend the previous grammar so that (i) when making a transition from flow-of-control (FOC) mode to arithmetic (A) mode, the values 0 and 1 are created and passed along, (ii) when going from arithmetic to flow-of-control mode, we use the calculated values to control the jumps, and (iii) statements using expressions must test which mode the expression is in and translate accordingly (expressions used as conditions must be flow of control, but when used in assignments must be in arithmetic mode). We will show how this works with one arithmetic operator (unary minus), one boolean operator (!), and two statement types (while and assignment) to illustrate the principal cases; the reader can easily extend the idea to all other uses.

E --> - E                 E.place = NewTemp()
                          E.truelist = NULL          // Must set flag to indicate E is A-mode
                          if(E1.truelist != NULL) {  // E1 is FOC-mode; gen code to make transition
                E.place = NewTemp(); 
                            BackPatch(E1.falselist, nextQuad)   
                            AddQuad("move", E.place, -, 0)   
                            AddQuad("jump", -, -, nextQuad+2)   
                            BackPatch(E1.truelist, nextQuad)   
                            AddQuad("move", E.place, -, 1)   
; 
                          }
                          if(IsInt(E1.type))        // Note: booleans have int type
                            AddQuad("uminus", E.place, -, E1.place)
                          else 
                            AddQuad("uminus-fp", E.place, -, E1.place)

E --> ! E              if(E1.truelist == NULL) {  // E1 is A-mode; gen code to make transition   
                                     E1.falselist = Cons(nextQuad,NULL); 
                            AddQuad("beq", E1.place, 0, -)   
                            E1.truelist = Cons(nextQuad,NULL); 
                            AddQuad("jump", -, -, -)   
                          }
                          E.truelist = E1.falselist     // Now implement FOC negation as usual
                          E.falselist = E1.truelist

S --> while Dum           Dum.temp = nextQuad      

      ( E )               if(E.truelist == NULL) {    // E is A mode; gen code to make transition
                            E.falselist = Cons(nextQuad, NULL)   // Use E.falselist to store unfilled branch
                            AddQuad("beq", E.place, 0, -)
                          }
                          else    // E was flow of control mode
                            BackPatch(E.truelist, nextQuad)

      S                   AddQuad("jump", -, -, Dum.temp)
                          BackPatch(E.falselist, nextQuad)   
                          BackPatch(S1.breaklist, nextQuad)   
                          S.breaklist = NULL

Dum --> epsilon


S --> id = E              if(E1.truelist != NULL) {  // E1 is FOC-mode; gen code to produce value
                            E.place = NewTemp(); 
                            BackPatch(E.falselist, nextQuad)   
                            AddQuad("move", E.place, -, 0)   
                            AddQuad("jump", -, -, nextQuad+2)    
                            BackPatch(E.truelist, nextQuad)   
                            AddQuad("move", E.place, -, 1)   
                          }


                          tempType = Lookup(id.lexval)         // Now translate assignments as usual
                          
                          .... etc. as above ......