This document is intended as a backup for the textbook and the lectures on IC generation. It presents an attribute grammar for generating basic IC components, such as while loops and conditionals, for lazy evaluation of the boolean operators && and ||, and includes the backpatching strategy for resolving forward references (instead of using two passes, as is typically done in an assembler).
Our intermediate code programs will consist of an array, called code[], of "quadruples" of strings (there are various implementations considered in your textbook, but this is the simplest):
struct {
char * op, * arg1, * arg2, * arg3;
} code[SIZE];
Each row in the array is a single instruction. The instructions in our intermediate
code language will be the following quadruples, which are designed to be as close
to MIPS assembly language as possible:
Quaduple Meaning
-------- -------
add arg1, arg2, arg3 arg1 = arg2 + arg3
add-fp arg1, arg2, arg3 arg1 = arg2 + arg3
sub arg1, arg2, arg3 arg1 = arg2 - arg3
sub-fp arg1, arg2, arg3 arg1 = arg2 - arg3
mul arg1, arg2, arg3 arg1 = arg2 * arg3
mul-fp arg1, arg2, arg3 arg1 = arg2 * arg3
div arg1, arg2, arg3 arg1 = arg2 / arg3
div-fp arg1, arg2, arg3 arg1 = arg2 / arg3
mod arg1, arg2, arg3 arg1 = arg2 % arg3
uminus arg1, -, arg3 arg1 = arg3 * -1
uminus-fp arg1, -, arg3 arg1 = arg3 * -1
and arg1, arg2, arg3 arg1 = arg2 && arg3
or arg1, arg2, arg3 arg1 = arg2 || arg3
not arg1, -, arg3 arg1 = ! arg3
move arg1, -, arg3 arg1 = arg3
move-fp arg1, -, arg3 arg1 = arg3
int-to-fp arg1, -, arg3 coerce int arg3 to floating point arg1
fp-to-int arg1, -, arg3 coerce floating point arg3 to int arg1 by truncation
jump -, -, arg3 Jump to code[arg3]
beq arg1, arg2, arg3 if ( arg1 == arg2 )
Jump to code[arg3]
blt arg1, arg2, arg3 if ( arg1 < arg2 )
Jump to code[arg3]
halt (end of program)
The arguments used as source operands (e.g., arg2 and arg3 in arithmetic statements, or arg1 and arg2 in branches) may be constants or variables. Arg3 in branches and jumps is the instruction address in the code array; for the present we shall assume this is a constant.
This selection is a minimal set of operations necessary to translate expressions and basic control structures; later we shall add instructions to access memory using base-offset addressing and instructions related to subroutine calls.
int nextQuad = 0;
void AddQuad(char *operation, char * operand1, char * operand2, char * operand3 )
{
code[nextQuad].op = operation;
code[nextQuad].arg1 = operand1;
code[nextQuad].arg2 = operand2;
code[nextQuad].arg3 = operand3;
++nextQuad;
}
Our attribute grammar uses the following attributes. The meaning of these will
be explained in detail below.
E.place -- This stores the name of the variable in which the value
of the expression E is stored;
id.lexval, intnum.lexval, floatnum.lexval -- This is the lexeme for the given token
S.breaklist, L.breaklist -- These record locations of
jump quads used to implement break statements; their
target locations are not yet filled in.
E.truelist, E.falselist -- These are used to record backpatching
locations of jumps out of boolean expressions on true and false
(only used for lazy evaluation of booleans).
We will generate code on the stack of a shift-reduce parser which makes (as usual)
one pass over the source file. Mostly this is a simple process, involving the
generation of a particular template of IC code for each source language construct.
The only complication has to do with forward references, i.e., jumps to locations
that have not been determined yet. For example, when the condition in a loop evaluates
to false, we must jump to the first statement after the loop, but we do not even
know the size of the loop yet. Similarly, when executing a break statement, we
must jump to the first statement following the loop. Most examples of forward
references (e.g., our first example) are easy to solve by generating jumps without
targets filled in, storing the location of such incomplete jumps on the stack,
and then filling in the information later; since all the information is present
at the level of a single rule, the only trick involved is to store the address
in a dummy non-terminal. However, in some situations (e.g., the break), the rule
generating the incomplete jump and the rule generating the jump target location
are far apart, and we must pass lists of incomplete jumps around the tree and
fill them in later on. This is called backpatching, since we go back and
patch in the missing information. (The former situation, where the forward
reference occurs in a single rule, we will call "local backpatching.") Another
situation where backpatching is necessary is the lazy evaluation of boolean expressions.
Although it sounds complicated, in fact it is only slightly more complex than
the two-pass strategy used to solve the same problem in assemblers.
The process thus involves collecting jump addresses in linked lists and doing the actual backpatching. We use the following data structure and functions:
struct ListNode {
int addr;
struct ListNode * next;
};
ListNode * Cons(int n, ListNode * L) = Return a new list with n added to front of L;
ListNode * Append(ListNode * list1, ListNode *list2) = Return a new list in which list2 has been added to the
end of list1.
void BackPatch( ListNode * L, int k );
{
for( ; L != NULL; L = L->next )
code[ L->addr ].arg3 = k; // Fill in missing target in jump
}
Lists are NULL-terminated sequences of ListNodes; to create a list containing
the single address 34, you would use: Cons(34, NULL).
The grammar we use for arithmetic and boolean expressions will be ambiguous for simplicity, but will illustrate the principles involved in the translation. The translation of binary operators is achieved simply by translating each component expression in turn, and then applying the appropriate operator to the temporary variables holding the intermediate values:
Global variable: tempPlace // Holds variable name
E --> id E.place = id.lexval
E --> intnum E.place = intnum.lexval
E --> floatnum E.place = floatnum.lexval
// Two examples of translation of arithmetic operators, including coercions
E --> - E E.place = NewTemp()
if(IsInt(E1.type))
AddQuad("uminus", E.place, -, E1.place)
else
AddQuad("uminus-fp", E.place, -, E1.place)
E --> E + E E.place = NewTemp()
if(IsInt(E1.type) && IsInt(E2.type))
AddQuad("plus", E.place, E1.place, E2.place)
else if (IsInt(E1.type) && IsFloat(E2.type)) {
tempPlace = NewTemp();
AddQuad("int-to-fp", tempPlace, -, E1.place)
AddQuad("plus-fp", E.place, tempPlace, E2.place)
}
else if (IsFloat(E1.type) && IsInt(E2.type)) {
tempPlace = NewTemp();
AddQuad("int-to-fp", tempPlace, -, E2.place)
AddQuad("plus-fp", E.place, E1.place, tempPlace)
}
else
AddQuad("plus-fp", E.place, E1.place, E2.place)
// Translation of Boolean expressions (non-lazy, see below for lazy evaluation)
E --> E && E E.place = NewTemp()
AddQuad("and", E.place, E1.place, E2.place);
E --> E == E E.place = NewTemp()
AddQuad("beq", E1.place, E2.place, nextQuad + 3)
AddQuad("move", E.place, -, 0)
AddQuad("jump", -, -, nextQuad + 2)
AddQuad("move", E.place, -, 1)
E --> E != E E.place = NewTemp()
AddQuad("beq", E1.place, E2.place, nextQuad + 3)
AddQuad("move", E.place, -, 1)
AddQuad("jump", -, -, nextQuad + 2)
AddQuad("move", E.place, -, 0)
E --> E < E E.place = NewTemp()
AddQuad("blt", E1.place, E2.place, nextQuad + 3)
AddQuad("move", E.place, -, 0)
AddQuad("jump", -, -, nextQuad + 2)
AddQuad("move", E.place, -, 1)
E --> E >= E E.place = NewTemp()
AddQuad("blt", E1.place, E2.place, nextQuad + 3)
AddQuad("move", E.place, -, 1)
AddQuad("jump", -, -, nextQuad + 2)
AddQuad("move", E.place, -, 0)
E --> ( E ) E.place = E1.place
Note that boolean operators here are translated just as arithmetic operators are;
later we shall consider another, more efficient implementation of booleans using only flow of control,
and no assignment of actual values for the boolean expressions.
The primitive statements we allow are of two kinds: assignment statements and control statements (e.g., while). For assignment statements, since the expression on its RHS has already been translated during the reductions that produced E, we simply have to make an assignment from the variable where the value of E resides to the given identifier, keeping track of whether we need to coerce a floating point value to an integer:
S --> id = E tempType = Lookup(id.lexval)
if(tempType == NULL)
Error("Undeclared identifier")
else if (IsInt(tempType) && IsFloat(E.type))
AddQuad("fp-to-int", id.lexval, -, E.place)
else if (IsFloat(tempType) && IsInt(E.type)) {
AddQuad("int-to-fp", id.lexval, -, E.place)
else if (IsInt(tempType))
AddQuad("move", id.lexval, -, E.place)
else
AddQuad("move-fp", id.lexval, -, E.place)
S.breaklist = NULL
Now we may translate the flow of control statements. In each of these, we can assume that the translation of the nonterminals on the right side has already taken place, and all we need to do is to connect up the pieces with the appropriate conditional and unconditional jumps. We also need to use local variables to store the location of unfilled jumps to do a little ``local backpatching''; as mentioned above, these may always be implemented as attributes of dummy non-terminals. We shall freely use dummy non-terminals, and the sequencing tricks mentioned above.
S --> if ( E ) Dum1 Dum1.temp = nextQuad; // Location of next, unfilled jump
AddQuad("beq", E.place, 0, -)
S code[Dum1.temp].arg3 = nextQuad // local backpatch!
S.breaklist = S1.breaklist
Dum1 --> epsilon
S --> if ( E ) Dum2 Dum2.temp = nextQuad; // location of next jump
AddQuad(beq, E.place, 0, -)
S code[Dum2.temp].arg3 = nextQuad+1 // local backpatch!
Dum2.temp = nextQuad
AddQuad("jump", -, -, -)
else S code[Dum2.temp].arg3 = nextQuad // local backpatch!
S.breaklist = Append(S1.breaklist, S2.breaklist)
S --> break S.breaklist = Cons(nextQuad, NULL)
AddQuad(jump, -, -, -) // Target left blank!
The only other flow of control statement we will consider is
the while loop. This is very similar to the
if-then, except that it contains a jump at the end back
up to the starting location. Hence we need to store the starting
location in a local variable.
S --> while Dum3 Dum3.temp = nextQuad; // Location of start of E
( E ) Dum4 Dum4.temp = nextQuad // Location of unfilled jump
AddQuad("beq", E.place, "0", -)
S AddQuad(jump, -, -, Dum3.temp)
code[Dum4.temp].arg3 = nextQuad // local backpatch!
BackPatch(S1.breaklist, nextQuad) // backpatch breaklists
S.breaklist = NULL
Dum3 --> epsilon
Dum4 --> epsilon
Note that the breaklist is set to empty, since
all the jumps have been filled in. This will cause the break
to properly jump out of the closest enclosing while. A
similar construction could be used for other kinds of loops, and
to implement a continue (i.e., a jump out to the beginning
of the loop) we would proceed almost identically to the above, but
jump instead to the location where the entire translation of the loop started
(i.e., Dum1.temp). The details are easy to fill in.
The rules for lists of statements just need to come in the right order for generating code for each statement, which they do naturally; delimiters such as left and right braces (and left and right parenthesis in expressions) actually do nothing during the translation process. Nothing actually needs to be done in this part of the grammar, except to pass the breaklists (lists of unfilled jumps) up the tree:
S --> { L } S.breaklist = L.breaklist
L --> S L.breaklist = S.breaklist
L --> S ; L L.breaklist = Append(S.breaklist, L1.breaklist)
The start production (coming as the last reduction) does nothing except add the
last statement of the IC program:
P --> main( ) S AddQuad( halt, -, -, - )The rules for declarations (`` D'') generate no code, and so we shall leave them out, as they have been dealt with in another set of notes.
One complication is that expressions may be "mixed mode," that is, boolean and arithmetic operators can be mixed arbitrarily in an expression, for example:
A = ( B < C+1 || D ) * 3;
Unfortunately, C allows mixed mode expressions AND specifies that boolean operators MUST be evaluated lazily, so we have to find a way to combine arithmetic evaluation (computing values) with lazy evaluation (altering flow of control).
Let us defer this complication for a moment, and first first explain the general technique for implementing lazy evaluation of booleans under the assumption that a boolean operator never occurs beneath an arithmetic operator, and never on the right side of an assignment (e.g., they could occur in conditions of loops, or in conditionals).
E --> E == E E.truelist = Cons(nextQuad, NULL)
AddQuad("beq", E1.place, E2.place, -)
E.falselist = Cons(nextQuad, NULL)
AddQuad("jump", -, -, -)
E --> E != E E.falselist = Cons(nextQuad, NULL)
AddQuad("beq", E1.place, E2.place, -)
E.truelist = Cons(nextQuad, NULL)
AddQuad("jump", -, -, -)
E --> E < E E.truelist = Cons(nextQuad, NULL)
AddQuad("blt", E1.place, E2.place, -)
E.falselist = Cons(nextQuad, NULL)
AddQuad("jump", -, -, -)
E --> E > E E.truelist = Cons(nextQuad, NULL)
AddQuad("blt", E2.place, E1.place, -)
E.falselist = Cons(nextQuad, NULL)
AddQuad("jump", -, -, -)
E --> E <= E E.falselist = Cons(nextQuad, NULL)
AddQuad("blt", E2.place, E1.place, -)
E.truelist = Cons(nextQuad, NULL)
AddQuad("jump", -, -, -)
E --> E >= E E.falselist = Cons(nextQuad, NULL)
AddQuad("blt", E1.place, E2.place, -)
E.truelist = Cons(nextQuad, NULL)
AddQuad("jump", -, -, -)
E --> ! E E.truelist = E1.falselist
E.falselist = E1.truelist
E --> E BackPatch(E1.truelist, nextQuad)
&& E E.truelist = E2.truelist
E.falselist = append(E1.falselist, E2.falselist)
E --> E BackPatch(E1.falselist, nextQuad)
|| E E.falselist = E2.falselist
E.truelist = append(E1.truelist, E2.truelist)
We now consider how simple boolean expressions may be used in conditionals and the
while loop. The rules for the attribute breaklist are unchanged. At the
end of the translation of a boolean expression E, the attribute E.truelist
contains a list of all jumps to the location where the control should branch to
if E is true; E.falselist contains a list of jumps out upon false evaluation.
We now see how these locations are filled in.
S --> if ( E ) BackPatch(E.truelist, nextQuad)
S BackPatch(E.falselist, nextQuad)
S.breaklist = S1.breaklist
S --> if ( E ) BackPatch(E.truelist, nextQuad)
S Dum Dum.temp = nextQuad
AddQuad("jump", -, -, -)
BackPatch(E.falselist, nextQuad)
else S code[Dum.temp].arg3 = nextQuad '' local backpatch!
S.breaklist = Append(S1.breaklist, S2.breaklist)
Dum --> epsilon
Again, the while loop is similar to the if-then, with a jump back to the
beginning:
S --> while Dum Dum.temp = nextQuad;
( E ) BackPatch(E.truelist, nextQuad)
S AddQuad("jump", -, -, Dum.temp)
BackPatch(E.falselist, nextQuad)
BackPatch(S1.breaklist, nextQuad)
S.breaklist = NULL
Dum --> epsilon
Finally, we must consider how to integrate these techniques into mixed-mode expressions.
The basic idea is to use the E.truelist attribute as a flag to indicate what kind
of mode is being used (i.e., E.truelist is always non-NULL for boolean expressions,
and we can always set it to NULL for arithmetic expressions). Then we can extend
the previous grammar so that (i) when making a transition from flow-of-control
(FOC) mode to arithmetic (A) mode, the values 0 and 1 are created and passed along,
(ii) when going from arithmetic to flow-of-control mode, we use the calculated
values to control the jumps, and (iii) statements using expressions must test
which mode the expression is in and translate accordingly (expressions used as
conditions must be flow of control, but when used in assignments must be in arithmetic
mode). We will show how this works with one arithmetic operator (unary minus),
one boolean operator (!), and two statement types (while and assignment) to illustrate
the principal cases; the reader can easily extend the idea to all other uses.
E --> - E E.place = NewTemp()
E.truelist = NULL // Must set flag to indicate E is A-mode
if(E1.truelist != NULL) { // E1 is FOC-mode; gen code to make transition
E.place = NewTemp();
BackPatch(E1.falselist, nextQuad)
AddQuad("move", E.place, -, 0)
AddQuad("jump", -, -, nextQuad+2 )
BackPatch(E1.truelist, nextQuad)
AddQuad("move", E.place, -, 1)
;
}
if(IsInt(E1.type)) // Note: booleans have int type
AddQuad("uminus", E.place, -, E1.place)
else
AddQuad("uminus-fp", E.place, -, E1.place)
E --> ! E if(E1.truelist == NULL) { // E1 is A-mode; gen code to make transition
E1.falselist = Cons(nextQuad,NULL);
AddQuad("beq", E1.place, 0, -)
E1.truelist = Cons(nextQuad,NULL);
AddQuad("jump", -, -, -)
}
E.truelist = E1.falselist // Now implement FOC negation as usual
E.falselist = E1.truelist
S --> while Dum Dum.temp = nextQuad
( E ) if(E.truelist == NULL) { // E is A mode; gen code to make transition
E.falselist = Cons(nextQuad, NULL) // Use E.falselist to store unfilled branch
AddQuad("beq", E.place, 0, -)
}
else // E was flow of control mode
BackPatch(E.truelist, nextQuad)
S AddQuad("jump", -, -, Dum.temp)
BackPatch(E.falselist, nextQuad)
BackPatch(S1.breaklist, nextQuad)
S.breaklist = NULL
Dum --> epsilon
S --> id = E if(E1.truelist != NULL) { // E1 is FOC-mode; gen code to produce value
E.place = NewTemp();
BackPatch(E.falselist, nextQuad)
AddQuad("move", E.place, -, 0)
AddQuad("jump", -, -, nextQuad+2)
BackPatch(E.truelist, nextQuad)
AddQuad("move", E.place, -, 1)
}
tempType = Lookup(id.lexval) // Now translate assignments as usual
.... etc. as above ......