This document is intended as a backup to the lectures on the implementation of function calls in C. After discussing in general the minimal functionality necessary to support function calls in our compiler, we present a set of attribute rules which generate the appropriate IC code statements.
We shall present a simple view of code generation for functions based on the uniform width of 4 bytes for all parameters and for the return value of a function; the general idea presented here can be easily modified for the more complex case.
int A;
int fact(int x) { // This is the Callee
int t;
if(x<2)
return(1);
else {
t = x * fact(x-1);
return(t);
}
}
main(){
.
.
.
A = 4;
B = fact(A+1); // This is the Caller
.
.
.
}
What is necessary in order to generate IC code to translate this
program? Basically, there are two issues: control flow and data access.
It should be obvious to you by now that our compiler generates code as it encounters the instructions in the source file; thus, the code for each function definition will be added to the IC code array as it is parsed and translated. The IC code array will thus consist of separate regions of code, one for each function in the source program. In order to call a function, the caller must jump to the starting location of the appropriate region of IC code where the callee lives, and when the function call is ended, a jump must be made back to the instruction following the jump in R.
The way this is typically implemented in assembly language is that the jump to a function also saves the address of the following instruction (i.e., the program counter (PC) plus 4) in a special register (in MIPS, this is called $ra) and then the return can be effected by jumping to the location in this register. For now, we will assume that we have a special memory location called (RA) for storing this value. We will thus add two instructions to our IC code language:
| |
$sp ->|---------------|
| SF for fact |
| |
|---------------|
| SF for fact |
| |
|---------------|
| SF for main |
| |
|---------------| FFFF FFFF (last byte in 32bit VM space)
All data in the stack frame must be referenced as an offset from the stack pointer. Thus there are two kinds of data which a function may refer to: global data which is declared in file scope, laid out on the lowest ST on the stack, and referred to by name ("A", "B[4]"); and local data which is declared as a parameter or local variable in the function, laid out (generally) on a ST which is just above the lowest level on the stack of symbol tables, and referred to as offsets from the stack pointer ("$sp[0]", "$sp[4]"). Therefore, it will be necessary in generating variable references to have some way of determining which category a variable belongs to. We will assume in what follows that we have available to us the following function:
IsStatic(id.lexval) -- Returns 1 if, when looking up the name id.lexval, it is found on the lowest-level ST on the stack; returns 0 if the variable is not found, or if it is found on ST not on the lowest level.
This must be integrated into the Data Layout rules presented in a previous set of notes, and will be explained further below.
(sub $sp, $sp, 16)
which has the effect of adding the frame on top:
| |
$sp ->|---------------|
| SF for fact |
| |
|---------------|
| SF for fact |
| |
|---------------|
| SF for fact |
| |
|---------------|
| SF for main |
| |
|---------------| FFFF FFFF (last byte in 32bit VM space)
When the call is ended, we would pop the frame by executing:
(add $sp, $sp, 16)
As mentioned above, ALL access to parameters and local data for a function is
done by taking an offset from the stack pointer. This can be before the call is
made (a negative offset into the frame about to be pushed) or during the call
(a positive offset into the top frame). This is how caller and callee communicate
parameters and return value. For now we do not consider access to any other frames.
The stack contains the following information:
$sp-> |---------------|+ |---------------|
| Parameters || |Frame defined |
|---------------|| <= same as => |by offsets in |
| Temporaries || |ST for function|
|---------------|+ |---------------|
| Return Address|
|---------------|
| Return Value |
|---------------|
The return value for our compiler is always 4 bytes long, and so is the return
address; the length of the other two fields may vary. Since the parameters and
temporaries are always in a single ST, in which the offsets for each item, and
the global length and alignment, have already been calculated, we will simply
use this frame as the shape of the top two fields of the stack frame, and add
the RA and the RV below.
In more general settings than our project, the alignment of a stack frame is a delicate issue, since we can not predict very well what frames will be stacked on top of each other, this being due to the dynamic behavior of the program. Therefore it is hard to know the alignment restriction that must be placed on the position of the stack pointer. For this reason, it is usual to observe the worst-case alignment when moving the stack pointer. For many machines, this would mean the stack pointer is always double-word aligned, but for our project, word alignment suffices. We will use word alignment in what follows.
In the example which started this document, we would have the following ST:
Name Type Offset ---- ---- ------ x Int 0 t Int 4 ---------------------- Alignment: 4 Length: 8The return value and return address would both be 4 bytes long, which gives us a 16 byte frame:
$sp-> |---------------|
| x |
|---------------|
| t |
|---------------|
| Return Address|
|---------------|
| Return Value |
|---------------|
If we take account of the position before and after
the call, we have the following offsets:
| | | |
|---------------| call $sp-->|---------------|
| x | (sub $sp, $sp, 16) | x |
|---------------| ===> |---------------|
| t | | t |
|---------------| |---------------|
| Return Address| return | Return Address|
|---------------| (add $sp, $sp, 16) |---------------|
| Return Value | <=== | Return Value |
$sp-->|---------------| |---------------|
| | | |
| | | |
| | | |
| | | |
Variable Before & after call During call
-------- ------------------- -----------
x $sp[-16] $sp[0]
t $sp[-12] $sp[4]
RA $sp[-8] $sp[8]
RV $sp[-4] $sp[12]
In general, if we assume that RA and RV are both 4 bytes long, and if a particular function f
has a fixed stack frame size of S, then these references to RA, RV, or an arbitrary parameter
or local variable X with offset(X) inside the ST for f can be expressed as:
Variable Before & after call During call -------- ------------------- -------------- X $sp[offset(X)-S] $sp[offset(X)] RA $sp[-8] $sp[S-8] RV $sp[-4] $sp[S-4]
In fact, we will always do the call before pushing the stack frame, and the return after popping the frame, so RA can always be found at $sp[-8].
| |
|-----------------|
| Eval e0 and | Caller: use of expression f(e0, e1, ..., en)
| put in $sp[-S] | S is frame size for f
|-----------------|
| Eval e1 and |
| put in $sp[4-S]|
|-----------------|
| Eval rest of |
| ei and put in |
| $sp[4*i-S] |
|-----------------|
| call, -, -, L | Jump to start of callee and save RA in $sp[-8]
|-----------------|
RA:| move T, $sp[-4] | Move RV to some variable and use it as value of expression
|-----------------|
| . |
.
.
|-----------------|
L:| sub $sp,$sp, S | Callee
|-----------------|
| Code for callee |
| Reference to |
| local var/param |
| X is translated |
| as $sp[off(X)]. |
| Reference to |
| static A is just|
| A. |
| |
|-----------------|+
| E evaluated and || When return(E) is encountered,
| value placed in || put value of E on stack
| $sp[S-4] || and then do return sequence.
|-----------------||
| add $sp,$sp, S ||
|-----------------||
| return ||
|-----------------|+
| |
| |
| |
|-----------------|
| add $sp,$sp, S | At end of code for function, do
|-----------------| return sequence anyway (even if value
| return | has not be calculated)
|-----------------| Return will jump to location in $sp[-8]
| |
D -> T id "(" PushST();
tempName = id.lexval;
L ")" "{" Ds AddFunctionEntry(id.lexval, Append(T.type, L.type), nextQuad,
(Length(tos) + 8));
AQ("sub", "$sp", "$sp", (Length(tos) + 8));
Ss "}" AQ("add", "$sp", "$sp", (Length(tos) + 8));
AQ("return", "-", "-", "-");
Pop();
L -> T id L.type = T.type;
L -> L "," T id L.type = Append(L1.type, T.type);
S -> "return" "(" E ")"
AQ("move", "$sp[" + itoa(LookupFrameSize(tempName)-4) + "]", "-", $3.place);
AQ("add", "$sp", "$sp", itoa(LookupFrameSize(tempName)));
AQ("return", "-", "-", "-");
E -> id "(" Es ")"
AQ("call", "-", "-", itoa(LookupStart(id.lexval)));
E.place = NewTemp();
AQ("move", E.place, "-", "$sp[-4]");
Es -> E
E.offset = 0;
E.framesize = LookupFrameSize($-1.lexval); // Go down stack to grab name of function!
AQ("move", "$sp[" + itoa(-(E.framesize)) + "]", "-", $1.place);
Es -> Es "," E
Es.offset = Es1.offset + 4; // For simplicity, all types have size 4
Es.framesize = Es1.framesize;
AQ("move", "$sp[" + itoa(E.offset-Es.framesize) + "]", "-", E.place);
E -> id
if(IsStatic(id.lexval) // This is how to start the placename for
E.place = id.lexval; // a variable reference
else
E.place = "$sp";
NOTES:
Intermediate Code Program: 0: (sub, $sp, $sp, 16) 1: (move, t1, -, 0) 2: (blt, $sp[t1], 2, 4) 3: (jump, -, -, 8) 4: (move, $sp[12], -, 1) 5: (add, $sp, $sp, 16) 6: (return, -, -, -) 7: (jump, -, -, 21) 8: (move, t2, -, 4) 9: (move, t3, -, 0) 10: (move, t4, -, 0) 11: (sub, t5, $sp[t4], 1) 12: (move, $sp[-16], -, t5) 13: (call, -, -, 0) # fact 14: (move, t6, -, $sp[-4]) 15: (mult, t7, $sp[t3], t6) 16: (move, $sp[t2], -, t7) 17: (move, t8, -, 4) 18: (move, $sp[12], -, $sp[t8]) 19: (add, $sp, $sp, 16) 20: (return, -, -, -) 21: (add, $sp, $sp, 16) 22: (return, -, -, -) 23: (sub, $sp, $sp, 8) 24: (move, A, -, 2) 25: (add, t9, A, 1) 26: (move, $sp[-16], -, t9) 27: (call, -, -, 0) # fact 28: (move, t10, -, $sp[-4]) 29: (move, A, -, t10) 30: (add, $sp, $sp, 8) 31: (halt, -, -, -) # End of main 32: (end-of-program, -, -, -) Symbol Table: Addr Name Type ---- ---- ---- 0: A Int 0: fact Int <- (Int) 23: main Void <- (Void)