Attribute Grammar Rules for Basic Type Checking

Introduction

This document is intended as a backup for the textbook and the lectures on type checking. It presents an attribute grammar for basic type checking in C, and includes basic types int and float, arrays, pointers, and structures. We make some comments on type checking for functions, but leave the details for you to develop in the course of project three. These notes should be sufficient to get you well on your way towards that goal.

There are two aspects to the basic process of type checking: (i) processing declarations by entering the appropriate type expressions into the appropriate symbol table, and then (ii) checking the expressions and statements for type correctness. The latter process involves calculating the type of expressions recursively, checking for appropriate types at each operator, and finally checking that expressions types are appropriate in statements. If no errors occur then the types will be used during code generation (e.g., to select the appropriate kind of addition operation); when errors occur, they are reported and compilation is halted.

In order to simplify the presentation, we shall use an ambiguous grammar which includes one of each kind of illustrative feature. It is NOT the language of your project (for example, it includes pointers but omits functions), but the principles here can be easily adapted for your more realistic grammar. The presentation of the attribute grammar follows the notation in class and in the book, and can be easily adapted to Yacc.

Basic Data Structures and Functions

The rules below refer to attributes which are stored on the parsing stack, global data structures such as the stack of symbol tables, and global functions which manipulate these data objects. As described in the sample files for project three, the lexeme (characters making up the token) is stored on the parse stack for the token classes id, intnum, and floatnum. These are available in the attribute rules below as the attributes id.lexval, intnum.lexval, and floatnum.lexval.

Symbol tables store identifier names and associated information; for the present we will store only type information. However, the rules for declarations in C are somewhat complex, and difficult to implement with a single symbol table, due to the fact that declaration blocks can be nested, that is, we can have:


int X;

main() {
   struct { int x; float y; } Q;
   .....
   { int X; float Z;
      ....
	     X = Q.x; 
   }
   X = Z;
 }

We must be able to verify that Q.x has a type that can be assigned to X, which X is intended (in this case, the innermost one) and also determine that the use of Z in the last line is incorrect, as there is no definition of Z in that scope.

In order to do this, we must have a stack of dynamically allocated symbol tables (they will typically be implemented as hash tables). Whenever we enter a new declaration block, we shall push a new symbol table on the stack, and when we leave, we will pop this table off the stack. This global stack of symbol tables will be manipulated using the functions

PushST( ) = Create a new symbol table and push it on the stack;
 
Pop() = Pop the top ST off the stack and return a pointer to it.
To manipulate the stack of STs we have the functions

AddEntry( id.lexval, type ) =
Attempt to add a symbol with its type to the current (i.e., top) ST on the stack of STs; returns 1 if ok, returns 0 if the identifier is already present in the top table, as this represents a multiple declaration. Note that this only examines the top symbol table on the stack.
 
Lookup( id.lexval ) =
Look up the identifier in the ST and return a pointer to the type expression for that entry; if the identifier is not in the topmost table, look in the next table down the stack, then the next, and so on (i.e., in the declarations of successively-enclosing blocks); if the identifier is not in any ST on the stack then return NULL (id not declared).
 
LookupLocal( id.lexval, STPtr ) =
Look up the identifier in the ST given as the second argument, and return the type found, or NULL if not found. This is used for looking up structure members.

Type Expressions

We use the following type expressions.

  1. Primitive types: int, float
  2. Array of size N with base type T: array(Texpr,N)
  3. Pointer to a type T: ptr(Texpr)
  4. Struct with members defined by a ST pointed to by STptr: struct(STptr)

For example, the type declaration float ** A[10] would result in a variable A being inserted into the top ST with a type array(ptr(ptr(float)),10).

These type expressions are implemented as pointer structures which can be combined recursively to create complex data types, and also linked together into lists of type expressions (e.g., for parameter lists in functions). Here is one way to form such structures, with nested type expressions being linked vertically, so that lists of type expressions can be created horizontally::

 

The type expression data structure just given has the advantage that each cell is uniform, and contains only a single integer (encoding the type or the size) and two pointers; another possibility is to load up the top cell of the expression with any necessary information. For example, the array cell could have the the type identifier (array) and the size. What is most important is that you can build recursive types (e.g., for pointers) and list of types (for parameter lists).

The following functions will be used to manipulate type expressions:

MT( typename, arg1, arg2 ) =
Make a new Type expression involving the given typename and the appropriate associated information, e.g., an array of 10 floats would be constructed by MT(array,float,10) and an int would be constructed by MT(int,-,-).
 
IsPrim(type ) =
Returns 1 if type is int or float, else 0.
 
AssignComp( type1, type2 ) =
Check assignment compatibility: returns 1 if IsPrim(type1) and IsPrim(type2); returns 0 otherwise (e.g., tried to assign an array, or int to struct).
 
IsInt(type), IsArray(type), IsStruct(type), IsPtr(type) =
Returns 1 (true) if type is as indicated, else 0.
 
Error(str) =
Prints str on the screen and terminates compilation.
Global variables: We shall use the following global variable to simplify the grammar:
tempType -- Hold the type during parsing of a declaration

Note that there are other ways of dealing with the problem of storing information during the parse in Yacc, as discussed in class (e.g., using dummy non-terminals, and using the variables $-1, $-2, etc. to access stack cells below the current rule).

 

Attribute Grammar for Type Checking

We now present our rules for type checking, following the presentation in lecture. Comments follow the symbols "//". The word "epsilon" represents the empty string.



Grammar Rule                       Attribute Grammar
----------------------------------------------------------
Prog --> main ( ) "{" Dummy Ds Ss "}"          

// Enter types into symbol tables               

Dummy --> epsilon              PushST( );  // Create new ST

Ds  --> D ";" Ds | D ";"

Ss  --> S ";" Ss | S ";"                     

D   --> T L                                

T --> int                     tempType = MT(int,-,-)

T --> float                   tempType = MT(float,-,-) 

                                     // Note timing of next rule: Dummy creates ST
                                     // before declarations are encountered, then
                                     // type is built from that ST at end of rule
T --> struct "{" Dummy Ds "}"     tempType = MT(struct, Pop(), - ); 

L   --> N "," L               if(!AddEntry( N.lexval, N.type ))
                                   Error("Multiply-defined identifier");

L   --> N                     if(!AddEntry( N.lexval, N.type ))
                                   Error("Multiply-defined identifier");

N --> * N                     N.type = MT( ptr, N1.type, -);
                                   N.lexval = N1.lexval;

N --> id                      N.type = tempType; 
                                   N.lexval = id.lexval

N --> id [ intnum ]           N.type = MT( array, tempType, intnum.lexval) ;
                                   N.lexval = id.lexval;

// Now do type checking of expressions; must check each operator for proper operand types
// and also pass correct type up the parse tree; note that sometimes we construct a type
// for the expression (e.g., for pointers)

E --> intnum                    E.type = MT(int,-,-)

E --> floatnum                  E.type = MT(float,-,-)

E --> id                        tempType = Lookup(id.lexval);
                                if(tempType != NULL)
				                   E.type = tempType;
				                else
            			           Error("Id undeclared");

E --> id [ E ]                  if( IsInt( E1.type ) && Lookup(id.lexval) returns array(T,N) )
                                   E.type = T; 
                                else
                                   Error("Id undeclared or type incompatible with array reference"); 

E --> * id                      if( Lookup( id.lexval ) returns ptr(T) )
                                   E.type = T;
                                else
                                   Error("Id undeclared or type incompatible with pointer dereference"); 

E --> & id                      tempType = Lookup(id.lexval);
                                if(tempType != NULL)
                                  E.type = MT(ptr, tempType, -); 
                                else
                                  Error("Id undeclared");

E --> id "." id                 if( Lookup(id1.lexval) returns struct(STptr) ) {
                                   tempType = LookupLocal( id2.lexval, STptr );
                                   if( tempType == NULL )
                                      Error("Illegal member reference in struct");
                                   else
                                      E.type = tempType;
                                }
                                else
                                   Error("Id undeclared or type incompatible with structure reference"); 

E --> E + E                     if (IsInt(E1.type) && IsPrim(E2.type))
                                   E.type = E2.type;
                                else if (IsFloat(E1.type) && IsPrim(E2.type))
                                   E.type = E1.type
                                else
                                   Error("Type error for +");

E --> E % E                     if (IsInt(E1.type) && IsInt(E2.type))
                                   E.type = E1.type;
                                else
                                   Error("Type error for %");
								   
// Now check types for statements

S --> id = E                    if( !AssignComp(Lookup(id.lexval), E.type) )
                                   Error("Id undeclared or incompatible assignment types"); 
                                      
S --> while ( E ) C             if(!IsInt(E.type)) 
                                   Error("Expression type is not integer!"); 

Extending the Rules

The rules above show the basic techniques which can be applied to most of the common situations encountered in type checking for C. For example, most arithmetic operators are treated exactly like plus (i.e., are overloaded, so that a floating point operand forces the entire expression to be treated as floating point). An important case to note carefully is that boolean expressions are considered to be of integer type (as if they always return the value 1 or 0); hence the boolean operators &&, ||, and ! are treated like the modulus operator %. The relational operators can take either float or integers, and return a boolean (that is, integer) value. This will be a significant point in the next set of notes, when we consider how to translate boolean expressions.

A more complex case occurs when function declarations are encountered; here one must build a list of types representing the types of parameters and the return type; when function calls are encountered, one must build a list of argument types and check them element by element according to the notion of assignment compatibility used for parameter passing (e.g., in standard C, one can do a block assignment of a struct in parameter passing, whereas in our project three we assume that parameters are passed using strict assignment compatibility as defined above).