User's Guide for The Tomasulo Algorithm



Option

dlxsim -TOMASULO
[-al#] [-au#] [-dl#] [-ml#] [-mu#] [-Ll#] [-Lu#] [-Sl#] [-Su#]

[-TOMASULO] Execute the scoreboarding version of DLXsim.
[ -al# ] Select the latency for a floating point add (in clocks).
[ -au# ] Select the number of floating point add units.
[ -dl# ] Select the latency for a floating point divide.
[ -ml# ] Select the latency for a floating point multiply.
[ -mu# ] Select the number of floating point multiply units.
[ -Ll# ] Select the latency for a floating point load.
[ -Lu# ] Select the number of floating point load units.
[ -Sl# ] Select the latency for a floating point load.
[ -Su# ] Select the number of floating point load units.

Command

Assembly file format


The assembler built into DLXsim, invoked using the load command, accepts standard format DLX assembly language programs. The file is expected to contain lines of the following form: While the assembler is processing an assembly file, the data and instructions it assembles are placed in memory based on either a text (code) or data pointer. Which pointer is used is selected not by the type of information, but by whether the most recent directive was .data or .text. The program initially loads into the text segment.
The assembler supports several directives which affect how it loads the DLX's memory. These should be entered in the place where you would normally place the instruction and its arguments. The directives currently supported by DLXsim are:

In Tomasulo's algorithm, all instructions are either FP operations, FP loads, or FP stores. Since it get instructions from floating point instruction queue. In this simulator, it can accept nop and trap this two instruction which is used for step tracing.


Reservation stations and register tags

                      TOMASULO's         6 th clock cycle
  Instruction         Issue          Execute       Write Result  
+========================================================================+
         ld f6,A(r1)    V               V               V 
         ld f2,B(r2)    V               V               V 
      multf f0,f2,f4    V                                 
       subf f8,f6,f2    V                                 
      divf f10,f0,f6    V                                 
       addd f6,f8,f2    V                                 
+=======================================================================+

     Name   Busy     Op         Vj         Vk        Qj       Qk  
+=======================================================================+
    add1     YES     subf      (f6)      (load2)                    
    add2     YES     addd                 (f2)      add1            
    add3     NO    (null)                                           
    mul1     YES    multf     (load2)     (f4)                      
    mul2     YES     divf                 (f6)      mul1            
+=======================================================================+

      F0      F2      F4      F6      F8      F10      F12     F14  
+----------------------------------------------------------------------+
Qi   mul1                    add2    add1     mul2                   
Busy  YES     NO      NO      YES     YES     YES      NO      NO  
+======================================================================+
     F16     F18     F20     F22     F24     F26     F28     F30
+----------------------------------------------------------------------+
                                                                   
Busy  NO      NO      NO      NO      NO      NO      NO      NO  
+======================================================================+

Each reservation station has six fields:

Op-    The operation to perform on source operands S1 and S2.
Qj,Qk- The reservation stations that will produce the corresponding source operand; a value of zero indicates that the source operands is already available in Vi or Vj, or is unnecessary. The IBM 360/91 calls these SINK unit and SOURCE unit.
VJ,Vk- The value of the source operands. These are called SINK and SOURCE on the IBM 360/91. Note that only one of the V field or the Q field is valid for each operand.
Busy- Indicates that this reservation station and its accompanying functional unit are occupied.

The register file and store buffer each have two fields:
Qi-    The number of the functional unit that will produce a value to be
       stored into this register or into memory. If the value of Qi is zero, 
       no currently active instruction is computing a result destined for 
       this register or buffer. For a register, this means the value is given
       by the register contents.
Busy- Indicates that this register file or store buffer unit are waiting for result.

Check Hazard

WAW and WAR hazards are elimiated by renaming registers using the reservation stations. If we don't rename certain registers, it will cause wrong result. In some morden CPUs (like POWER1), it can rename registers[see Ref.3].


Tomasulo's algorithm
--------------------------------------------------------------------------
InstructionStatus      Wait until                    Bookkeeping
--------------------------------------------------------------------------
Issue              Station or buffer          if(register[S1].Qi!=0)
                   empty                       {RS[r].Qj<-Register{S1].Qi}
                                              else{RS[r].Vj<-S1;RS[r].Qj<-0};
                                              if(register[S2].Qi!=0)
                                               {RS[r].Qk<-Register{S2].Qi}
                                              else{RS[r].Vj<-S1;RS[r].Qj<-0};
                                              RS[r].busy<-YES;
                                              Register[D].Qi=r;
--------------------------------------------------------------------------
Execute            RS[r].Qj=0 and             None-operands are in Vj and Vk
                   RS[r].Qk=0
--------------------------------------------------------------------------
Write result       Execution completed at r   for all(if(Register[x].Qi=r)
                    and CDB available          {Fx<-result;Register[x].Qi<-0
                                                Register[x].busy<-NO});
                                              for all(if(RS[x].Qj=r)
                                               {RS[x].Vj<-result;RS[x].Qj<-0});
                                              for all(if(RS[x].Qk=r)
                                               {RS[x].Vk<-result;RS[x].Qk<-0});
                                              for all(if(Store[x].Qi=r)
                                              {Store[x].V<-result;RS[x].Qi<-0});
                                              RS[r].busy<-NO;
--------------------------------------------------------------------------
For the issuing instruction, D is the destination. S1 and S2 are the source, and r is the reservation station or buffer that D is assigned to . RS is the reservation-station data structure. The value returned by a reservation station or by the load unit is called the"result."Register is the register data structure, while Store is the store-buffer data structure. When an instruction is issued, the destination register has its Qi field set to the number of the buffer or reservation station to which the instruction is issued. If the operands are available in the registers, they are stored in the V field. Otherwise, the Q fields are set to indicate the reservation station that will produce the values needed as source operands. the instruction waits at the reservation station until both its operands are available, indicated by zero in the Q fields. The Q fields are set to zero either when this instruction is issued, or when an instruction on which this instruction depends completes and does its write back. When an instruction has finished execution and the CDB is available, it can do its write back. All the buffers, registers and reservation stations whose value of Qj or Qk is the same as the completing reservation station update their values from the CDB and mark the Q fields to indicate that values have been received. Thus, the CDB can broadcast its result to many destinations in a single clock cycle, and if the waiting instructions have their operands, they can all begin execution on the next clock cycle. For simplicity we assume that all bookkeeping actions are done in a single cycle.


Example DLX code running on this simulator

Last updated: 1995.5.10