User's Guide for The Tomasulo Algorithm
Option
dlxsim -TOMASULO
[-al#] [-au#] [-dl#] [-ml#] [-mu#] [-Ll#] [-Lu#] [-Sl#] [-Su#]
[-TOMASULO] Execute the Tomasulo version of DLXsim.
[ -al# ] Select the latency for a floating point add (in clocks).
[ -au# ] Select the number of floating point add units.
[ -dl# ] Select the latency for a floating point divide.
[ -ml# ] Select the latency for a floating point multiply.
[ -mu# ] Select the number of floating point multiply units.
[ -Ll# ] Select the latency for a floating point load.
[ -Lu# ] Select the number of floating point load units.
[ -Sl# ] Select the latency for a floating point store.
[ -Su# ] Select the number of floating point store units.
Command
- go [ address ]
Start simulating the DLX machine. If address is given,
execution starts at that memory address. Otherwise, it continues from
wherever it left off previously. This command does not complete until
simulated execution stops. The return value is an information string
about why execution stopped and the current state of the machine.
- load file [ file file ...]
Read each of the given files. Treat them as DLX assembly
language files and load memory as indicated in the files. Code (text)
is normally loaded starting at address 0x100, but the codeStart
variable may be used to set a different starting address. Data is
normally loaded starting at address 0x1000, but a different starting
address may be specified in the dataStart variable. The return
value is either an empty string or an error message describing
problems in reading the files. A list of directives that the loader
understands is in a later section of this manual.
- put address number
Store number in the register or memory location given by
address. The return value is an empty string. To store floating
point numbers (single or double precision), use the fput
command.
- quit
Exit the simulator.
- stats [stalls] [opcount] [branch] [tomasulo] [hw] [all]
This command will dump various statistics collected by the simulator
on the DLX code that has been run so far. Any combination of options
may be selected. The options and their results are as follows:
- stalls
Show the number of structural hazard stalls.
- opcount
Show the number of each operation that has been executed.
- branch
Show the percentage of branches taken and not-taken.
- tomasulo
Show the reservation stations and register tags.
- hw
Show the current hardware setup for the simulated machine.
- all
Equivalent to choosing all options to show statistics. This is the default.
- step [ address ]
If no address is given, the step command executes a single
instruction, continuing from wherever execution previously stopped.
If address is given, then the program counter is changed to
point to address, and a single instruction is executed from
there. In either case, the return value is an information string
about the state of the machine after the single instruction has been
executed.
Assembly file format
The assembler built into DLXsim, invoked using the load
command, accepts standard format DLX assembly language programs. The file is expected to contain lines of the following form:
- Labels are defined by a group of non-blank characters starting
with either a letter, an underscore, or a dollar sign, and followed
immediately by a colon. They are associated with the next address to
which code in the file will be stored. Labels can be accessed anywhere
else within that file, and in files loaded after that if the label is
declared as .global (see below).
- Comments are started with a semicolon, and continue to the end of the line.
- Constants can be entered either with or without a preceding number sign.
- The format of instructions and their operands is as shown in
the Computer Architecture book.
While the assembler is processing an assembly file, the data and
instructions it assembles are placed in memory based on either a text
(code) or data pointer. Which pointer is used is selected not by the
type of information, but by whether the most recent directive was .data or .text. The program initially loads into the text
segment.
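The two-pointer placement described above can be sketched as follows. This
is only an illustration, not DLXsim's loader: the class SegmentPointers and
its methods are hypothetical names, and the default addresses are the
codeStart and dataStart defaults quoted earlier in this guide.

```python
class SegmentPointers:
    """Hypothetical model of the loader's text and data pointers."""

    def __init__(self, code_start=0x100, data_start=0x1000):
        # Defaults taken from the codeStart/dataStart values in this guide.
        self.pointers = {"text": code_start, "data": data_start}
        self.current = "text"  # a program initially loads into the text segment

    def directive(self, name, address=None):
        """Handle .text or .data, optionally with an explicit address."""
        segment = name.lstrip(".")
        if segment not in self.pointers:
            raise ValueError(f"unsupported directive: {name}")
        # The other segment's pointer stays stored, so a later directive
        # can continue from where that segment left off.
        self.current = segment
        if address is not None:
            self.pointers[segment] = address

    def place(self, size):
        """Return the address for `size` bytes and advance the active pointer."""
        address = self.pointers[self.current]
        self.pointers[self.current] += size
        return address


loader = SegmentPointers()
first_instr = loader.place(4)    # placed through the text pointer
loader.directive(".data")
first_word = loader.place(4)     # placed through the data pointer
loader.directive(".text")
second_instr = loader.place(4)   # text resumes where it left off
```

The point of the sketch is that the active pointer is chosen by the most
recent directive, not by what is being stored.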
The assembler supports several directives which affect how it loads
the DLX's memory. These should be entered in the place where you
would normally place the instruction and its arguments. The
directives currently supported by DLXsim are:
- .align n
Cause the next data/code loaded to be at the next higher address with
the lower n bits zeroed (the next closest address greater than or
equal to the current address that is a multiple of 2^n).
- .ascii [ "string1", "string2",...]
Store the strings listed on the line in memory as a list of
characters. The strings are not terminated by a 0 byte.
- .asciiz [ "string1", "string2",...]
Similar to .ascii, except each string is followed by a 0 byte
(like C strings).
- .byte [ byte1, byte2,...]
Store the bytes listed on the line sequentially in memory.
- .data [ address ]
Cause the following code and data to be stored in the data area. If
an address was supplied, the data will be loaded starting at
that address, otherwise, the last value for the data pointer will be
used. If we were just reading code based on the text (code) pointer,
store that address so that we can continue from there later (on a
.text directive).
- .double [ number1, number2,...]
Store the numbers listed on the line sequentially in memory as
double precision floating point numbers.
- .float [ number1, number2,...]
Store the numbers listed on the line sequentially in memory as
single precision floating point numbers.
- .global [ label ]
Make the label available for reference by code found in files
loaded after this file.
- .space [ size]
Move the current storage pointer forward size bytes (to leave some
empty space in memory).
- .text [ address]
Cause the following code and data to be stored in the text (code)
area. If an address was supplied, the data will be loaded
starting at that address, otherwise, the last value for the text
pointer will be used. If we were just reading data based on the data
pointer, store that address so that we can continue from there later
(on a .data directive).
- .word [ word1, word2,...]
Store the words listed on the line sequentially in memory.
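As a worked example of the .align arithmetic, the sketch below rounds an
address up so that its lower n bits are zero, which makes the result a
multiple of 2**n. The function name align is hypothetical and the code is
not taken from DLXsim; it only illustrates the rule stated for the
directive.

```python
def align(address, n):
    """Round address up to the nearest address whose lower n bits are zero."""
    mask = (1 << n) - 1            # n low-order one bits
    return (address + mask) & ~mask


word_aligned = align(0x1001, 2)    # next multiple of 4
unchanged = align(0x1004, 2)       # already aligned, so it stays put
double_aligned = align(0x1005, 3)  # next multiple of 8
```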
In Tomasulo's algorithm, all instructions are either FP operations, FP loads,
or FP stores, because instructions are taken from the floating point
instruction queue. This simulator also accepts two other instructions, nop
and trap, which are used for step tracing.
Reservation stations and register tags
TOMASULO's 6th clock cycle
Instruction          Issue    Execute    Write Result
+========================================================================+
ld    f6,A(r1)         V         V            V
ld    f2,B(r2)         V         V            V
multf f0,f2,f4         V
subf  f8,f6,f2         V
divf  f10,f0,f6        V
addd  f6,f8,f2         V
+=======================================================================+
Name    Busy    Op        Vj         Vk         Qj      Qk
+=======================================================================+
add1    YES     subf      (f6)       (load2)
add2    YES     addd                 (f2)       add1
add3    NO      (null)
mul1    YES     multf     (load2)    (f4)
mul2    YES     divf                 (f6)       mul1
+=======================================================================+
        F0      F2      F4      F6      F8      F10     F12     F14
+----------------------------------------------------------------------+
Qi      mul1                    add2    add1    mul2
Busy    YES     NO      NO      YES     YES     YES     NO      NO
+======================================================================+
        F16     F18     F20     F22     F24     F26     F28     F30
+----------------------------------------------------------------------+
Busy    NO      NO      NO      NO      NO      NO      NO      NO
+======================================================================+
Each reservation station has six fields:
Op- The operation to perform on source operands S1 and S2.
Qj,Qk- The reservation stations that will produce the corresponding source
operand; a value of zero indicates that the source operand is already
available in Vj or Vk, or is unnecessary. The IBM 360/91 calls these
SINK unit and SOURCE unit.
Vj,Vk- The values of the source operands. These are called SINK and SOURCE on
the IBM 360/91. Note that only one of the V field or the Q field is
valid for each operand.
Busy- Indicates that this reservation station and its accompanying functional
unit are occupied.
The register file and store buffer each have two fields:
Qi- The number of the functional unit that will produce a value to be
stored into this register or into memory. If the value of Qi is zero,
no currently active instruction is computing a result destined for
this register or buffer. For a register, this means the value is given
by the register contents.
Busy- Indicates that this register or store buffer is waiting for a
result.
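The fields above can be sketched as two small record types. This is a
minimal illustration, not the simulator's data layout: the class names are
hypothetical, None stands in for the "zero" tag, and operand values are
kept as display strings. The sample entries are read from the 6th clock
cycle listing, on the assumption that for add2 the single value shown is
Vk and the single tag shown is Qj.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None   # value of the first source operand
    vk: Optional[str] = None   # value of the second source operand
    qj: Optional[str] = None   # station producing operand 1; None means "zero"
    qk: Optional[str] = None   # station producing operand 2; None means "zero"

    def ready(self):
        """An occupied station may execute once neither Q field names a producer."""
        return self.busy and self.qj is None and self.qk is None


@dataclass
class RegisterStatus:
    qi: Optional[str] = None   # station that will write this register
    busy: bool = False         # waiting for a result


# Entries for subf f8,f6,f2 and addd f6,f8,f2 at the 6th clock cycle.
add1 = ReservationStation("add1", True, "subf", vj="(f6)", vk="(load2)")
add2 = ReservationStation("add2", True, "addd", vk="(f2)", qj="add1")
f6 = RegisterStatus(qi="add2", busy=True)
```

Here add1 holds both operand values and can execute, while add2 still
waits for the result of add1.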
Check Hazard
- Issue-check structural hazards
- Execute-check RAW
- Write result-check if CDB available
WAW and WAR hazards are eliminated by renaming registers using the
reservation stations. Without this renaming, such hazards would produce wrong
results. Some modern CPUs (such as the POWER1) also rename registers [see Ref.3].
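A small sketch of how the Qi tag removes a WAW hazard: the register only
accepts a result from the station currently named in its tag, so an older
write that completes late is ignored. The function names, the completion
order, and the numeric values are all made up for illustration; None
stands in for the "zero" tag.

```python
regs = {"f6": 1.0}    # register contents (values are arbitrary)
qi = {"f6": None}     # register tags; None means no pending writer


def issue(dest, station):
    """Rename: the destination register now waits for this station only."""
    qi[dest] = station


def write_result(station, value):
    """Update only the registers whose tag still names the completing station."""
    for reg, tag in list(qi.items()):
        if tag == station:
            regs[reg] = value
            qi[reg] = None


issue("f6", "load1")         # older instruction writing f6
issue("f6", "add2")          # younger instruction writing f6
write_result("add2", 7.0)    # the younger one happens to finish first
write_result("load1", 3.0)   # the older one finishes late and is ignored
```

After both writes, f6 holds the value from the younger instruction, which
is what in-order execution would have left there.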
Tomasulo's algorithm
--------------------------------------------------------------------------
Instruction status  Wait until            Bookkeeping
--------------------------------------------------------------------------
Issue               Station or buffer     if(Register[S1].Qi!=0)
                    empty                   {RS[r].Qj<-Register[S1].Qi}
                                          else{RS[r].Vj<-S1;RS[r].Qj<-0};
                                          if(Register[S2].Qi!=0)
                                            {RS[r].Qk<-Register[S2].Qi}
                                          else{RS[r].Vk<-S2;RS[r].Qk<-0};
                                          RS[r].busy<-YES;
                                          Register[D].Qi<-r;
--------------------------------------------------------------------------
Execute             RS[r].Qj=0 and        None; operands are in Vj and Vk
                    RS[r].Qk=0
--------------------------------------------------------------------------
Write result        Execution completed   for all(if(Register[x].Qi=r)
                    at r and CDB            {Fx<-result;Register[x].Qi<-0;
                    available                Register[x].busy<-NO});
                                          for all(if(RS[x].Qj=r)
                                            {RS[x].Vj<-result;RS[x].Qj<-0});
                                          for all(if(RS[x].Qk=r)
                                            {RS[x].Vk<-result;RS[x].Qk<-0});
                                          for all(if(Store[x].Qi=r)
                                            {Store[x].V<-result;Store[x].Qi<-0});
                                          RS[r].busy<-NO;
--------------------------------------------------------------------------
For the issuing instruction, D is the destination, S1 and S2 are the sources,
and r is the reservation station or buffer that D is assigned to. RS is the
reservation-station data structure. The value returned by a reservation
station or by the load unit is called the "result." Register is the register
data structure, while Store is the store-buffer data structure. When an
instruction is issued, the destination register has its Qi field set to the
number of the buffer or reservation station to which the instruction is
issued. If the operands are available in the registers, they are stored in
the V fields. Otherwise, the Q fields are set to indicate the reservation
station that will produce the values needed as source operands. The
instruction waits at the reservation station until both its operands are
available, indicated by zero in the Q fields. The Q fields are set to zero
either when this instruction is issued, or when an instruction on which this
instruction depends completes and does its write back. When an instruction
has finished execution and the CDB is available, it can do its write back.
All the buffers, registers, and reservation stations whose Qi, Qj, or Qk
field names the completing reservation station update their values from the
CDB and mark the Q fields to indicate that values have been received. Thus,
the CDB can broadcast its result to many destinations in a single clock
cycle, and if the waiting instructions have their operands, they can all
begin execution on the next clock cycle. For simplicity we assume that all
bookkeeping actions are done in a single cycle.
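The issue and write-result bookkeeping described above can be sketched in
Python as follows. This is a simplified illustration under stated
assumptions, not the simulator's code: the names Station, issue,
write_result, and ready are hypothetical, store buffers and the register
busy flag are left out, tags are station names with 0 meaning "available",
and the register values and the result 8.0 are made up.

```python
class Station:
    """One reservation station; a Q field of 0 means the operand is available."""

    def __init__(self, name):
        self.name = name
        self.busy = False
        self.op = None
        self.vj = self.vk = None
        self.qj = self.qk = 0


def issue(rs, op, dest, s1, s2, regs, qi):
    """Issue step: copy ready operands, otherwise record the producing station."""
    if rs.busy:
        return False                      # structural hazard, so stall
    rs.op = op
    if qi[s1] != 0:
        rs.qj, rs.vj = qi[s1], None
    else:
        rs.vj, rs.qj = regs[s1], 0
    if qi[s2] != 0:
        rs.qk, rs.vk = qi[s2], None
    else:
        rs.vk, rs.qk = regs[s2], 0
    rs.busy = True
    qi[dest] = rs.name                    # destination now waits for this station
    return True


def write_result(r, result, stations, regs, qi):
    """Write-result step: broadcast the result to every waiting register and station."""
    for x in regs:
        if qi[x] == r.name:
            regs[x] = result
            qi[x] = 0
    for x in stations:
        if x.qj == r.name:
            x.vj, x.qj = result, 0
        if x.qk == r.name:
            x.vk, x.qk = result, 0
    r.busy = False


def ready(rs):
    """Execute condition: both Q fields are zero."""
    return rs.busy and rs.qj == 0 and rs.qk == 0


regs = {"f0": 0.0, "f2": 2.0, "f4": 4.0, "f6": 6.0, "f10": 0.0}
qi = {name: 0 for name in regs}
mul1, mul2 = Station("mul1"), Station("mul2")
stations = [mul1, mul2]

issue(mul1, "multf", "f0", "f2", "f4", regs, qi)    # multf f0,f2,f4
issue(mul2, "divf", "f10", "f0", "f6", regs, qi)    # divf f10,f0,f6 waits on mul1
waiting_before = ready(mul2)
write_result(mul1, 8.0, stations, regs, qi)         # broadcast the multf result
waiting_after = ready(mul2)
```

In this run the divf in mul2 captures the multf result directly from the
broadcast, so it becomes ready without reading f0 from the register file.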
Example DLX code running on this simulator
Last updated: 1995.5.10