Tomasulo's algorithm Problem 1



The figure shows the basic structure of a DLX machine that uses Tomasulo's algorithm.
Q: Tomasulo's algorithm has a disadvantage compared with the scoreboard: only one result can complete per clock cycle, because all results are broadcast on the single common data bus (CDB). Using the same functional units in both cases, find a code sequence of no more than 10 instructions where the scoreboard does not stall but Tomasulo's algorithm must. Use the simulator, then indicate where the stall occurs in your sequence. Assume the following latencies in clock cycles:

The following are the default values of DLXtomasulo:

Tomasulo's Algorithm Hardware Configuration
 3 add/subtract units, latency =  2 cycles
 2 mult/div units,     latency = 10 cycles (multiply)
                       latency = 40 cycles (divide)
 6 load_units,         latency =  2 cycles
 3 store_units,        latency =  2 cycles
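
One way to look for candidate sequences before running the simulator is to compute when each result would be ready to broadcast and check whether two results land on the CDB in the same cycle. The sketch below is only a rough model, not the simulator's actual timing: the function name `cdb_conflicts` and the timing rules (one instruction issued per cycle, all operands independent and ready, execution starting the cycle after issue, write-back attempted the cycle after execution ends) are assumptions made for illustration. It also says nothing about whether the scoreboard stalls on the same sequence, which still has to be checked separately.

```python
from collections import defaultdict

# Latencies in clock cycles, taken from the configuration listing above.
LATENCY = {"add": 2, "mult": 10, "div": 40, "load": 2, "store": 2}


def cdb_conflicts(ops):
    """Return {cycle: [instruction indices]} for every cycle in which more
    than one result is ready to broadcast on the single CDB.

    Assumed timing (hypothetical, simplified): instruction i issues in
    cycle i + 1, execution starts in the following cycle, and the result
    is ready to write back in the cycle after execution finishes.
    """
    ready = defaultdict(list)
    for i, op in enumerate(ops):
        if op == "store":
            continue  # stores write memory; they do not broadcast a result
        issue_cycle = i + 1
        ready_cycle = issue_cycle + LATENCY[op] + 1
        ready[ready_cycle].append(i)
    # Keep only the cycles where two or more results compete for the bus.
    return {cycle: idxs for cycle, idxs in ready.items() if len(idxs) > 1}


# A multiply followed by seven independent loads and then an add:
# under the assumed timing, the multiply and the add finish together.
example = ["mult"] + ["load"] * 7 + ["add"]
print(cdb_conflicts(example))  # {12: [0, 8]}
```

Under these assumptions, the 10-cycle multiply issued first and the 2-cycle add issued eight cycles later both want the CDB in cycle 12, so one of them has to wait. The exact cycle numbers reported by the simulator may differ from this model.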

Sample solution of problem 1


This page was created by Yueh-Lin Liu (yueh@cs.bu.edu) as part of a Master's project under the supervision of Prof. Azer Bestavros (best@cs.bu.edu).
Last updated: 1995.5.15