Lecture 07                                                   10/04/1994

Overview of Basic Processor Design
----------------------------------
o Instruction Set Architectures
o RISC vs CISC architectures
o Scalar, superscalar, and superpipelined architectures
o Linear vs non-linear pipelining
o Coprocessors and VLIW
o Vector processing

Overview of Basic Memory Design
-------------------------------
o The memory hierarchy
o Memory inclusion, coherence, and locality properties
o Hit ratios and effective access times
o Virtual memory (segmentation and paging)
o Shared memory

Linear and Nonlinear Pipelining
-------------------------------
A pipelined computation is described by a reservation table, which
records the usage through time of the different stages of a pipeline.
For simple linear pipelines, the reservation table is trivial. For
example, the following shows the reservation table for a 3-stage
non-linear pipeline:

 Cycle#  | 1  2  3  4  5  6  7  8
 --------+-----------------------
 stage 1 | X              X     X
 stage 2 |    X     X
 stage 3 |       X     X     X
 --------------------------------

The number of clock cycles between two initiations of a pipeline is
called the latency between the two initiations. To determine whether
two initiations with a latency T are possible, the reservation table
is used to discover whether any collisions will result. A collision
occurs when two initiations attempt to use the same stage at the same
time. For linear pipelines, T=1 always works. For non-linear
pipelines, finding collision-free latencies is not straightforward.

A latency cycle is a sequence of initiation latencies that does not
cause collisions; repeating a latency cycle indefinitely never causes
a collision. The average initiation latency of a pipeline operating
on a latency cycle is the average of the latencies in the cycle. The
best latency cycle for a pipeline is one that results in a minimal
average initiation latency. For linear pipelines, the latency
cycle (1) is optimum.
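The collision check described above can be sketched in a few lines of
Python. This is a minimal illustration, not part of the original notes:
the encoding of the 3-stage reservation table as per-stage cycle lists
(RESERVATION_TABLE below) is an assumption, and the helper names are
made up for this sketch.

```python
# Sketch: derive the forbidden latencies of a reservation table and
# check whether a proposed latency cycle avoids collisions.
# Assumed encoding of the 3-stage example above: each row lists the
# cycles (1-based) in which that stage is busy.

RESERVATION_TABLE = {
    "stage 1": [1, 6, 8],
    "stage 2": [2, 4],
    "stage 3": [3, 5, 7],
}

def forbidden_latencies(table):
    """A latency p collides if some stage is busy at two times p apart."""
    forbidden = set()
    for cycles in table.values():
        for i in range(len(cycles)):
            for j in range(i + 1, len(cycles)):
                forbidden.add(abs(cycles[j] - cycles[i]))
    return forbidden

def cycle_is_collision_free(latency_cycle, table, repeats=4):
    """Simulate repeated initiations separated by the given latencies."""
    busy = {stage: set() for stage in table}   # occupied (stage, time) slots
    latencies = latency_cycle * repeats
    start = 0
    for k in range(len(latencies) + 1):
        for stage, cycles in table.items():
            for t in cycles:
                if start + t in busy[stage]:
                    return False               # two initiations collide
                busy[stage].add(start + t)
        if k < len(latencies):
            start += latencies[k]
    return True

print(sorted(forbidden_latencies(RESERVATION_TABLE)))   # [2, 4, 5, 7]
print(cycle_is_collision_free([3], RESERVATION_TABLE))  # True: avg latency 3
print(cycle_is_collision_free([2], RESERVATION_TABLE))  # False: 2 is forbidden
```

For this table the simple cycle (3) is collision-free with average
initiation latency 3, while any cycle containing a forbidden latency
such as 2 fails the check.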
Optimizing the latency cycle for non-linear pipelines may involve the
insertion of delay stages (non-computing stages).

The pipeline efficiency is the average percentage of time that each
pipeline stage is in use. Ideally, a linear pipeline would have 100%
efficiency. This is usually not the case due to hazards, which arise
from either data dependencies or branches.

Shared Memory Models
--------------------
On a multiprocessor with shared memory, the concurrent instruction
streams executing on different processors are processes. Memory
events correspond to shared-memory accesses by these processes.
Consistency models specify the order in which the events from one
process should be observed by the other processes in the machine.

The sequential consistency (SC) model is the "strongest" consistency
model in that it requires loads, stores, and swaps (synchronization
operations) to execute serially in a single global memory order that
conforms to the individual program orders of the processors. Most
multiprocessors implement the SC model because of its simple
semantics, which makes for easy programmability. However, the model
may lead to poor performance due to the strict constraints on memory
order, especially for a large number of processes. Thus, SC reduces
the scalability of a multiprocessor system.

Weak consistency models are less restrictive. One such model is
Total Store Order (TSO), which allows stores to go through without
delay but blocks on reads and swaps (synchronization). Each processor
keeps a FIFO buffer of all its writes and swaps in program order, and
memory services an operation from a processor at random. On a load or
swap, the processor checks its buffer for a value. If none is found,
the processor blocks, waiting for a value to be returned from memory.
Weak consistency models may offer better performance (scalability)
than the sequential consistency model at the expense of more
hardware/software support and more programmer involvement.
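The TSO buffering just described can be illustrated with a small
Python sketch. This is an assumed, much-simplified model (not from the
original notes): the class and method names are invented, draining is
made explicit via drain_one, and the blocking wait on memory is
collapsed into an immediate read.

```python
from collections import deque

# Simplified TSO model: each processor appends its stores to a private
# FIFO buffer in program order; memory drains the buffers later. A load
# first checks the issuing processor's own buffer (newest entry wins);
# otherwise it is served by memory.

class TSOProcessor:
    def __init__(self, memory):
        self.memory = memory
        self.store_buffer = deque()   # FIFO of (address, value)

    def store(self, addr, value):
        # Stores go through without delay from this processor's view.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Forward the newest buffered value for this address, if any.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        # Otherwise the value comes from memory (0 if never written).
        return self.memory.get(addr, 0)

    def drain_one(self):
        # Memory services this processor's oldest buffered write.
        if self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value

memory = {}
p0, p1 = TSOProcessor(memory), TSOProcessor(memory)
p0.store("x", 1)
print(p0.load("x"))   # 1: p0 sees its own buffered store immediately
print(p1.load("x"))   # 0: p1 does not, until memory drains p0's buffer
p0.drain_one()
print(p1.load("x"))   # 1: the store is now visible to everyone
```

The asymmetry in the two middle lines is exactly why TSO is weaker
than SC: a processor can observe its own store before other processors
do, which no single global memory order interleaved with program order
would allow.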
Date of last update: September 29, 1994.