Lecture 07                                                   10/04/1994

Overview of Basic Processor Design
----------------------------------
o Instruction Set Architectures
o RISC vs CISC architectures
o Scalar, superscalar, and superpipelined architectures
o Linear vs non-linear pipelining
o Coprocessors and VLIW
o Vector processing

Overview of Basic Memory Design
-------------------------------
o The memory hierarchy
o Memory inclusion, coherence, and locality properties
o Hit ratios and effective access times
o Virtual memory (segmentation and paging)
o Shared memory

Linear and Nonlinear Pipelining
-------------------------------
A pipelined computation is described by a reservation table, which
records the usage through time of the different stages of a pipeline.
For simple linear pipelines, the reservation table is trivial. For
example, the following shows the reservation table for a 3-stage
non-linear pipeline:

 Cycle#  | 1  2  3  4  5  6  7  8
 --------+-----------------------
 stage 1 | X              X     X
 stage 2 |    X     X
 stage 3 |       X     X     X
 --------------------------------

The number of clock cycles between two initiations of a pipeline is
called the latency between the two initiations. To determine whether
two initiations with a latency T are possible, the reservation table
is used to discover whether any collisions will result. A collision
occurs when two initiations attempt to use the same stage at the same
time. For linear pipelines, T=1 always works. For non-linear
pipelines, finding collision-free latencies is not straightforward.

A latency cycle is a sequence of initiation latencies that does not
cause collisions; repeating a latency cycle indefinitely never causes
a collision. The average initiation latency of a pipeline operating
on a latency cycle is the average of the latencies in the cycle. The
best latency cycle for a pipeline is one that results in a minimal
average initiation latency. For linear pipelines, the latency
cycle (1) is optimum.
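The collision check described above can be sketched in a few lines of
Python. This is a minimal illustration, not part of the original notes:
the encoding of the 3-stage reservation table as per-stage cycle lists
(RESERVATION_TABLE below) is an assumption, and the helper names are
made up for this sketch.

```python
# Sketch: derive the forbidden latencies of a reservation table and
# check whether a proposed latency cycle avoids collisions.
# Assumed encoding of the 3-stage example above: each row lists the
# cycles (1-based) in which that stage is busy.

RESERVATION_TABLE = {
    "stage 1": [1, 6, 8],
    "stage 2": [2, 4],
    "stage 3": [3, 5, 7],
}

def forbidden_latencies(table):
    """A latency p collides if some stage is busy at two times p apart."""
    forbidden = set()
    for cycles in table.values():
        for i in range(len(cycles)):
            for j in range(i + 1, len(cycles)):
                forbidden.add(abs(cycles[j] - cycles[i]))
    return forbidden

def cycle_is_collision_free(latency_cycle, table, repeats=4):
    """Simulate repeated initiations separated by the given latencies."""
    busy = {stage: set() for stage in table}   # occupied (stage, time) slots
    latencies = latency_cycle * repeats
    start = 0
    for k in range(len(latencies) + 1):
        for stage, cycles in table.items():
            for t in cycles:
                if start + t in busy[stage]:
                    return False               # two initiations collide
                busy[stage].add(start + t)
        if k < len(latencies):
            start += latencies[k]
    return True

print(sorted(forbidden_latencies(RESERVATION_TABLE)))   # [2, 4, 5, 7]
print(cycle_is_collision_free([3], RESERVATION_TABLE))  # True: avg latency 3
print(cycle_is_collision_free([2], RESERVATION_TABLE))  # False: 2 is forbidden
```

For this table the simple cycle (3) is collision-free with average
initiation latency 3, while any cycle containing a forbidden latency
such as 2 fails the check.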
Optimizing the latency cycle for non-linear pipelines may involve the
insertion of delay stages (non-computing stages).

The pipeline efficiency is the average percentage of time that each
pipeline stage is in use. Ideally, a linear pipeline would have 100%
efficiency. This is usually not the case due to hazards, which arise
from either data dependencies or branches.

Shared Memory Models
--------------------
On a multiprocessor with shared memory, the concurrent instruction
streams executing on different processors are processes. Memory
events correspond to shared-memory accesses by these processes.
Consistency models specify the order in which the events from one
process should be observed by the other processes in the machine.

The sequential consistency (SC) model is the "strongest" consistency
model in that it requires loads, stores, and swaps (synchronization
operations) to execute serially in a single global memory order that
conforms to the individual program orders of the processors. Most
multiprocessors implement the SC model because of its simple
semantics, which makes for easy programmability. However, the model
may lead to poor performance due to the strict constraints on memory
order, especially for a large number of processes. Thus, SC reduces
the scalability of a multiprocessor system.

Weak consistency models are less restrictive. One such model is
Total Store Order (TSO), which allows stores to go through without
delay but blocks on reads and swaps (synchronization). Each processor
keeps a FIFO buffer of all its writes and swaps in program order, and
memory services an operation from a processor at random. On a load or
swap, the processor checks its buffer for a value. If none is found,
the processor blocks, waiting for a value to be returned from memory.
Weak consistency models may offer better performance (scalability)
than the sequential consistency model at the expense of more
hardware/software support and more programmer involvement.
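The TSO buffering just described can be illustrated with a small
Python sketch. This is an assumed, much-simplified model (not from the
original notes): the class and method names are invented, draining is
made explicit via drain_one, and the blocking wait on memory is
collapsed into an immediate read.

```python
from collections import deque

# Simplified TSO model: each processor appends its stores to a private
# FIFO buffer in program order; memory drains the buffers later. A load
# first checks the issuing processor's own buffer (newest entry wins);
# otherwise it is served by memory.

class TSOProcessor:
    def __init__(self, memory):
        self.memory = memory
        self.store_buffer = deque()   # FIFO of (address, value)

    def store(self, addr, value):
        # Stores go through without delay from this processor's view.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Forward the newest buffered value for this address, if any.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        # Otherwise the value comes from memory (0 if never written).
        return self.memory.get(addr, 0)

    def drain_one(self):
        # Memory services this processor's oldest buffered write.
        if self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value

memory = {}
p0, p1 = TSOProcessor(memory), TSOProcessor(memory)
p0.store("x", 1)
print(p0.load("x"))   # 1: p0 sees its own buffered store immediately
print(p1.load("x"))   # 0: p1 does not, until memory drains p0's buffer
p0.drain_one()
print(p1.load("x"))   # 1: the store is now visible to everyone
```

The asymmetry in the two middle lines is exactly why TSO is weaker
than SC: a processor can observe its own store before other processors
do, which no single global memory order interleaved with program order
would allow.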
Date of last update: September 29, 1994.