Lecture 01                                                    09/13/1994
----------

Overview
--------

The emphasis of this course is on the interplay between parallel
architectures, languages, and programming models.

The first component of this course is introductory in nature. We will
start with a brief overview of the different classifications of
"parallel computers". Next, we will devote a lecture to an overview of
the theoretical PRAM and VLSI complexity models. We will then discuss
the notions of software and hardware parallelism. Software parallelism
is concerned with the parallelism that can be "discovered" in an
application, whereas hardware parallelism is concerned with the
parallelism "available" as a result of the existing resources in the
system. Parallel programming involves the successful mapping of
software parallelism onto hardware parallelism. Often, such an
efficient mapping is not "obvious", if it is possible at all. This
brings up the issue of scalability, which we discuss next. A parallel
system is scalable if "processing power" and "speedup" are positively
correlated. We will look at various metrics and discuss speedup laws.

The next component of this course will be devoted to a survey of
parallel machines. We will start with a quick review of basic and
superscalar processors. Next, we will look at memory design issues for
multiprocessing systems, and at communication and routing issues for
multicomputing systems. We will then present a number of parallel
architectures representing multivector, SIMD, MIMD, shared-memory,
multithreaded, dataflow, and hybrid machines.

The next component of this course will be devoted to the
programmability of parallel machines. In that respect, we will look at
architecture-independent programming models such as UNITY, and we will
review the basic premises of parallelizing compilers. Finally, we will
conclude with a look at operating systems and run-time environments
for current parallel machines.

Computer "organization", "design", and "architecture" are different!
When we study computer organization, we are interested in "how" and
"why" the hardware behaves in a particular way; we get to the nuts and
bolts of the machine. When we study computer design, we do the
reverse: we figure out how to put it all together to achieve a
particular behavior. When we study computer architecture, we focus on
the way software and hardware interface with each other, i.e. on how
the machine looks (or should look) to the user. Computer architecture
is, therefore, more conceptual. While computer organization and design
are quite susceptible to changes in technology, computer architecture
concepts are not. The emphasis in this course will be on parallel
architectures, not on parallel organizations or designs.

Origins of parallelism
----------------------

Classifications
---------------

Flynn classified computers based on the number of instruction streams
and the number of data streams. Traditional von Neumann machines are
called SISD (Single Instruction stream, Single Data stream) machines,
since there is only one stream of instructions from memory to the
control unit and one stream of data from memory to the processing
unit. SIMD (Single Instruction stream, Multiple Data stream) machines
have multiple processing units. They have one stream of instructions
flowing from memory to the control unit, but multiple data streams
flowing from memory to the processing units.
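To make the SISD/SIMD distinction concrete, here is a minimal sketch
in C. The routine names and the lane count are hypothetical, and the
inner loop of add_simd merely simulates the lock-step processing
units; on a real SIMD machine that loop disappears, because the
control unit broadcasts the single add instruction to all processing
units at once.

    #define LANES 8   /* hypothetical number of SIMD processing units */

    /* SISD: one instruction stream drives one data stream; a single
       processing unit walks the arrays one element at a time. */
    void add_sisd(float a[], float b[], float c[], int n)
    {
        int i;
        for (i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

    /* SIMD (conceptual sketch): one instruction stream drives LANES
       data streams.  Assumes n is a multiple of LANES, for brevity. */
    void add_simd(float a[], float b[], float c[], int n)
    {
        int i, lane;
        for (i = 0; i < n; i += LANES)             /* one instruction ... */
            for (lane = 0; lane < LANES; lane++)   /* ... LANES data items */
                c[i + lane] = a[i + lane] + b[i + lane];
    }

Both routines compute the same result; the difference is in how many
data items a single instruction touches.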
MISD (Multiple Instruction stream, Single Data stream) machines
(a.k.a. systolic machines) have multiple control units and multiple
processing units. They have one stream of data flowing from memory and
pipelined through the processing units, but multiple control streams,
each going through the control unit of a particular processing unit.
Finally, MIMD (Multiple Instruction stream, Multiple Data stream)
machines have multiple control units and multiple processing units,
where each stream of instructions controls the processing of a
separate data stream.

It could be argued that for general-purpose computing, only SISD and
MIMD are viable. In other words, SIMD and MISD are suitable as
special-purpose machines that do particularly well on some problems,
but are almost impractical for others.

Multiprocessors and Multicomputers
----------------------------------

The name "multiprocessor" is used to connote a parallel computer with
a shared common memory; the name "multicomputer" is used to connote a
parallel computer with unshared, distributed memories, i.e. with NO
Remote Memory Access (NORMA).

Shared-memory multiprocessors (often termed "tightly coupled
computers") are further classified into three categories: UMA, NUMA,
and COMA. UMA machines feature "Uniform Memory Access", which implies
that the latency of a memory access is the same for all processors.
NUMA machines, in contrast, feature "Non-Uniform Memory Access", which
implies that the latency of a memory access depends on the locations
of the requesting processor and of the accessed memory. Notice that a
portion of the global shared memory of a NUMA machine may be uniformly
accessible (i.e. part of a NUMA machine may be UMA).

There are several possible memory organizations for NUMA machines. The
most common is a distributed global memory, in which each processor
locally maintains a "piece" of that memory. Access to this "local
memory" is quite fast, whereas access to "remote memory" (maintained
by some other processor) is much slower (typically two orders of
magnitude slower), since it requires navigating a communication
network of some sort.

In addition to local memory, a NUMA machine may have cache memory. If
the collective size of the local cache memories of all processors is
big enough, it may be possible to dispense with main memory
altogether! This results in a COMA (Cache-Only Memory Access) machine
(a.k.a. an ALLCACHE machine).
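A back-of-the-envelope model shows why this local/remote latency gap
dominates NUMA performance. The sketch below is in C with made-up
latencies; the 100x ratio mirrors the "two orders of magnitude" figure
above, but the absolute cycle counts are assumptions, not measurements
of any particular machine.

    #include <stdio.h>

    #define T_LOCAL    1.0   /* assumed local access latency, in cycles  */
    #define T_REMOTE 100.0   /* assumed remote access latency, in cycles */

    /* Average memory access time when a fraction f_local of all
       memory references is satisfied by local memory. */
    double avg_access_time(double f_local)
    {
        return f_local * T_LOCAL + (1.0 - f_local) * T_REMOTE;
    }

    int main(void)
    {
        static const double fracs[] = { 1.00, 0.99, 0.95, 0.90 };
        int i;
        for (i = 0; i < 4; i++)
            printf("local fraction %.2f -> average %6.2f cycles\n",
                   fracs[i], avg_access_time(fracs[i]));
        return 0;
    }

With these numbers, letting just 1% of the references go remote
doubles the average access time, and 10% slows it down by an order of
magnitude; hence the importance of placing data close to the processor
that uses it.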
UMA, NUMA, and COMA multiprocessors are further classified as being
either symmetric or asymmetric. A symmetric multiprocessor gives all
processors "equal access" to the devices (e.g. disks, I/O) in the
system; an asymmetric multiprocessor does not. In a symmetric system,
executive programs (e.g. the OS kernel) may be invoked on any
processor.

Date of last update: September 29, 1994.