Lecture 01                                                    09/13/1994
----------

Overview
--------

The emphasis of this course is on the interplay between parallel
architectures, languages, and programming models.

The first component of this course is introductory in nature. We will
start with a brief overview of the different classifications of
"parallel computers". Next, we will devote a lecture to an overview of
the theoretical PRAM and VLSI complexity models. We will then discuss
the notions of software and hardware parallelism. Software parallelism
is concerned with the parallelism that can be "discovered" in an
application, whereas hardware parallelism is concerned with the
parallelism "available" as a result of the existing resources in the
system. Parallel programming involves the successful mapping of
software parallelism onto hardware parallelism. Often, such an
efficient mapping is not "obvious", if it is possible at all. This
brings up the issue of scalability, which we discuss next. A parallel
system is scalable if "processing power" and "speedup" are positively
correlated. We will look at various metrics and discuss speedup laws.

The next component of this course will be devoted to a survey of
parallel machines. We will start with a quick review of basic and
superscalar processors. Next, we will look at memory design issues for
multiprocessing systems, and at communication and routing issues for
multicomputing systems. We will then present a number of parallel
architectures representing multivector, SIMD, MIMD, shared-memory,
multithreaded, dataflow, and hybrid machines.

The next component of this course will be devoted to the
programmability of parallel machines. In that respect, we will look at
architecture-independent programming models such as UNITY, and we will
review the basic premises of parallelizing compilers. Finally, we will
conclude with a look at operating systems and run-time environments
for current parallel machines.

Computer "organization", "design", and "architecture" are different!
When we study computer organization, we are interested in "how" and
"why" the hardware behaves in a particular way; we get to the nuts and
bolts of the machine. When we study computer design, we do the
reverse: we figure out how to put it all together to achieve a
particular behavior. When we study computer architecture, we focus on
the way software and hardware interface with each other, i.e. on how
the machine looks (or should look) to the user. Computer architecture
is, therefore, more conceptual. While computer organization and design
are quite susceptible to changes in technology, computer architecture
concepts are not. The emphasis in this course will be on parallel
architectures, not on parallel organizations or designs.

Origins of parallelism
----------------------

Classifications
---------------

Flynn classified computers based on the number of instruction streams
and the number of data streams. Traditional von Neumann machines are
called SISD (Single Instruction stream, Single Data stream) machines,
since there is only one stream of instructions from memory to the
control unit and one stream of data from memory to the processing
unit. SIMD (Single Instruction stream, Multiple Data stream) machines
have multiple processing units. They have one stream of instructions
flowing from memory to the control unit, but multiple data streams
flowing from memory to the processing units.
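To make the SISD/SIMD distinction concrete, here is a minimal sketch
in C. The routine names and the lane count are hypothetical, and the
inner loop of add_simd merely simulates the lock-step processing
units; on a real SIMD machine that loop disappears, because the
control unit broadcasts the single add instruction to all processing
units at once.

    #define LANES 8   /* hypothetical number of SIMD processing units */

    /* SISD: one instruction stream drives one data stream; a single
       processing unit walks the arrays one element at a time. */
    void add_sisd(float a[], float b[], float c[], int n)
    {
        int i;
        for (i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

    /* SIMD (conceptual sketch): one instruction stream drives LANES
       data streams.  Assumes n is a multiple of LANES, for brevity. */
    void add_simd(float a[], float b[], float c[], int n)
    {
        int i, lane;
        for (i = 0; i < n; i += LANES)             /* one instruction ... */
            for (lane = 0; lane < LANES; lane++)   /* ... LANES data items */
                c[i + lane] = a[i + lane] + b[i + lane];
    }

Both routines compute the same result; the difference is in how many
data items a single instruction touches.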
MISD (Multiple Instruction stream, Single Data stream) machines
(a.k.a. systolic machines) have multiple control units and multiple
processing units. They have one stream of data flowing from memory and
pipelined through the processing units, but multiple control streams,
each going through the control unit of a particular processing unit.
Finally, MIMD (Multiple Instruction stream, Multiple Data stream)
machines have multiple control units and multiple processing units,
where each stream of instructions controls the processing of a
separate data stream.

It could be argued that for general-purpose computing, only SISD and
MIMD are viable. In other words, SIMD and MISD are suitable as
special-purpose machines that do particularly well on some problems,
but are almost impractical for others.

Multiprocessors and Multicomputers
----------------------------------

The name "multiprocessor" is used to connote a parallel computer with
a shared common memory; the name "multicomputer" is used to connote a
parallel computer with unshared, distributed memories, i.e. with NO
Remote Memory Access (NORMA).

Shared-memory multiprocessors (often termed "tightly coupled
computers") are further classified into three categories: UMA, NUMA,
and COMA. UMA machines feature "Uniform Memory Access", which implies
that the latency of a memory access is the same for all processors.
NUMA machines, in contrast, feature "Non-Uniform Memory Access", which
implies that the latency of a memory access depends on the locations
of the requesting processor and of the accessed memory. Notice that a
portion of the global shared memory of a NUMA machine may be uniformly
accessible (i.e. part of a NUMA machine may be UMA).

There are several possible memory organizations for NUMA machines. The
most common is a distributed global memory, in which each processor
locally maintains a "piece" of that memory. Access to this "local
memory" is quite fast, whereas access to "remote memory" (maintained
by some other processor) is much slower (typically two orders of
magnitude slower), since it requires navigating a communication
network of some sort.

In addition to local memory, a NUMA machine may have cache memory. If
the collective size of the local cache memories of all processors is
big enough, it may be possible to dispense with main memory
altogether! This results in a COMA (Cache-Only Memory Access) machine
(a.k.a. an ALLCACHE machine).
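A back-of-the-envelope model shows why this local/remote latency gap
dominates NUMA performance. The sketch below is in C with made-up
latencies; the 100x ratio mirrors the "two orders of magnitude" figure
above, but the absolute cycle counts are assumptions, not measurements
of any particular machine.

    #include <stdio.h>

    #define T_LOCAL    1.0   /* assumed local access latency, in cycles  */
    #define T_REMOTE 100.0   /* assumed remote access latency, in cycles */

    /* Average memory access time when a fraction f_local of all
       memory references is satisfied by local memory. */
    double avg_access_time(double f_local)
    {
        return f_local * T_LOCAL + (1.0 - f_local) * T_REMOTE;
    }

    int main(void)
    {
        static const double fracs[] = { 1.00, 0.99, 0.95, 0.90 };
        int i;
        for (i = 0; i < 4; i++)
            printf("local fraction %.2f -> average %6.2f cycles\n",
                   fracs[i], avg_access_time(fracs[i]));
        return 0;
    }

With these numbers, letting just 1% of the references go remote
doubles the average access time, and 10% slows it down by an order of
magnitude; hence the importance of placing data close to the processor
that uses it.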
UMA, NUMA, and COMA multiprocessors are further classified as being
either symmetric or asymmetric. A symmetric multiprocessor gives all
processors "equal access" to the devices (e.g. disks, I/O) in the
system; an asymmetric multiprocessor does not. In a symmetric system,
executive programs (e.g. the OS kernel) may be invoked on any
processor.

Date of last update: September 29, 1994.