Performance of Snoopy Caches for a synthetic N-body application
Problem Description:
This laboratory exercise will let you experiment with a
simulator for
a variety of Snoopy Cache Coherence protocols by running a synthetic
application based on Laner Jones Simulation. The exercise will allow
you change the load profile (level of data sharing) of the application
interactively for a host of Snoopy Cache protocols. A telnet session for demonstration of
available tools is available by logging in as "guest" (Contact
Prof. Bestavros for the password.)
The Snoopy Cache Coherence Protocols differ in the number of states
they allow each cache block (line) to assume and the
transition
relation between these states as a result of bus/cpu read/write
transactions. The protocols considered in this lab
are:
- The Illinois Write Invalidate Protocol:
- This protocol is one of the first write invalidate
protocols. This scheme introduces the exclusive unmodified
state of data and entirely avoids invalidations on write hits to
unmodified non-shared blocks. To implement this algorithm, it is
necessary to associate two status bits with each block in cache. The
first bit indicates either Share or Exclusive ownership of a block,
while the second is set if the block has been locally modified. As a
write policy, write back policy is used. At any point in time the
cache block may be in one of four states: Invalid,
Exclusive Unmodified, Shared Unmodified, and
Exclusive Modified.
- The Synapse Protocol :
- This approach was used in the Synapse N + 1, a multiprocessor for
fault-tolerant transaction processing. The N + 1 differs from other
shared bus designs in that it has two system buses. The added
bandwidth of the extra bus allows the system to be expanded to a
maximum of 28 processors. In this protocol, cache blocks are in one of
four states: Invalid, Valid (data valid and may
be shared with another caches), and Dirty (data is valid,
shared with no other cache, and not consistent with main memory).
- The Goodman Write Once Protocol:
- This protocol was introduced in the early 1980s and was designed
for the Multibus. The requirement that the scheme work with existing bus
protocol was a severe restriction but one that resulted in
implementation simplicity. In this
protocol, blocks in the local cache can be in one of four states:
Invalid, Valid (data valid but may be shared with
another cache), Reserved (data valid, shared with no
other cache, and consistent with main memory), and Dirty
(data is valid, shared with no other cache, and not consistent with
main memory -- write-back needed). A block is Reserved if it
has been modified exactly once. It becomes Dirty if it is
modified more than once.
- The Berkeley Protocol:
- This protocol is implemented for a RISC multiprocessor designed in
the University of California at Berkeley. The approach is similar to
Synapse with two major improvements: cache to cache transfer
implemented on the shared bus and dirty data is not updated in the
main memory when its becomes shared. The following states are used:
Invalid, Valid, Shared Dirty, and
Exclusive Dirty. Like the Synapse protocol, the Berkeley approach
uses the idea of ownership -- the cache that has the block in a
Dirty state is the owner of that block. If a block is not owned
by any cache, memory is the owner; in any case, the owner supplies the
block on a miss.
- Dragon and Firefly :
- The Firefly cache coherence protocol was used in the Firefly, a
multiprocessor workstation developed by DEC. The Dragon protocol is a
small variation of the Firefly,
which was researched by Xerox Palo Alto Center couple of month
later. The main difference between these two protocols is that Dragon
protocol does not update memory on a cache to cache transfer and
delays the memory and cache consistency until the data is evicted and
written back, which saves time and lowers the memory access
requirements. (Unfortunately, from the point of the simulator used in
this laboratory, memory access delay is not different from cache access
latency, so no difference in performance can be seem for the Dragon
approach.) These two schemes represent a family of write update (or write
broadcast) protocols. This snoopy protocols has ability to detect
dynamicly the sharing status of a block and use a write through policy
for shared blocks and write back for currently non-shared
blocks. These approaches are most powerful ones in their class. The
hit rate for such protocols was found close to 100% for many
applications. The Dragon protocol employs the following five states for the
cache blocks: Invalid , Valid Exclusive,
Shared Clean, Shared Dirty, Exclusive Dirty.
When a line is evicted from the cache on a cache miss, a block tagged
with a {dirty} bit has to be written back to the main memory in order
to keep memory consistent.
The Exercise:
- Which one of the above cache consistency protocols would you
suggest for an application with small (large) data read sharing?
Use performance obtained by running the synthetic application to
sustain your answer.
Illustrations:
This document has been prepared by Professor Azer Bestavros
<best@cs.bu.edu> for CS-551, which
is part of the NSF-funded
undergraduate curriculum on parallel computing at BU.
Date of last update: May 22, 1994.