CS-551: Lab Assignment -- Performance of Snoopy Caches

Performance of Snoopy Caches for a synthetic N-body application

Problem Description:

This laboratory exercise will let you experiment with a simulator for a variety of Snoopy Cache Coherence protocols by running a synthetic application based on Laner Jones Simulation. The exercise will allow you change the load profile (level of data sharing) of the application interactively for a host of Snoopy Cache protocols. A telnet session for demonstration of available tools is available by logging in as "guest" (Contact Prof. Bestavros for the password.)

The Snoopy Cache Coherence Protocols differ in the number of states they allow each cache block (line) to assume and the transition relation between these states as a result of bus/cpu read/write transactions. The protocols considered in this lab are:

The Illinois Write Invalidate Protocol:
- This protocol is one of the first write invalidate protocols. This scheme introduces the exclusive unmodified state of data and entirely avoids invalidations on write hits to unmodified non-shared blocks. To implement this algorithm, it is necessary to associate two status bits with each block in cache. The first bit indicates either Share or Exclusive ownership of a block, while the second is set if the block has been locally modified. As a write policy, write back policy is used. At any point in time the cache block may be in one of four states: Invalid, Exclusive Unmodified, Shared Unmodified, and Exclusive Modified.
The Synapse Protocol :
- This approach was used in the Synapse N + 1, a multiprocessor for fault-tolerant transaction processing. The N + 1 differs from other shared bus designs in that it has two system buses. The added bandwidth of the extra bus allows the system to be expanded to a maximum of 28 processors. In this protocol, cache blocks are in one of four states: Invalid, Valid (data valid and may be shared with another caches), and Dirty (data is valid, shared with no other cache, and not consistent with main memory).
The Goodman Write Once Protocol:
- This protocol was introduced in the early 1980s and was designed for the Multibus. The requirement that the scheme work with existing bus protocol was a severe restriction but one that resulted in implementation simplicity. In this protocol, blocks in the local cache can be in one of four states: Invalid, Valid (data valid but may be shared with another cache), Reserved (data valid, shared with no other cache, and consistent with main memory), and Dirty (data is valid, shared with no other cache, and not consistent with main memory -- write-back needed). A block is Reserved if it has been modified exactly once. It becomes Dirty if it is modified more than once.
The Berkeley Protocol:
- This protocol is implemented for a RISC multiprocessor designed in the University of California at Berkeley. The approach is similar to Synapse with two major improvements: cache to cache transfer implemented on the shared bus and dirty data is not updated in the main memory when its becomes shared. The following states are used: Invalid, Valid, Shared Dirty, and Exclusive Dirty. Like the Synapse protocol, the Berkeley approach uses the idea of ownership -- the cache that has the block in a Dirty state is the owner of that block. If a block is not owned by any cache, memory is the owner; in any case, the owner supplies the block on a miss.
Dragon and Firefly :
- The Firefly cache coherence protocol was used in the Firefly, a multiprocessor workstation developed by DEC. The Dragon protocol is a small variation of the Firefly, which was researched by Xerox Palo Alto Center couple of month later. The main difference between these two protocols is that Dragon protocol does not update memory on a cache to cache transfer and delays the memory and cache consistency until the data is evicted and written back, which saves time and lowers the memory access requirements. (Unfortunately, from the point of the simulator used in this laboratory, memory access delay is not different from cache access latency, so no difference in performance can be seem for the Dragon approach.) These two schemes represent a family of write update (or write broadcast) protocols. This snoopy protocols has ability to detect dynamicly the sharing status of a block and use a write through policy for shared blocks and write back for currently non-shared blocks. These approaches are most powerful ones in their class. The hit rate for such protocols was found close to 100% for many applications. The Dragon protocol employs the following five states for the cache blocks: Invalid , Valid Exclusive, Shared Clean, Shared Dirty, Exclusive Dirty. When a line is evicted from the cache on a cache miss, a block tagged with a {dirty} bit has to be written back to the main memory in order to keep memory consistent.

The Exercise:

Which one of the above cache consistency protocols would you suggest for an application with small (large) data read sharing? Use performance obtained by running the synthetic application to sustain your answer.

Illustrations:

This document has been prepared by Professor Azer Bestavros <best@cs.bu.edu> for CS-551, which is part of the NSF-funded undergraduate curriculum on parallel computing at BU.

Date of last update: May 22, 1994.