The cache coherence problem in shared-memory multiprocessors
Shared-memory multiprocessors offer increased computational power and the programmability of the shared-memory model. However, sharing memory between processors leads to contention, which delays memory accesses. Adding a cache memory for each processor reduces the average access time, but it creates the possibility of inconsistency among cached copies. The cache coherence problem is that of keeping all cached copies of the same memory location identical. This dissertation explores possible solutions to the cache coherence problem and identifies cache coherence protocols (solutions implemented entirely in hardware) as an attractive alternative.

Protocols for shared-bus systems are shown to be an interesting special case. Previously proposed shared-bus protocols are described using uniform terminology, and they are shown to divide into two categories: invalidation and distributed write. In invalidation protocols, all other cached copies must be invalidated before any copy can be changed; in distributed write protocols, all copies must be updated each time a shared block is modified. In each category, a new protocol is presented that, according to simulation results, performs better than previous schemes. The simulation model and parameters are described in detail.

Previous protocols for general interconnection networks are shown to contain flaws and to be costly to implement. A new class of protocols is presented that offers reduced implementation cost and expandability while retaining a high level of performance, as illustrated by simulation results using a crossbar switch. All new protocols have been proven correct; one of the proofs is included.

Previous definitions of cache coherence are shown to be inadequate, and a new definition is presented. Coherence is compared and contrasted with other levels of consistency, which are also identified. The consistency of shared-bus protocols is shown to be naturally stronger than that of non-bus protocols.

The first protocol of its kind is presented for a large hierarchical multiprocessor, using a bus-based protocol within each cluster and a general protocol in the network connecting the clusters to the shared main memory.
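
The distinction between the two shared-bus categories described above can be illustrated with a small sketch. It is a hypothetical toy model, not any of the protocols proposed in the dissertation: the class `ToyBus`, its `policy` argument, and the write-through memory are all assumptions made only to keep the example short. On a write, the "invalidate" policy removes every other cached copy, while the "update" policy (standing in for distributed write) pushes the new value to every other cached copy.

```python
class ToyBus:
    """Hypothetical toy model: several caches on one shared bus.

    Memory is written through on every store (an assumption made to keep
    the sketch simple; real protocols usually avoid this).
    """

    def __init__(self, num_caches, policy):
        if policy not in ("invalidate", "update"):
            raise ValueError("policy must be 'invalidate' or 'update'")
        self.policy = policy
        self.memory = {}                               # address -> value
        self.caches = [{} for _ in range(num_caches)]  # one dict per processor

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:
            # Miss: fetch the block from main memory (default value 0).
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cpu, addr, value):
        # Every other cache observes the write on the shared bus.
        for other, cache in enumerate(self.caches):
            if other == cpu or addr not in cache:
                continue
            if self.policy == "invalidate":
                del cache[addr]        # invalidation: drop the stale copy
            else:
                cache[addr] = value    # distributed write: update the copy
        self.caches[cpu][addr] = value
        self.memory[addr] = value

    def holders(self, addr):
        """Return the indices of the caches that currently hold addr."""
        return [i for i, cache in enumerate(self.caches) if addr in cache]
```

With three caches that have all read the same address, a write by processor 0 leaves only cache 0 holding the block under "invalidate", whereas under "update" all three caches still hold it with the new value. In both cases no cache can return a stale value afterwards, which is the property the coherence problem asks for.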