Batched Store Broadcast in MP Caches

IP.com Disclosure Number: IPCOM000122191D
Original Publication Date: 1991-Nov-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 4 page(s) / 190K

Publishing Venue

IBM

Related People

Liu, L: AUTHOR

Abstract

Disclosed is a technique for implementing data broadcasting in multiprocessor systems with shared memory. The key approach proposed is to employ data buffering so that broadcasting traffic may be reduced.

Batched Store Broadcast in MP Caches

      We are concerned with cached multiprocessor systems with shared
memory.  Typically, when a processor writes (modifies) data in its
local cache, the associated cache line(s) must be invalidated in
other (remote) caches when copies exist there.  Such invalidates
raise cache miss ratios, since a line invalidated at a cache will
cause a miss upon a later access.  The performance impact of these
extra cache misses becomes significant as cache size increases.  One
known technique for remedying this problem is to broadcast each
memory write (from a processor) to all remote caches.  Upon receiving
such a store broadcast, a remote cache updates its contents if the
line happens to be there.  In this way the miss-ratio impact due to
invalidates diminishes.  Such a store broadcast scheme has been
implemented in typical common-bus-based multiprocessor (MP) systems
with store-thru caches.  In such a system each processor store must
monopolize the common bus (e.g., for one cycle) anyway, and a cache
snooper is typically used to monitor whether a broadcast store hits
in each remote cache.  Problems arise, however, when the processors
become faster (relative to the common bus) or when the number of
processors increases.
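
The contrast drawn above between invalidate-based coherence and
store broadcast (update-based) coherence can be sketched in a toy
model.  All class and function names here are illustrative, not from
the disclosure:

```python
# Toy model of a store-thru, bus-based MP: one writer, one remote
# cache, under either an "invalidate" or a "broadcast" store policy.

class Cache:
    def __init__(self):
        self.lines = {}          # addr -> value; presence = valid copy

    def read(self, addr, memory):
        if addr in self.lines:   # hit
            return self.lines[addr], True
        value = memory[addr]     # miss: fetch from shared memory
        self.lines[addr] = value
        return value, False

def store(writer, others, memory, addr, value, policy):
    writer.lines[addr] = value
    memory[addr] = value                 # store-thru: memory updated too
    for cache in others:
        if addr in cache.lines:
            if policy == "invalidate":
                del cache.lines[addr]    # remote copy lost -> later miss
            else:                        # "broadcast": update in place
                cache.lines[addr] = value

memory = {0x40: 1}
c0, c1 = Cache(), Cache()
c1.read(0x40, memory)                    # c1 obtains a copy

store(c0, [c1], memory, 0x40, 2, "invalidate")
_, hit = c1.read(0x40, memory)
print(hit)                               # False: invalidate caused a miss

c1.read(0x40, memory)                    # c1 re-fetches the line
store(c0, [c1], memory, 0x40, 3, "broadcast")
_, hit = c1.read(0x40, memory)
print(hit)                               # True: broadcast kept the copy usable
```

Under the broadcast policy the remote reader keeps hitting, which is
exactly the miss-ratio benefit the text describes; the cost is the
per-store bus traffic that the batching idea below aims to reduce.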
The Invention

      The ideas of the invention will be illustrated with a
multiprocessor system as depicted in Fig. 1.  A number of processors
{Pi} share a common memory.  Each processor Pi has its private
first-level (L1) cache.  There is a centralized storage controller
(SC) that resolves storage requests and cache coherence among
processors.  Within the SC there is a store buffer (SBi) for each Pi.
Although not illustrated in Fig. 1, it is also possible for the SC to
maintain a shared second-level cache.  We will consider store-thru L1
caches, similar to the IBM/3033 system design.  A store from the
processor does not necessarily require the line to reside in its L1
cache.  The SC maintains copy directories for all L1 caches (as in
the IBM/3081 design).  Each L1 cache line entry has one of the
following states:  VAL (VALid), INV (INValid) and TI (Temporarily
Inaccessible).  When a cache line is in the TI state, it is not
accessible for the moment and may become VAL later on (e.g., when
updated with remote stores).
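
A minimal sketch of these three line states follows.  The specific
transition triggers are assumptions based on the surrounding text: a
line is taken to enter TI while a remote store to it is still pending
in a store buffer, and to return to VAL once the batched update is
applied.

```python
# Illustrative L1 line-state machine for VAL / INV / TI.
VAL, INV, TI = "VAL", "INV", "TI"

# (current state, event) -> next state; any unlisted pair is a no-op.
TRANSITIONS = {
    (VAL, "remote_store_buffered"): TI,   # update pending elsewhere
    (TI,  "batched_update_applied"): VAL, # data refreshed, usable again
    (VAL, "remote_invalidate"):      INV,
    (INV, "line_fetched"):           VAL,
}

def next_state(state, event):
    return TRANSITIONS.get((state, event), state)

s = VAL
s = next_state(s, "remote_store_buffered")
print(s)   # TI: line temporarily inaccessible
s = next_state(s, "batched_update_applied")
print(s)   # VAL: line accessible again without a refetch
```

The point of TI, as the text indicates, is that the line need not be
discarded outright: it becomes usable again once the remote stores
reach it.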

      Each store buffer SBi may be viewed as a small cache (e.g., 8
entries of 64-byte blocks) for data stores.  For simplicity of
illustration, we assume that the L1 cache line size is a multiple
(e.g., 2) of the block size in the store buffers.  The directory of
each SBi contains the addresses and necessary status tags for the
block entries.  Each block of SBi is associated with a bit-vector
(CH) indicating th...
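
The extract breaks off here, but the store-buffer organization
described so far can be sketched as follows.  The per-byte
granularity of the CH bit-vector and the evict-oldest flush policy
are assumptions made for illustration only:

```python
# Illustrative store buffer SBi: a small cache of 8 entries of
# 64-byte blocks, each block paired with a CH bit-vector recording
# which parts of the block have actually been stored into.

BLOCK_SIZE = 64
NUM_ENTRIES = 8

class StoreBuffer:
    def __init__(self):
        self.entries = {}   # block_addr -> (data bytearray, CH bit list)

    def put(self, addr, value):
        block = addr - (addr % BLOCK_SIZE)
        offset = addr % BLOCK_SIZE
        if block not in self.entries:
            if len(self.entries) >= NUM_ENTRIES:
                self.flush_one()         # buffer full: flush oldest first
            self.entries[block] = (bytearray(BLOCK_SIZE), [0] * BLOCK_SIZE)
        data, ch = self.entries[block]
        data[offset] = value
        ch[offset] = 1      # CH marks the bytes this processor stored

    def flush_one(self):
        # The SC would broadcast only the CH-marked bytes, so several
        # stores to one block cost a single (batched) broadcast.
        block, (data, ch) = next(iter(self.entries.items()))
        del self.entries[block]
        return block, data, ch

sb = StoreBuffer()
sb.put(0x1004, 0xAB)
sb.put(0x1005, 0xCD)
block, data, ch = sb.flush_one()
print(hex(block), sum(ch))   # 0x1000 2: one block, two changed bytes
```

Two stores to the same block are merged into one buffered entry, so a
single flush can carry both updates; this is the batching by which the
scheme reduces broadcast traffic.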