
Store Buffering at Second Level Cache/Memory Hierarchy

IP.com Disclosure Number: IPCOM000120390D
Original Publication Date: 1991-Apr-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 3 page(s) / 113K

Publishing Venue

IBM

Related People

Liu, L: AUTHOR

Abstract

A technique is described whereby a mechanism buffers stores at the second level of the cache/memory hierarchy, as used in multi-processing (MP) systems.


Store Buffering at Second Level Cache/Memory Hierarchy


      Generally, buffering stores can significantly improve the
performance of a second level memory/cache hierarchy in store-thru
cache designs.  This is especially evident when the number
of central processors (CPs) increases.  To illustrate, assume an
environment in which two-level caches (L1/L2) are used for an MP with
L2 shared by all CPs.  Also, assume that partial store merges are
performed at the first level and that a certain kind of EX locking is
available for L1 lines.  Batching stores at the L1 level may require
higher store bandwidths when modified data are batched to L2.

      For instance, consider a cache line of 64 bytes, consisting
of eight doublewords.  When a line with six modified
doublewords is sent to L2 through a quadword bus, it would require at
least three cycles. Such a transfer may be done in two cycles if the
store bus bandwidth is increased to two quadwords per cycle.  Such an
increase of store bus bandwidth could be inconvenient for certain
implementations.  Also, from experiments, it was observed that with
few (e.g., eight) entries in the store buffer, a high percentage of
the lines covered only single modified doublewords.  As a result, a
wider store bandwidth to L2 may not be utilized well.  Therefore, in
certain environments it may be advantageous to buffer stores at the
second level (L2).  Putaway of blocks buffered at L2 is often easier
to implement efficiently, since bussing may be avoided.  In this way,
the need for a wider store bandwidth (L1 --> L2) for store buffering
may be avoided.
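The cycle counts above follow from simple bandwidth arithmetic; a minimal sketch (the function name and the packing assumption are illustrative, not from the original):

```python
import math

def transfer_cycles(modified_doublewords: int, bus_width_bytes: int) -> int:
    """Cycles to move a line's modified doublewords (8 bytes each) to L2.

    Assumes modified doublewords can be packed onto the bus with no
    alignment gaps; a real bus may need extra cycles.
    """
    return math.ceil(modified_doublewords * 8 / bus_width_bytes)

# Six modified doublewords over a quadword (16-byte) bus: 3 cycles.
print(transfer_cycles(6, 16))   # -> 3
# Doubling the bus to two quadwords per cycle: 2 cycles.
print(transfer_cycles(6, 32))   # -> 2
# A line with a single modified doubleword cannot use the wider bus.
print(transfer_cycles(1, 32))   # -> 1
```

The last case illustrates the observation in the text: when most buffered lines carry only one modified doubleword, widening the L1-to-L2 store bus buys nothing.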

      The concept described herein provides, for each CP Pi, a
store buffer Si located at the storage control element (SCE), as
shown in the figure.  There are a fixed number of blocks in
each Si, which is organized as a stack or as a set-associative array.
Each block in a store buffer is of a fixed size, normally no
bigger than the L1 line size but bigger than the store bus
bandwidth.  The design is such that a block in Si is always covered
by a cache line locked EX for Pi.  The directory of Si can record
which sub-units (e.g., doublewords) within each block contain
modified data (which are to be updated to L2 later).  The operations
are as follows:
   (1) When a store to L2 from Pi hits a block B in Si - the store
is put into block B of Si.
   (2) When a store to L2 from Pi misses its block B in Si - a new
block entry for B is created in Si.  The replaced block B' is
updated to L2.  T...
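Cases (1) and (2) above can be sketched as a small model of Si and its directory. Everything here is an illustrative assumption: the class and method names, the bitmask directory encoding, and the LRU-stack organization (the text also permits a set-associative array):

```python
from collections import OrderedDict

DOUBLEWORDS_PER_BLOCK = 8  # assumed block size: one 64-byte L1 line

class StoreBuffer:
    """Per-CP store buffer Si held at the SCE (a sketch, not the design).

    Each entry maps a block address to a bitmask of modified
    doublewords, standing in for the Si directory described above.
    """
    def __init__(self, num_blocks: int = 8):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()  # block_addr -> modified-DW bitmask

    def store(self, block_addr: int, dw_index: int):
        """Record a store from Pi; return the block evicted to L2, if any."""
        evicted = None
        if block_addr in self.blocks:
            # Case (1): the store hits block B in Si; merge into it.
            self.blocks.move_to_end(block_addr)
        else:
            # Case (2): miss; make room by updating the oldest B' to L2.
            if len(self.blocks) >= self.num_blocks:
                evicted = self.blocks.popitem(last=False)
            self.blocks[block_addr] = 0
        self.blocks[block_addr] |= 1 << dw_index
        return evicted
```

A returned `(address, bitmask)` pair models the replaced block B' being updated to L2, with the bitmask telling the putaway logic which doublewords actually carry modified data.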