
Using a WTWAX/WI L1 Cache Protocol to Reduce Store Traffic

IP.com Disclosure Number: IPCOM000106433D
Original Publication Date: 1993-Nov-01
Included in the Prior Art Database: 2005-Mar-21
Document File: 4 page(s) / 136K

Publishing Venue

IBM

Related People

Bennett, B: AUTHOR [+4]

Abstract

One of the limitations in an MP configuration is the pin limitation of the L2, and a need exists to reduce the bandwidth requirements of the L2 when the L1 caches associated with it are managed with a WTWAX L1 cache protocol. Disclosed is a way to reduce the store traffic by defining a hybrid WTWAX/WI L1 CACHE and identifying which lines in the cache should be managed WI in the L1.


Using a WTWAX/WI L1 Cache Protocol to Reduce Store Traffic

      One of the limitations in an MP configuration is the pin
limitation of the L2, and a need exists to reduce the bandwidth
requirements of the L2 when the L1 caches associated with it are
managed with a WTWAX L1 cache protocol.  Disclosed is a way to reduce
the store traffic by defining a hybrid WTWAX/WI L1 CACHE and
identifying which lines in the cache should be managed WI in the L1.

      In examining the data rates across the interfaces of a memory
hierarchy comprised of an L1/L2/L3, one can observe that at each
level the aggregate data rate diminishes.  The point is that
supporting the high data rate between the L1 and L2 may not be
essential.  The data rate between levels in the hierarchy can be
decomposed into an event rate and the data size that is associated
with an event.  For example, store activity can be on a DW basis for
a WT (Write Through) level or on a cache-line basis for a WI (Write
In) level.  As we shall see, in some cases both the event rate and
the total traffic decrease.  The inference that can be drawn from
this is that the same datum must cross a high-traffic interface
multiple times between successive crossings of a lower-traffic
interface.  Management of the interface can help reduce the bandwidth
requirement associated with the successively lower levels of the
hierarchy, the levels with higher numbers.
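      As a rough illustration of this decomposition (not part of the
original disclosure), the following sketch computes interface traffic
as the product of an event rate and the data size per event; the
event rates shown and the 8-byte DW and 128-byte line sizes are
assumed for illustration only.

    #include <stdio.h>

    /* Interface traffic decomposed as:
       bytes/cycle = (events/cycle) * (bytes/event).                  */
    static double traffic_bytes_per_cycle(double events_per_cycle,
                                           double bytes_per_event)
    {
        return events_per_cycle * bytes_per_event;
    }

    int main(void)
    {
        const double DW_BYTES   = 8.0;    /* doubleword, assumed 8 bytes   */
        const double LINE_BYTES = 128.0;  /* cache line, assumed 128 bytes */

        /* Illustrative event rates only; the disclosure quantifies
           the actual rates later in the text.                        */
        double wt_store_rate   = 0.50;  /* DW store-throughs per cycle */
        double wi_castout_rate = 0.05;  /* line cast-outs per cycle    */

        printf("WT level: %.1f bytes/cycle\n",
               traffic_bytes_per_cycle(wt_store_rate, DW_BYTES));
        printf("WI level: %.1f bytes/cycle\n",
               traffic_bytes_per_cycle(wi_castout_rate, LINE_BYTES));
        return 0;
    }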

      Two aspects of the memory hierarchy give additional support to
this approach, and both concern the fact that the L1 Data Cache is
managed with a WTWAX protocol.

A WTWAX cache management protocol is defined as:

o   all stores are written through the L1 cache to the L2 (WT),

o   all lines that are stored into by the processors must be
    allocated (WA - WRITE ALLOCATE), and

o   all lines written into must be held exclusively (X).
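      The following is a minimal sketch of the store path implied by
these three rules; it is not taken from the disclosure, and the
directory structure and the L2 helper routines (l2_fetch_exclusive,
l2_write_dw) are hypothetical stand-ins for the real L1/L2 interface.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical L1 directory entry: tag plus ownership state.     */
    typedef enum { INVALID, SHARED, EXCLUSIVE } line_state_t;
    typedef struct { uint64_t tag; line_state_t state; } l1_line_t;

    /* Stand-ins for the L2 interface; real hardware would drive
       L2/bus requests here.                                          */
    static void l2_fetch_exclusive(uint64_t line_addr, l1_line_t *line)
    {
        printf("L2: fetch line 0x%llx with exclusive status\n",
               (unsigned long long)line_addr);
        line->tag   = line_addr;
        line->state = EXCLUSIVE;
    }
    static void l2_write_dw(uint64_t dw_addr, uint64_t data)
    {
        printf("L2: write-through DW 0x%llx = 0x%llx\n",
               (unsigned long long)dw_addr, (unsigned long long)data);
    }

    /* WTWAX store: write-through (WT), write-allocate (WA),
       line held exclusive (X).                                       */
    static void wtwax_store(l1_line_t *line, uint64_t line_addr,
                            uint64_t dw_addr, uint64_t data)
    {
        /* WA + X: the stored-into line must be allocated in the L1
           and held exclusively before the store completes.           */
        if (line->tag != line_addr || line->state != EXCLUSIVE)
            l2_fetch_exclusive(line_addr, line);

        /* (L1 data-array update omitted.)                            */

        /* WT: every store is also forwarded to the L2 on a DW basis. */
        l2_write_dw(dw_addr, data);
    }

    int main(void)
    {
        l1_line_t line = { 0, INVALID };
        wtwax_store(&line, 0x1000, 0x1008, 0xdeadbeef); /* miss: allocate exclusive, then write through */
        wtwax_store(&line, 0x1000, 0x1010, 0x12345678); /* hit exclusive: write through only            */
        return 0;
    }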

      In such caches the DW store rate is .33 STORES/INSTRUCTION, and
the aggregate store rate for 16 processors, attached to a single L2,
can easily exceed 2 DW-STORES/CYCLE.  In contrast, when the L1 is WI
(Write-In), stores from a given processor are buffered in the L1 and
the L2 level does its storing on a cast-out basis.  A CAST-OUT from a
WI cache is said to occur when a line that has been modified is
chosen for replacement.  In a uniprocessor the aggregate store rate
diminishes to one cast-out every 4 cache misses, the miss rate being
determined by the cache size.  Thus, with an L1-MISS every 25
instructions, and a cast-out every 100 instructions, the DW STORE
rate for 16 processors attached to a single L2 could be 1 DW-STORE
per cycle.
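      The figures above can be reproduced with the following
arithmetic; the per-processor instruction rate (0.4 instructions per
cycle) and the 16-DW line size are assumptions chosen to be
consistent with the quoted numbers, not values stated in the
disclosure.

    #include <stdio.h>

    int main(void)
    {
        const int    cpus    = 16;
        const double ipc     = 0.4;   /* assumed instructions/cycle per processor */
        const double line_dw = 16.0;  /* assumed line size in DW (128 bytes)      */

        /* WTWAX: every store crosses the L1/L2 interface as a DW.    */
        const double wt_stores_per_instr = 0.33;
        double wt_dw_per_cycle = wt_stores_per_instr * ipc * cpus;

        /* WI: one cast-out (a full line) every 4 L1 misses, with one
           miss per 25 instructions, i.e. a cast-out per 100
           instructions.                                              */
        const double miss_per_instr    = 1.0 / 25.0;
        const double castout_per_instr = miss_per_instr / 4.0;
        double wi_dw_per_cycle = castout_per_instr * line_dw * ipc * cpus;

        printf("WTWAX aggregate store traffic: %.2f DW/cycle\n", wt_dw_per_cycle);
        printf("WI aggregate cast-out traffic: %.2f DW/cycle\n", wi_dw_per_cycle);
        return 0;
    }

Under these assumptions the WTWAX traffic comes to roughly 2.1
DW-STORES/CYCLE and the WI cast-out traffic to roughly 1.0 DW-STORE
per cycle, consistent with the figures quoted above.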

      WT Versus WTWAX - A Trade-off - When a processor misses to a line
that is held with exclusive status by another processor, the timing
of the miss is extended by the following factors:

o   All outstanding STORES from the holding processor must be
    complete...