
Handling the L2-Pipeline Least-Recently-Used

IP.com Disclosure Number: IPCOM000106844D
Original Publication Date: 1993-Dec-01
Included in the Prior Art Database: 2005-Mar-21
Document File: 4 page(s) / 201K

Publishing Venue

IBM

Related People

Ignatowski, M: AUTHOR [+3]

Abstract

When accessing an L2, the contention point is really the "L2 pipeline" that all accesses pass through, of which the L2 directory is just the first stage. Other stages in this pipeline include accessing the storage keys and LRU bits, making various address compares (and being compared against), and so on. A novel way of managing LRU within an L2 that allows concurrent accesses is disclosed.

This is the abbreviated version, containing approximately 28% of the total text.

Handling the L2-Pipeline Least-Recently-Used

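As the abstract notes, every access flows through a multi-stage L2
pipeline of which the directory lookup is only the first stage.  The
following C sketch illustrates one pass through such a pipeline; the
stage names, types, and placeholder logic are illustrative
assumptions, not details taken from the disclosure.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t addr;    /* request address                    */
    int      way;     /* hit way within the set, -1 on miss */
    bool     key_ok;  /* storage-key check result           */
} l2_req;

/* Stage 1: search the L2 directory (placeholder lookup). */
static void dir_lookup(l2_req *r)   { r->way = (r->addr & 1) ? 0 : -1; }
/* Stage 2: access the storage keys for protection checking. */
static void key_check(l2_req *r)    { r->key_ok = true; }
/* Stage 3: read and update the LRU bits for the set. */
static void lru_update(l2_req *r)   { (void)r; }
/* Stage 4: make address compares against requests already in
   flight (and stand to be compared against by later ones). */
static void addr_compare(l2_req *r) { (void)r; }

/* One pass of a request through the L2 pipeline. */
static void l2_pipeline(l2_req *r)
{
    dir_lookup(r);
    key_check(r);
    lru_update(r);
    addr_compare(r);
}

int main(void)
{
    l2_req r = { .addr = 0x1001 };
    l2_pipeline(&r);
    printf("hit way %d, key ok %d\n", r.way, r.key_ok);
    return 0;
}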

      In examining the data rates across the interfaces of a memory
hierarchy comprised of an L1/L2/L3, one can observe that the
aggregate data rate diminishes at each level.  The implication is
that supporting the high L1/L2 data rate may not be essential at the
lower levels.  The data rate between levels in the hierarchy can be
decomposed into event rates and the data size associated with an
event.  For example, store activity can be on a DW basis for a WT
(Write Through) level or on a cache-line basis for a WI (Write In)
level.  As we shall see, in some cases the event rate and the total
traffic both decrease.  The inference that can be drawn from this is
that the same datum must cross a high-traffic interface multiple
times between successive crossings of a lower-traffic interface.
The management of the interface can help reduce the bandwidth
requirement associated with the successively lower levels of the
hierarchy, the levels with higher numbers.
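
The decomposition described above can be made concrete with a small
illustration: traffic across an interface is the event rate times
the size per event.  The byte counts below (8-byte DW, 128-byte
line) and the event rates are assumed figures for illustration, not
values from the text.

#include <stdio.h>

/* Traffic across an interface = event rate x size per event. */
static double traffic(double events_per_cycle, double bytes_per_event)
{
    return events_per_cycle * bytes_per_event;
}

int main(void)
{
    /* WT level: frequent, DW-sized store events. */
    printf("WT: %.0f bytes/cycle\n", traffic(2.0, 8.0));
    /* WI level: rare, line-sized cast-out events. */
    printf("WI: %.0f bytes/cycle\n", traffic(0.0625, 128.0));
    return 0;
}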

      Two aspects of the memory hierarchy give additional support to
this approach, and both concern the fact that the L1 Data Cache is
managed with a WTWAX protocol.

A WTWAX cache management protocol is defined as:

o   all stores are written through the L1 cache to the L2 (WT),

o   all lines that are stored into by the processors must be
    allocated (WA - WRITE ALLOCATE), and

o   all lines written into must be held exclusively (X).
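
A minimal C sketch of how the three rules combine on the store path
follows.  The toy direct-mapped L1 and the l2_write_dw() stub are
hypothetical scaffolding; only the WT/WA/X behavior is taken from
the definition above.

#include <stdint.h>
#include <stdio.h>

enum state { INVALID, SHARED, EXCLUSIVE };

struct l1_line { uint32_t tag; enum state st; };

#define L1_SETS 64
static struct l1_line l1[L1_SETS];   /* toy direct-mapped L1 */

/* Stand-in for the write-through path to the L2. */
static void l2_write_dw(uint32_t addr, uint64_t dw)
{
    printf("L2 <- store of %llx at %x\n",
           (unsigned long long)dw, (unsigned)addr);
}

static void wtwax_store(uint32_t addr, uint64_t dw)
{
    struct l1_line *ln = &l1[(addr >> 3) % L1_SETS];
    uint32_t tag = addr >> 9;

    if (ln->st == INVALID || ln->tag != tag) {  /* WA: a store to a   */
        ln->tag = tag;                          /* missing line first */
        ln->st  = SHARED;                       /* allocates it       */
    }
    if (ln->st != EXCLUSIVE)                    /* X: stored-into lines */
        ln->st = EXCLUSIVE;                     /* are held exclusive   */

    l2_write_dw(addr, dw);                      /* WT: every DW store   */
}                                               /* passes through to L2 */

int main(void)
{
    wtwax_store(0x100, 0xdeadbeef);
    return 0;
}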

      In such caches the DW store rate is .33 STORES/INSTRUCTION,
and the aggregate store rate for 16 processors attached to a single
L2 can easily exceed 2 DW-STORES/CYCLE.  In contrast, were stores
from a given processor buffered, as is the case in a memory
hierarchy in which the L1 is WI (Write-In), storing from the L1 to
the L2 would occur on a Cast-Out basis.  A CAST-OUT from a WI cache
is said to occur when a line that has been modified is chosen for
replacement.  In a uniprocessor the aggregate store rate is
diminished to a cast-out every 4 cache misses, the miss rate being
determined by the cache size.  Thus, with an L1-MISS every 25
instructions and a cast-out every 100 instructions, the DW STORE
rate for 16 processors attached to a single L2 could be 1 DW-STORE
per cycle.  It is reasonable to assert that STORE activity is
independent of the protocol used to manage the memory hierarchy.
Thus a WI cache achieves a reduction in STORE traffic to the L2 by
consolidating the multiple STORES to an in-cache line to a sin...