Interleaved Latin Square Mapping for Use in Memory
Original Publication Date: 1996-Jan-01
Included in the Prior Art Database: 2005-Mar-31
Chappell, TI: AUTHOR [+2]
Disclosed is a technique for minimizing the otherwise large shift distance encountered in a typical latin square mapping, with preferred embodiment in a memory hierarchy, especially a cache.
Latin SquareMapping for Use in Memory
a technique for minimizing the otherwise large
shift distance encountered in a typical latin square mapping, with
preferred embodiment in a memory hierarchy, especially a cache.
In a memory
hierarchy where a large data path is desirable, it
is often necessary to use a latin square mapping to make use of the
full bandwidth potential of the chip organization (1). This is
especially so for a cache or related system which uses a
set-associative, late-select organization. In such a case, a normal
CPU access to the cache requires that S logical words be accessed
from the array, one from each of the S sets, where S is typically 4
(4 way set associative). Simultaneously with this access, the
address translation takes place in parallel. When completed, the
translation selects one of the S logical words already available out
of the array (late-select) and gates this to the CPU for a read, or
gates in a new word for a write.
Thus, one logical word is accessed from each of the S
sets for a
reload, which results from a cache access miss, if the
block to be cast-out (to make room for incoming block) has been
changed, then it first has to be stored back to the next level of
memory (e.g., from cache to main memory) for a Store-In cache, which
is typical. In addition, the new Block (line) must replace the one
which is cast-out. Both the store-back and reload operations all
occur on ONE block boundary. In other words, only one of the S sets
is involved in the reload process.
This is quite
different from the normal access above which
operates on all S sets. Thus, two very different kinds of addressing
and accessing must be done to the array. If it is desirable to do
this with very large bandwidth, and at high speed, then a chip with
simple accessing is often not adequate. One method to circumvent
these difficulties makes use of the so-called Latin Square mapping.
The problems associated with a set-associative, late-select cache
design are detailed in (1) which describes the latin square mapping,
its use, and the additional functions required to make it work. A
brief summary of some essential concepts will first be given.
general non-interleaved latin square map is shown in
Fig. 1(a) for a 4-way set associative mapping. The 4 sets, A, B, C,
and D are arranged along word lines, or rows, as shown. The
subscripts on the sets can represent many different quantities such
as blocks, Quad Words (QW = 16 bytes), Double Words (DW = 8 bytes),
Words (4 bytes), or Logical Words (LW = smallest unit typically
accessed by the CPU). For a late-select, set-associative cache, the
subscript and map organization is preferably equal to or greater than
a logical word, LW. The essential ideas are as follows. In Fig. 1,
it is assumed that the first block of set A consists of the entries
A0, A1, A2, and A3. ...