Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Cache Invalidate Filtering with Residence Approximation

IP.com Disclosure Number: IPCOM000105304D
Original Publication Date: 1993-Jul-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 4 page(s) / 155K

Publishing Venue

IBM

Related People

Liu, L: AUTHOR

Abstract

In Multiprocessor (MP) cache design, one major operation for cache coherence control is the invalidation of cache lines. Consider the IBM* 308l system design. When a CPU stores into a cache line it must acquire EX status from the storage control element (SCE) first. Upon granting EX status to the CPU it sends out invalidate signals to those remote caches that may contain copies of the line. In 3081 systems the SCE maintains directory copies of the CPU caches and detects the status of a line in all caches upon a cross-invalidate operation. In an MP system with snoopy cache design a common bus is used to connect all CPUs. When a CPU stores into or acquires EX status on a line, the invalidate signal is broadcasted to all CPUs on the common bus, and each CPU cache control will perform invalidation when necessary.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 37% of the total text.

Cache Invalidate Filtering with Residence Approximation

      In Multiprocessor (MP) cache design, one major operation for
cache coherence control is the invalidation of cache lines.  Consider
the IBM* 308l system design.  When a CPU stores into a cache line it
must acquire EX status from the storage control element (SCE) first.
Upon granting EX status to the CPU it sends out invalidate signals to
those remote caches that may contain copies of the line.  In 3081
systems the SCE maintains directory copies of the CPU caches and
detects the status of a line in all caches upon a cross-invalidate
operation.  In an MP system with snoopy cache design a common bus is
used to connect all CPUs.  When a CPU stores into or acquires EX
status on a line, the invalidate signal is broadcasted to all CPUs on
the common bus, and each CPU cache control will perform invalidation
when necessary.  Although such broadcast scheme avoids the complexity
at SCE, more burden is pushed to individual CPU's to check for
invalidation conditions.  As a result, snoopy cache designs often
implements a separate cache directory copy at each CPU for the
purpose of monitoring bus signals.  In any case, traditionally cache
invalidation is carried out by matching addresses in directories.  In
this design we propose an alternative approach to cache invalidate
monitoring by approximating cache line residence at caches with
simpler bit-vectors.

      Caches are normally implemented as a 2-dimensional table.  The
columns are called congruence classes.  The number of rows is called
the set-associativity.  For a given access address, certain middle
address bits are used to index to a congruence class, and a directory
search will match the address within the congruence class for a hit
or miss.  Cache entries within each individual congruence class are
managed with a Least-Recently-Used (LRU) type replacement algorithm.
For the illustra tion of the design we assume a cache with 256
congruence classes, 4-way set-associativity, and 128 bytes per line.
For a given absolute address (bits 0-31) for a cache access, bits
17-24 are used to select the associated congruence class.  Bits 25-31
is the offset within the line.  Each cache directory entry records
address bit 0-16.  We will consider a residence approximation
bit-vector R with 2 sup k x 256 bits, where k is a positive integer.
For a given absolute address A, we will use A(x,y) to indicate the
integer represented by the address bits from x to y position in A,
where 0 <= x <= y31.  Semantically, the ith bit of R, where <= i <= (2
sup k x 256) (denoted R [i]) is 1 only when there is a line resident
in cache with address A such that i equals A(17 - k, 24).  That is,
the bits of R are turned on according to the lower order k + 8
address bits in the resident cache lines.  Note that a bit in R may
sometimes be 1 covering more than 1 resident cache lines.  A cache
line with absolute address A will not possibly be resid...