Multiple Bypass Mechanism
Original Publication Date: 1991-May-01
Included in the Prior Art Database: 2005-Apr-02
Publishing Venue: IBM
Abstract
Disclosed is a technique for extending conventional bypassing mechanisms for cache miss processing so that multiple units can be bypassed. The benefits are improved memory access by the CPU and reduced cache traffic.
Bypassing is a classical technique for processor cache miss handling. When a cache miss occurs for a fetch request by an I/E-unit of the processor, the requesting unit waits for the data unit to return from lower levels of the memory hierarchy (e.g., a second-level cache or main storage). The size of a cache line (e.g., 64-128 bytes) is typically a multiple of the granule (e.g., a doubleword) for I/E-unit access. When the missed line is transferred back to the cache unit, it is typical that the data unit causing the miss is transferred first, with the remaining units following in sequential order, wrapping to the beginning of the line past the last unit. When the first data unit arrives, it is also bypassed to the requesting I/E-unit so that it can resume its operations without having to reinitiate the fetch access to the cache.
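As a concrete illustration of this wrapped transfer order, consider the following C sketch. The 128-byte line, doubleword unit, and one-unit-per-cycle transfer rate are assumptions chosen for the example, not details fixed by the disclosure.

    #include <stdio.h>

    /* Illustrative assumptions (not from the disclosure): a 128-byte
     * cache line transferred in 8-byte (doubleword) units, i.e., 16
     * units per line, one unit per bus cycle. */
    #define UNITS_PER_LINE 16

    /* Unit transferred in the i-th cycle when the miss was caused by
     * an access to unit miss_unit: the missed unit goes first and the
     * rest follow sequentially, wrapping past the last unit to the
     * beginning of the line. */
    static int transfer_order(int miss_unit, int i)
    {
        return (miss_unit + i) % UNITS_PER_LINE;
    }

    int main(void)
    {
        int i;
        /* A miss on unit 13 yields the order 13 14 15 0 1 ... 12. */
        for (i = 0; i < UNITS_PER_LINE; i++)
            printf("%d ", transfer_order(13, i));
        printf("\n");
        return 0;
    }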
Although the conventional bypassing technique has been found useful, it does not provide a similar benefit for accesses to the currently missed line beyond the one causing the miss. Due to limitations on bus bandwidth, it takes multiple (e.g., 4-16) cycles to transfer the missed line back. Similarly, it may take multiple cache cycles to put the missed line away into the cache arrays. The trailing-edge penalties caused by accesses to the missed line after the bypass of the first unit can be significant if such accesses can only be satisfied when the miss processing is complete. Supporting cache access to a line whose miss is still in flight adds design complexity to the cache unit.
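A minimal sketch of this pessimistic trailing-edge penalty, using UNITS_PER_LINE from the previous sketch and assuming one doubleword arrives per cycle; the put-away latency is likewise an assumed parameter:

    /* Pessimistic stall (in cycles) seen by an I/E-unit access to the
     * missed line issued t cycles after the first unit is bypassed,
     * when such accesses can only be satisfied once the whole line
     * has been transferred and put away.  putaway is an assumed
     * number of extra cache cycles for the line put-away. */
    static int trailing_edge_stall(int t, int putaway)
    {
        int done = UNITS_PER_LINE + putaway; /* miss processing ends */
        return (t < done) ? done - t : 0;
    }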
The trailing-edge problem in question is primarily due to program spatial locality: data units physically close (or adjacent) to each other tend to be accessed closely in time, and sequentiality is a particularly strong characteristic. A possible approach to the problem is to extend the conventional bypass facilities to cover subsequent data accesses to a missed line.
The figure depicts a simple processor organization. The I/E-units (including the instruction and execution units) are enhanced with a Bypass Buffer (BP). For simplicity of illustration we assume that both the I/E-unit access to the cache and the memory fetch (for a cache miss) operate at doubleword bandwidth (i.e., 8 bytes per cycle). BP may then be structured as a FIFO stack of N (e.g., N=4) entries. Each entry of BP carries a doubleword, plus other necessary information. During cache miss processing, BP holds certain doublewords bypassed from the memory fetch. We also assume proper mechanisms for the I/E-unit to identify the doublewords (e.g., by address) buffered at BP; a sketch of one possible BP organization follows the design steps below. A simple design of multiple bypass is as follows:
1. When a cache miss occurs, the cache unit issu...
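The following C sketch shows one possible BP organization under the stated assumptions (doubleword granule, N=4). The entry layout, address tagging, and FIFO replacement policy are illustrative assumptions, not details fixed by the disclosure.

    #include <stdint.h>

    #define BP_ENTRIES 4  /* N = 4, as in the example above */

    /* One BP entry: a bypassed doubleword plus the information the
     * I/E-unit needs to identify it (here simply its address). */
    struct bp_entry {
        uint64_t addr;   /* doubleword-aligned address */
        uint64_t data;   /* the bypassed doubleword */
        int      valid;
    };

    /* BP as a FIFO stack: the oldest entry is replaced when full. */
    struct bypass_buffer {
        struct bp_entry e[BP_ENTRIES];
        int head;        /* next entry to replace */
    };

    /* Record a doubleword bypassed from the memory fetch. */
    void bp_push(struct bypass_buffer *bp, uint64_t addr, uint64_t data)
    {
        bp->e[bp->head].addr  = addr;
        bp->e[bp->head].data  = data;
        bp->e[bp->head].valid = 1;
        bp->head = (bp->head + 1) % BP_ENTRIES;
    }

    /* I/E-unit lookup: satisfy a fetch from BP when the doubleword is
     * buffered, so the fetch need not reaccess the cache.  Returns 1
     * on a hit and places the doubleword in *data. */
    int bp_lookup(const struct bypass_buffer *bp, uint64_t addr,
                  uint64_t *data)
    {
        int i;
        for (i = 0; i < BP_ENTRIES; i++) {
            if (bp->e[i].valid && bp->e[i].addr == addr) {
                *data = bp->e[i].data;
                return 1;
            }
        }
        return 0;
    }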