
Optimal Cache-to-Cache Transfer in Bus-Based Multiprocessor Systems

IP.com Disclosure Number: IPCOM000116269D
Original Publication Date: 1995-Aug-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 2 page(s) / 93K

Publishing Venue

IBM

Related People

Cheong, H: AUTHOR [+2]

Abstract

In a shared memory multiprocessor system, there are multiple processor modules, memory, and system Input/Output (I/O) connected to the bus. A processor module consists of execution units (collectively called a CPU), level-one (L1) cache(s), a Bus Interface Unit (BIU), and a level-two (L2) cache. When a CPU has an L1 cache miss, the memory request is sent to the L2 cache. If it also misses in L2, it is sent through the BIU to the bus; the BIU of every unit connected to the bus `watches' every bus request every cycle.


      In a shared memory multiprocessor system, there are multiple
processor modules, memory, and system Input/Output (I/O) connected to
the bus.  A processor module consists of execution units (collectively
called a CPU), level-one (L1) cache(s), a Bus Interface Unit (BIU),
and a level-two (L2) cache.  When a CPU has an L1 cache miss, the
memory request is sent to the L2 cache.  If it also misses in L2, it
is sent through the BIU to the bus; the BIU of every unit connected
to the bus `watches' every bus request every cycle.
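
The miss path just described can be illustrated with a minimal sketch. The cache structure, lookup routine, and function names below are hypothetical, chosen only to show the L1 -> L2 -> bus lookup order; they are not the design disclosed here.

```c
#include <stdbool.h>

/* Where a load is served: local L1, local L2, or out on the bus. */
typedef enum { SRC_L1, SRC_L2, SRC_BUS } source_t;

/* A tiny, illustrative 4-entry fully associative cache. */
typedef struct {
    unsigned long tags[4];
    bool valid[4];
} cache_t;

static bool cache_lookup(const cache_t *c, unsigned long line_addr) {
    for (int i = 0; i < 4; i++)
        if (c->valid[i] && c->tags[i] == line_addr)
            return true;
    return false;
}

/* A CPU load probes L1 first, then L2; on a double miss the BIU
 * forwards the request to the bus, where every other BIU snoops it. */
static source_t service_load(const cache_t *l1, const cache_t *l2,
                             unsigned long line_addr) {
    if (cache_lookup(l1, line_addr))
        return SRC_L1;
    if (cache_lookup(l2, line_addr))
        return SRC_L2;
    return SRC_BUS;  /* BIU puts the request on the bus */
}
```

For example, with line 0x10 in L1, and lines 0x10 and 0x20 in L2 (inclusion), a load of 0x20 misses L1 and hits L2, while a load of 0x30 misses both and goes to the bus.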

A request sent to the bus is either served by the cache of another
CPU or by the memory.  In a conventional bus-based MP system, a
request is served by a remote CPU's cache if the remote cache is
holding a modified copy of the line; if no cache holds the line in
the modified state, the request is served by the memory.  Under this
condition, when a BIU receives a request through the bus, it has to
determine within the bus cycle whether the request is served by the
local CPU, i.e., it has to determine whether the local L2 or L1
caches have a modified copy within the bus cycle, and it has to
forbid the local CPU from further modifying the line while the BIU
is serving the bus request.  This kind of handshaking between the
CPU storing data to L1 and the BIU locking the L1 and L2 caches
essentially sacrifices local CPU performance for multiprocessing on
the bus.  For example, even if a line is loaded into an L1 with the
knowledge that no other cache is holding it, the CPU is not able to
subsequently modify it without obtaining permission from its BIU.
The overhead of getting the permission across units can be several
cycles, and the collision detection at the BIU is rather complicated.
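
The conventional handshake above can be sketched as follows. The state names and the lock flag are illustrative assumptions, not the disclosed hardware; the sketch only shows why even an exclusive line cannot be stored to without BIU involvement.

```c
#include <stdbool.h>

/* Illustrative line states (assumed names, not from the source). */
typedef enum { ST_INVALID, ST_SHARED, ST_EXCLUSIVE, ST_MODIFIED } lstate_t;

typedef struct {
    lstate_t state;
    bool biu_locked;   /* set while the BIU is serving a bus request */
} line_t;

/* Every bus cycle the BIU must answer: does a local cache hold this
 * line modified?  If so, the local copy supplies the data, and the
 * line is locked against further CPU stores in the meantime. */
static bool snoop(line_t *line) {
    if (line->state == ST_MODIFIED) {
        line->biu_locked = true;   /* forbid local stores during transfer */
        return true;               /* served by cache-to-cache transfer */
    }
    return false;                  /* memory serves the request */
}

/* The CPU cannot store without BIU permission, even on a line loaded
 * exclusively -- this cross-unit round trip is the multi-cycle
 * overhead the disclosure aims to eliminate. */
static bool cpu_store_allowed(const line_t *line) {
    return !line->biu_locked &&
           (line->state == ST_EXCLUSIVE || line->state == ST_MODIFIED);
}
```

Note how `snoop` must both decide and lock within one call, modeling the single-bus-cycle constraint the text describes.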

      A scheme is disclosed that eliminates the above overhead and
can substantially improve system performance.  Assume both the L1
and L2 caches of a CPU are store-in caches.  L1 and L2 also
implement the inclusion property, such that an L2 directory entry,
which indicates the residence and state of a line in the L2 cache,
also contains a field to indicate whether the line is also residing
in the L1 cache.  Assume also that L1 and L2 have the same line
size.  The scheme operates in the following steps:
  1.  L1 reload: When the L1 generates a load miss and sends the
       request to L2, if the L1 will be the only L1 cache holding
       the data, L2 lets L1 load the line exclusive; for this
       scheme, this means the CPU has full right to modify the
       line.  But if there is a remote L1 currently holding the
       line, the line status in local L1 is initialized to shared.
       In the remote cache, if the state of the line was not
       modified, it's set to shared; otherwise, if it was inv...
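
The reload decision in step 1 can be sketched as below. The function and state names are assumptions for illustration only, and the transition for the remote modified case is truncated in the source, so it is left unhandled here.

```c
#include <stdbool.h>

/* Illustrative line states (assumed names, not from the source). */
typedef enum { ST_INVALID, ST_SHARED, ST_EXCLUSIVE, ST_MODIFIED } lstate_t;

/* State of a line newly reloaded into the local L1: if no remote L1
 * holds the line, load it exclusive (the CPU may then modify it
 * without asking the BIU); otherwise initialize it to shared. */
static lstate_t l1_reload_local(bool remote_l1_holds_line) {
    return remote_l1_holds_line ? ST_SHARED : ST_EXCLUSIVE;
}

/* In the remote L1, a copy that was not modified is downgraded to
 * shared.  The modified case is truncated in the source text, so the
 * state is returned unchanged here rather than guessed at. */
static lstate_t l1_reload_remote(lstate_t remote_state) {
    if (remote_state != ST_MODIFIED)
        return ST_SHARED;
    return remote_state;
}
```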