
Early Cache Miss Issuing Past Observation-Free Stores

IP.com Disclosure Number: IPCOM000105025D
Original Publication Date: 1993-Jun-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 2 page(s) / 99K

Publishing Venue

IBM

Related People

Liu, L: AUTHOR

Abstract

Disclosed are techniques for fast cache miss fetching prior to sending observation-free stores (from the processor) to L2/L3. A store is observation-free if its contents have not been fetched by the processor. Such a miss-fetched line can be fetched by the processor without data coherence problems.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Early Cache Miss Issuing Past Observation-Free Stores

      Disclosed are techniques for fast cache miss fetching prior to
sending observation-free stores (from the processor) to L2/L3.  A
store is observation-free if its contents have not been fetched by
the processor.  Such a miss-fetched line can be fetched by the
processor without data coherence problems.

      Cache coherence is a critical issue in multiprocessor designs.
A typical problem (as described in the IBM/370 architecture document)
is illustrated by the following example.  Consider two processors, P1
and P2, doing memory accesses:  P1 does <Store A>-<Fetch A>-<Fetch B>,
and P2 does <Store B>-<Fetch B>-<Fetch A>.  It cannot happen that the
last Fetches on both P1 and P2 fail to observe the other processor's
Store.  In conventional machines like the 3033 (store-thru), each
processor has a store stack holding those processor-issued stores
that are still to be issued to shared main storage.  A store stack
entry is removed when it has received a "synchronized" signal from
the SCU (Storage Control Unit).  Any processor operand fetch can be
carried out from the cache only when "operand-store compare" (OSC)
finds no conflicting entry in the store stack.  Such delays of
operand fetches due to OSC conflicts cost a few percent of processor
MIPS.  Although some designs use EX bits, so that synchronization
among processors on a store becomes trivial when the processor holds
EX status of the line, the OSC problem still exists.  Without an OSC
mechanism the described architecture violation could occur when cache
misses are issued while there are still preceding stores not yet
released to the storage controller (L2/L3).  In this invention I
propose a mechanism to achieve the following:

1.  Stores not issued to shared memory (L2/L3) may be fetched by
    subsequent operands.
2.  Most cache misses may be issued (e.g., to L2) without waiting for
    the issuing of all prior stores to L2.  The received line may be
    accessed by the processor without architecture ambiguity.
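
      The P1/P2 requirement above amounts to a litmus test.  Below is
a minimal sketch of that test using C11 sequentially consistent
atomics and POSIX threads; the variable and function names are
illustrative assumptions and not part of the original disclosure.
The outcome flagged at the end is the one the architecture forbids.

    /* P1: <Store A>-<Fetch A>-<Fetch B>
       P2: <Store B>-<Fetch B>-<Fetch A>
       Forbidden: both last Fetches miss the other processor's Store. */
    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    static atomic_int A = 0, B = 0;
    static int r1_b, r2_a;            /* results of the last Fetches */

    static void *p1(void *arg)
    {
        (void)arg;
        atomic_store(&A, 1);          /* <Store A> */
        (void)atomic_load(&A);        /* <Fetch A> */
        r1_b = atomic_load(&B);       /* <Fetch B> */
        return NULL;
    }

    static void *p2(void *arg)
    {
        (void)arg;
        atomic_store(&B, 1);          /* <Store B> */
        (void)atomic_load(&B);        /* <Fetch B> */
        r2_a = atomic_load(&A);       /* <Fetch A> */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (r1_b == 0 && r2_a == 0)   /* must never be observed */
            printf("architecture violation: both fetches saw 0\n");
        return 0;
    }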

The central idea is to allow misses to be issued past stores as long
as the stores (in the store stack) have not been observed
(operand-fetched from).
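
      A minimal sketch of this test follows, assuming that each
STRSTK entry carries an "observed" bit which OSC sets whenever a
later operand fetch reads a pending store; the structure and field
names are hypothetical and not taken from the disclosure.

    #include <stdbool.h>
    #include <stdint.h>

    #define STRSTK_SIZE 8            /* illustrative stack depth */

    struct strstk_entry {
        uint64_t line_addr;          /* line address of the store */
        bool     valid;              /* store not yet released to L2 */
        bool     observed;           /* a later fetch read this store */
    };

    struct strstk {
        struct strstk_entry e[STRSTK_SIZE];
    };

    /* OSC: an operand fetch that matches a pending store marks the
       entry observed (a real design would also forward or stall).  */
    static void osc_check(struct strstk *s, uint64_t fetch_line)
    {
        for (int i = 0; i < STRSTK_SIZE; i++)
            if (s->e[i].valid && s->e[i].line_addr == fetch_line)
                s->e[i].observed = true;
    }

    /* Central idea: a cache miss may be issued to L2 ahead of the
       pending stores as long as none of them has been observed.    */
    static bool may_issue_miss_early(const struct strstk *s)
    {
        for (int i = 0; i < STRSTK_SIZE; i++)
            if (s->e[i].valid && s->e[i].observed)
                return false;        /* observed store must drain first */
        return true;
    }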

      For illustration, consider a 3033-type store-thru cache design
with modifications.  Each processor has a store stack STRSTK with a
fixed number of entries.  The CPU itself has a simple pipeline doing
operand accesses in sequence.  All CPU fetches are from the 1st-level
cache (L1), but a store does not require the line to be in L1.  A
2nd-level cache L2 is shared by all processors and is managed by the
SCE.  Each store from a processor is pushed onto its STRSTK first.
Stores in a STRSTK are released to L2 in sequence.  The SCE receives
stores (into its own store stacks) from the processors and sequences
them.
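
      The in-order store path just described might be modeled as
sketched below: the STRSTK as a per-processor FIFO whose oldest entry
is released to L2 in sequence, and an SCE that assigns a sequence
number to each store it receives.  The depth, names, and
sequence-number scheme are assumptions made for illustration only.

    #include <stdbool.h>
    #include <stdint.h>

    #define STRSTK_DEPTH 8

    struct store {
        uint64_t line_addr;
        uint64_t data;
    };

    struct strstk_fifo {
        struct store q[STRSTK_DEPTH];
        int head, tail, count;       /* release at head, push at tail */
    };

    /* Processor side: each store is pushed onto its STRSTK first.   */
    static bool strstk_push(struct strstk_fifo *s, struct store st)
    {
        if (s->count == STRSTK_DEPTH)
            return false;            /* stack full: processor stalls  */
        s->q[s->tail] = st;
        s->tail = (s->tail + 1) % STRSTK_DEPTH;
        s->count++;
        return true;
    }

    /* Release the oldest store toward the SCE/L2, in program order. */
    static bool strstk_release(struct strstk_fifo *s, struct store *out)
    {
        if (s->count == 0)
            return false;
        *out = s->q[s->head];
        s->head = (s->head + 1) % STRSTK_DEPTH;
        s->count--;
        return true;
    }

    /* SCE side: each received store is given the next sequence
       number; putaway into the L2 arrays follows this sequence.     */
    static uint64_t sce_sequence(struct store st, uint64_t next_seq)
    {
        (void)st;                    /* putaway to L2 would occur here */
        return next_seq + 1;
    }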

The putaway of stores into the L2 arrays is done according to this
sequence.  Each time the SCE sequences a store it se...