Instruction Serialization Penalty Reduction

IP.com Disclosure Number: IPCOM000104912D
Original Publication Date: 1993-Jun-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 2 page(s) / 95K

Publishing Venue

IBM

Related People

Eberhard, R: AUTHOR [+2]

Abstract

A mechanism is described which reduces the performance penalty associated with serialization operations encountered during instruction processing in a two-level cache buffering, shared memory multiprocessor allowing self-modifying programs. The mechanism involves creation of a guarded instruction processing region using minor extensions to extant centralized storage consistency mechanisms and instruction checkpointing hardware.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Instruction Serialization Penalty Reduction

      Processor serialization requires the following steps to be
taken within the processor encountering the serialization request:

1.  Previously prefetched instructions following the one containing
    the serializing operation must be discarded.
2.  Instruction processing is suspended at the serializing operation
    until all storage activity requested by previously initiated
    instructions within the processor is completed.
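The two steps above can be sketched as a small model (illustrative names and structures, not the disclosed hardware): a prefetch buffer that is discarded, and a pending-store list that must drain before processing resumes.

```python
# Hypothetical model of processor serialization:
# step 1 discards prefetched instructions, step 2 suspends
# processing until all previously initiated storage activity
# has completed.

class Processor:
    def __init__(self):
        self.prefetch_buffer = []   # instructions fetched ahead of execution
        self.pending_stores = []    # storage requests not yet completed

    def serialize(self):
        # Step 1: discard instructions prefetched past the
        # serializing operation.
        discarded = len(self.prefetch_buffer)
        self.prefetch_buffer.clear()
        # Step 2: suspend until all previously initiated storage
        # activity completes (modeled by draining the pending list).
        while self.pending_stores:
            self.pending_stores.pop(0)  # one pop = one store completing
        return discarded

p = Processor()
p.prefetch_buffer = ["ADD", "LOAD", "BRANCH"]
p.pending_stores = [0x100, 0x108]
print(p.serialize())  # 3 prefetched instructions discarded
```

The penalty the disclosure targets is exactly this model's stall: nothing past the serializing operation proceeds until the pending-store list is empty.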

      The storage hierarchy includes a split L1 cache structure,
instruction and operand.  The L1 operand cache is a store-through
design supported by a store queue whose structure is individually
replicated in the shared L2 cache storage for each processor
supported in the configuration.  The L2 caching policy is store-in.
With this structure, store requests can be held pending
indefinitely, subject only to the finite depth of the store queue.
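A minimal sketch of this store path, under stated assumptions (the queue depth of 4 and the class and field names are illustrative): every store to the store-through L1 enters a finite-depth queue and later completes into the store-in L2.

```python
# Sketch of a store-through L1 backed by a per-processor store
# queue whose entries complete into a store-in L2.

from collections import deque

QUEUE_DEPTH = 4  # finite store queue depth (illustrative value)

class StorePath:
    def __init__(self):
        self.store_queue = deque()  # per-processor queue, replicated at L2
        self.l2_lines = {}          # store-in L2: modified data lives here

    def store(self, addr, data):
        if len(self.store_queue) == QUEUE_DEPTH:
            self.drain_one()        # queue full: oldest store must complete
        self.store_queue.append((addr, data))

    def drain_one(self):
        addr, data = self.store_queue.popleft()
        self.l2_lines[addr] = data  # store completes into L2 (store-in)

sp = StorePath()
for i in range(6):                  # six stores through a depth-4 queue
    sp.store(0x100 + 8 * i, i)
print(len(sp.store_queue), len(sp.l2_lines))  # 4 still queued, 2 completed
```

The "held pending indefinitely" property shows up here as stores sitting in the queue until depth forces the oldest one out.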

      Storage consistency is maintained through a combination of
mechanisms in the storage hierarchy, summarized as post-invalidation.
Within each processor, as stores occur to L1 operand cache, the
contents of L1 instruction cache are checked and invalidated, as
required.  The same is true of the prefetched instruction stream.  As
L1 instruction cache misses are serviced, their cache line addresses
are compared to those within the processor's L1 store queue.  Should
a match occur, a pending store conflict results in delaying the
request's transmission to L2 cache until the youngest comparing store
has completed to L2 cache.
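The pending-store check on an L1 instruction cache miss can be sketched as follows (the 128-byte line size and function names are assumptions for illustration): the miss's line address is compared against every queued store, and on a match the request waits for the youngest comparing store.

```python
# Sketch of the pending-store conflict check for an L1
# instruction cache miss against the processor's store queue.

LINE = 128  # assumed L1 cache line size in bytes

def line_of(addr):
    return addr // LINE

def icache_miss_delay(miss_addr, store_queue):
    """Return the queue index of the youngest comparing store
    (the L2 request is delayed until it completes), or None if
    no pending store conflicts with the miss."""
    youngest = None
    for i, store_addr in enumerate(store_queue):  # oldest first
        if line_of(store_addr) == line_of(miss_addr):
            youngest = i  # keep scanning: later entries are younger
    return youngest

queue = [0x2040, 0x1000, 0x2070]  # 0x2040 and 0x2070 share a line
print(icache_miss_delay(0x2000, queue))  # must wait on index 2
print(icache_miss_delay(0x5000, queue))  # no conflict
```

Waiting on the youngest comparing store guarantees that every older store to the same line has also reached L2 before the instruction fetch is serviced.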

      As fetches occur to L1 operand cache, their addresses are
compared to those within the L1 store queue.  If the fetch finds a
pending store conflict for another instruction's queued store
request, it stalls the processor until the youngest comparing store
has completed to L2 cache.  For fetches with L1 cache hit, the
comparison is to the doubleword boundary; for fetches with L1 cache
miss, it is to the L1 operand cache line boundary.
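The two comparison granularities can be illustrated with a short sketch (the 8-byte doubleword follows the text; the 128-byte line size is an assumed value): an L1 hit compares at doubleword granularity, an L1 miss at line granularity.

```python
# Sketch of the operand-fetch comparison against the L1 store
# queue: doubleword granularity on an L1 hit, L1 operand cache
# line granularity on an L1 miss.

DW = 8      # doubleword size in bytes
LINE = 128  # assumed L1 operand cache line size in bytes

def fetch_conflicts(fetch_addr, store_queue, l1_hit):
    gran = DW if l1_hit else LINE
    return any(a // gran == fetch_addr // gran for a in store_queue)

queue = [0x2010]
print(fetch_conflicts(0x2010, queue, l1_hit=True))   # same doubleword
print(fetch_conflicts(0x2018, queue, l1_hit=True))   # different doubleword
print(fetch_conflicts(0x2018, queue, l1_hit=False))  # same line on a miss
```

The coarser compare on a miss is conservative: since the whole line will be brought in, any pending store anywhere in that line must complete to L2 first.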

      L2 cache is responsible for maintaining the consistency of
storage between processors.  This is accomplished by forcing the L...