Cache Error/Program Loop Protection

IP.com Disclosure Number: IPCOM000120622D
Original Publication Date: 1991-May-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 3 page(s) / 102K

Publishing Venue

IBM

Related People

Treu, AR: AUTHOR

Abstract

This article describes a technique that improves the availability of a computer system by disabling the cache in the supervisor program check interrupt routine, which avoids a "hung" loop in cache systems that have no parity checking.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Conventionally, when a cache error occurs, the non-maskable
interrupt (NMI) handler uses the system's communications area (the
non-volatile (NV) RAM logout area) to record that the system had a
cache error, that the cache was disabled, and that the system is
running in degraded mode.  The operating system (OS) can then act on
this information.  It may retry by testing the cache and re-enabling
it to check whether the error was intermittent, or it may run
degraded and schedule maintenance for a later time.  It may also send
the user a message describing the action it is taking or the
scheduled replacement of a field replaceable unit (FRU).
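
      The OS policy described above can be sketched in C as follows.
This is a minimal illustration only, not the disclosed
implementation: the logout_area layout, the os_action values, and the
stub_cache_test_* diagnostics are all hypothetical names introduced
here, since the disclosure does not define the format of the NVRAM
logout area or the cache test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical layout of the NVRAM logout area (not given in the text). */
struct logout_area {
    bool cache_error;    /* NMI handler recorded a cache error        */
    bool cache_disabled; /* cache was switched off by the handler     */
    bool degraded;       /* system is running in degraded mode        */
    bool fru_scheduled;  /* OS has scheduled FRU replacement          */
};

enum os_action { OS_NO_ACTION, OS_CACHE_REENABLED, OS_RUN_DEGRADED };

/* Stand-in for a platform cache diagnostic; returns true on pass. */
typedef bool (*cache_test_fn)(void);

/* Stub diagnostics used only to exercise the policy below. */
bool stub_cache_test_pass(void) { return true; }
bool stub_cache_test_fail(void) { return false; }

/* Act on the logout area: retry the cache, or stay degraded. */
enum os_action os_handle_logout(struct logout_area *log,
                                cache_test_fn cache_test)
{
    if (!log->cache_error)
        return OS_NO_ACTION;

    /* Assumption: the handler disabled the cache before logging. */
    assert(log->cache_disabled);

    if (cache_test()) {
        /* Test passed: treat the error as intermittent and re-enable. */
        log->cache_error = false;
        log->cache_disabled = false;
        log->degraded = false;
        puts("cache test passed: cache re-enabled");
        return OS_CACHE_REENABLED;
    }

    /* Test failed: keep running without cache and schedule service. */
    log->fru_scheduled = true;
    puts("cache test failed: running degraded, FRU replacement scheduled");
    return OS_RUN_DEGRADED;
}
```

      Passing the diagnostic as a function pointer keeps the policy
separate from the platform-specific cache test, which the disclosure
leaves unspecified.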

      The NMI handler analyzes the cache error to determine what to
disable and builds the error report in the system's communications
area (NVRAM logout area).  The microcode has to work hand in hand
with the hardware, since the loop protection bit resides in a
hardware location, and the detection of cache errors depends on the
hardware's error detection capability together with the loop
protection code.  If the hardware can detect cache errors, it
automatically disables the cache on an error.  If instead the loop
protection detects a loop, the cache remains suspect until it is
disabled, since not all cache errors can be detected by the error
logic.  A cache error cannot be tied to particular operands, since
the cache is tied to all of memory; the cache can be suspected only
by way of critical interrupts.  If there is a cache error, a machine
check (NMI) would normally be taken to a location from which the
processor starts to fetch the NMI handler.  That fetch may load the
cache again, fail again, and keep looping.
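
      The looping failure and the role of a loop protection bit can
be illustrated with a small C simulation.  This is a sketch under
stated assumptions, not the disclosed microcode: the abbreviated text
does not say how the bit is set or cleared, so the struct machine
fields, program_check_entry, and run_handler are hypothetical, and
the sketch assumes the entry check itself executes correctly even
when the cache is faulty.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated machine state; every field is a hypothetical stand-in. */
struct machine {
    bool loop_protect;  /* loop protection bit (a hardware location)   */
    bool cache_enabled; /* cache currently in use                      */
    bool cache_faulty;  /* simulation only: cache corrupts fetches     */
    bool logged;        /* error report built in NVRAM logout area     */
    int  entries;       /* number of handler entries so far            */
};

/* A fetch fails only when a faulty cache is still enabled. */
static bool fetch_ok(const struct machine *m)
{
    return !(m->cache_enabled && m->cache_faulty);
}

/* One entry into the interrupt routine.  Returns true if the routine
 * ran to completion, false if it faulted and must be re-entered. */
bool program_check_entry(struct machine *m)
{
    m->entries++;

    if (m->loop_protect) {
        /* Re-entered before the previous pass finished: a loop is
         * detected, so the cache is suspect.  Disable it and log. */
        m->cache_enabled = false;
        m->logged = true;
    }
    m->loop_protect = true;

    if (!fetch_ok(m))
        return false;   /* handler body faulted; it will be re-entered */

    /* Normal interrupt handling would run here. */
    m->loop_protect = false;  /* clean exit clears the bit */
    return true;
}

/* Re-enter the routine until it completes.  Returns the number of
 * entries taken, or -1 if it never completed (a "hung" loop). */
int run_handler(struct machine *m, int max_entries)
{
    assert(max_entries > 0);
    for (int i = 0; i < max_entries; i++) {
        if (program_check_entry(m))
            return m->entries;
    }
    return -1;
}
```

      In this model a faulty cache costs one extra entry: the first
pass faults, the second pass sees the bit still set, disables the
cache, and completes.  Without the bit check, every pass would fault
and run_handler would report a hang.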

      This is a reliability, availability, serviceability (RAS)
disclosure dealing with high availability and a degree of fault
tolerance for the end user.

 ...