Improved Memory Multi-bit Recovery Algorithm

IP.com Disclosure Number: IPCOM000121374D
Original Publication Date: 1991-Aug-01
Included in the Prior Art Database: 2005-Apr-03
Document File: 3 page(s) / 94K

Publishing Venue

IBM

Related People

Curran, BW: AUTHOR

Abstract

This article describes an improved error recovery algorithm for use in any memory or cache subsystem that employs error-correcting codes (ECCs). Typically, memory subsystems incorporate check bits on data words to provide single-bit error correction and double-bit error detection (SECDED). Prior art systems have implemented a complement-recomplement procedure that permits recovery from multiple-bit errors due to one or more hard cell faults. An improved procedure is disclosed which recovers multiple-bit error data caused by delta-I or signal cross-talk noise in the memory subsystem control logic, transient noise in the power distribution network, and transient failures in the data read path, in addition to hard cell faults.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Improved Memory Multi-bit Recovery Algorithm

      This article describes an improved error recovery
algorithm for use in any memory or cache subsystem that employs
error-correcting codes (ECCs).  Typically, memory subsystems
incorporate check bits on data words to provide single-bit error
correction and double-bit error detection (SECDED).  Prior art
systems have implemented a complement-recomplement procedure that
permits recovery from multiple-bit errors due to one or more
hard cell faults.  An improved procedure is disclosed which recovers
multiple-bit error data caused by:
-    delta-I or signal cross-talk noise in the memory
     subsystem control logic,
-    transient noise in the power distribution network,
-    transient failures in the data read path,
in addition to hard cell faults.
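
      The article names the prior-art complement-recomplement
procedure without spelling it out.  The following is a minimal
sketch of how such a procedure is commonly understood to work,
not a description of any particular prior-art system.  The 8-bit
word width, the FaultyWord class and its stuck-at fault model are
all assumptions introduced for illustration.

```python
WIDTH = 8                      # assumed word width, for illustration only
MASK = (1 << WIDTH) - 1


class FaultyWord:
    """Hypothetical model of one memory word with hard (stuck-at) cells.

    Bits set in stuck_mask ignore writes and always read back the
    corresponding bit of stuck_value.
    """

    def __init__(self, stuck_mask, stuck_value):
        self.stuck_mask = stuck_mask & MASK
        self.stuck_value = stuck_value & self.stuck_mask
        self.cells = 0

    def write(self, value):
        self.cells = value & MASK

    def read(self):
        healthy_bits = self.cells & ~self.stuck_mask & MASK
        return healthy_bits | self.stuck_value


def complement_recomplement(word):
    """Read, write back the complement, read again, complement the result.

    A stuck cell that returned the wrong value on the first read also
    ignores the complemented write, so complementing the second read
    restores the intended value at that bit position.
    """
    first_read = word.read()
    word.write(~first_read & MASK)
    second_read = word.read()
    return ~second_read & MASK
```

      In this model, a word holding 0b10110010 with bit 0 stuck
at 1 and bit 1 stuck at 0 reads back with two bits wrong, and
complement_recomplement() returns the original value.  A stuck
cell that happened to hold the correct value is flipped by the
procedure instead, so a real design would still run the result
through the ECC logic before accepting it.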

      The disclosed algorithm is shown in the drawing.  The hardware
that sequences the recovery procedure is typically part of the
memory controller.  This hardware remains idle until a multi-bit,
uncorrectable error (UE) is detected on fetched data.  The bad data
and its address are then trapped, and future requests to this memory
bank are blocked to help reduce noise within the memory subsystem.
If the memory subsystem is servicing another storage request at the
time the error is detected, this operation is allowed to complete.
The hardware then waits a model-dependent amount of time to let
power distribution network noise subside (this time may be zero).
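
      The entry sequence described above (trap, block, drain,
settle) can be sketched in software as follows.  This is only an
illustrative model of hardware behavior.  RecoveryState,
begin_recovery and every parameter name are hypothetical, and the
settle time is modeled as a simple sleep.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class RecoveryState:
    """Hypothetical bookkeeping for one recovery attempt."""
    trapped_address: Optional[int] = None
    trapped_data: Optional[int] = None
    blocked_banks: Set[int] = field(default_factory=set)
    log: List[str] = field(default_factory=list)


def begin_recovery(state: RecoveryState,
                   bank: int,
                   address: int,
                   bad_data: int,
                   in_flight: Optional[Callable[[], None]] = None,
                   settle_seconds: float = 0.0) -> RecoveryState:
    # 1. Trap the failing data and its address.
    state.trapped_address = address
    state.trapped_data = bad_data
    state.log.append("trap")

    # 2. Block new requests to the affected bank to reduce noise.
    state.blocked_banks.add(bank)
    state.log.append("block")

    # 3. Let a storage request that is already being serviced finish.
    if in_flight is not None:
        in_flight()
        state.log.append("drain")

    # 4. Wait a model-dependent time (possibly zero) for power
    #    distribution noise to subside.
    time.sleep(settle_seconds)
    state.log.append("settle")
    return state
```

      Calling begin_recovery(RecoveryState(), bank=2, address=0x40,
bad_data=0xAB) leaves bank 2 blocked and the failing address and
data trapped, ready for the refetch step.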

      After this time period the data in error is refetched.  If it
is good, the procedure has succeeded: requests to the memory bank
are re-enabled and the procedure terminates.  If the data contains
a single-bit, correctable error (CE), it is corrected and stored
back to the same location.  Requests to the memory bank are then
re-enabled and the procedure terminates (successfully).  If the data
still contains a multi-bit error (UE), then one may r...
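
      The refetch outcomes covered by the text so far can be
sketched as a small decision function.  Status, handle_refetch
and the callback parameters are hypothetical names.  The handling
of a persistent UE is left as a placeholder, because that step is
not available in this abbreviated text.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    GOOD = "no error"
    CE = "single-bit, correctable error"
    UE = "multi-bit, uncorrectable error"


def handle_refetch(status: Status,
                   corrected_data: int,
                   store: Callable[[int], None],
                   enable_bank: Callable[[], None]) -> str:
    """Act on the ECC status of the refetched data."""
    if status is Status.GOOD:
        # The error was transient: resume normal service.
        enable_bank()
        return "recovered"
    if status is Status.CE:
        # Write the corrected word back to the same location.
        store(corrected_data)
        enable_bank()
        return "recovered"
    # Status.UE: the next step is truncated in the source text, so this
    # sketch only reports the condition and leaves the bank blocked.
    return "unresolved"
```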