
Method for Decreasing the Effects of Soft Errors on a Complement/Recomplement Scheme

IP.com Disclosure Number: IPCOM000107970D
Original Publication Date: 1992-Apr-01
Included in the Prior Art Database: 2005-Mar-22
Document File: 3 page(s) / 157K

Publishing Venue

IBM

Related People

Haselhorst, KH: AUTHOR (and 4 other authors)

Abstract

An error correction scheme was initially defined and designed to implement a Single-bit Error Correct/Double-bit Error Detect (SEC/DED) 40/33 code. A complement-recomplement retry error correction method is used to correct double-HARD (two stuck bits) and HARD-SOFT (one stuck, one intermittent) errors. SOFT-SOFT and triple-bit errors are treated as machine check conditions. The likelihood of two soft errors lining up in the same ECC word on a memory card was thought to be remote. On further examination, however, it has been determined that many errors within a DRAM look SOFT to the system error correction scheme, so the current scheme cannot correct them when two such errors align. A method is required to eliminate some or all of the SOFT errors within the DRAMs.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 34% of the total text.


       An error correction scheme was initially defined and
designed to implement a Single-bit Error Correct/Double-bit Error
Detect (SEC/DED) 40/33 code.  A complement-recomplement retry error
correction method is used to correct double-HARD (two stuck bits) and
HARD-SOFT (one stuck, one intermittent) errors.  SOFT-SOFT and
triple-bit errors are treated as machine check conditions.  The
likelihood of two soft errors lining up in the same ECC word on a
memory card was thought to be remote.  On further examination,
however, it has been determined that many errors within a DRAM look
SOFT to the system error correction scheme, so the current scheme
cannot correct them when two such errors align.  A method is
required to eliminate some or all of the SOFT errors within the
DRAMs.  This invention describes a design that can reduce the number
of catastrophic SOFT-SOFT error alignments within a SEC/DED
correction scheme with complement-recomplement retry.
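As a quick consistency check on the 40/33 parameters (a worked derivation added here, not taken from the disclosure), the standard Hamming bound for single-error correction, plus one overall parity bit for double-error detection, gives exactly 7 check bits for 33 data bits. The disclosure does not say how its check bits are assigned; this only confirms the bit counts.

```latex
\begin{aligned}
&\text{SEC bound: } 2^{r} \ge k + r + 1, \qquad k = 33 \text{ data bits} \\
&r = 5:\quad 2^{5} = 32 < 33 + 5 + 1 = 39 \quad (\text{insufficient}) \\
&r = 6:\quad 2^{6} = 64 \ge 33 + 6 + 1 = 40 \quad (\text{sufficient}) \\
&\text{DED: one additional overall parity bit} \;\Rightarrow\; r_{\text{SEC/DED}} = 6 + 1 = 7 \\
&n = k + r_{\text{SEC/DED}} = 33 + 7 = 40
\end{aligned}
```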

      The SEC/DED error correction scheme with complement-recomplement
retry requires that at least one of the two bits in error be HARD
(stuck at a particular voltage level).  If both bits are SOFT (when
the data is rewritten, the data read back is correct), the current
algorithm cannot determine which bits to correct.  The
complement-recomplement retry error correction scheme works as
follows:
      -  An even multi-bit error is detected.
      -  The data is inverted and written back to the DRAMs.
      -  The inverted data is fetched.
      -  The original data is compared to the fetched data to
determine which bits did not invert.  These stuck bits are the bits
in error; once they are flipped, any remaining single-bit error can
be corrected by the SEC/DED code.
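The retry steps above can be sketched as follows. This is a minimal software model, not the disclosure's hardware: `FaultyWord` and `locate_hard_bits` are hypothetical names, a HARD fault is modeled as a stuck-at cell, a SOFT error is modeled as an upset that the next write clears, and step 1 (detecting the even multi-bit error) is assumed to have already happened. Like the description, it treats every bit that fails to invert as a bit in error.

```python
from typing import Dict, List, Optional


class FaultyWord:
    """Hypothetical model of one memory word.

    HARD faults are stuck-at cells; a SOFT error is an upset of the stored
    value that disappears the next time the cell is written.
    """

    def __init__(self, width: int, stuck: Optional[Dict[int, int]] = None):
        self.stuck = dict(stuck or {})  # bit position -> stuck-at value
        self.cells = [0] * width

    def write(self, bits: List[int]) -> None:
        # Writing clears SOFT upsets; stuck cells keep their stuck value.
        self.cells = [self.stuck.get(i, b) for i, b in enumerate(bits)]

    def upset(self, pos: int) -> None:
        # SOFT error: flip the stored bit until the next write.
        self.cells[pos] ^= 1

    def read(self) -> List[int]:
        return list(self.cells)


def locate_hard_bits(mem: FaultyWord) -> List[int]:
    """Steps 2-4 of the retry: write the complement, fetch it, compare.

    Returns the positions that did not invert, i.e. the stuck bits.
    """
    original = mem.read()
    mem.write([b ^ 1 for b in original])  # write the data back inverted
    fetched = mem.read()                  # fetch the inverted data
    return [i for i, (a, b) in enumerate(zip(original, fetched)) if a == b]


data = [1, 0, 1, 1, 0, 0, 1, 0]

hard_soft = FaultyWord(8, stuck={2: 0})  # bit 2 stuck at 0 (HARD, in error)
hard_soft.write(data)
hard_soft.upset(6)                       # bit 6 upset (SOFT)
print(locate_hard_bits(hard_soft))       # [2]

soft_soft = FaultyWord(8)
soft_soft.write(data)
soft_soft.upset(1)
soft_soft.upset(6)
print(locate_hard_bits(soft_soft))       # []
```

In the SOFT-SOFT case both upset bits are rewritten correctly, invert normally, and are never located, which is why that combination cannot be corrected.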

      By turning one of the two SOFT errors into a HARD error, this
invention allows the complement-recomplement retry scheme to correct
double-SOFT errors that would otherwise cause an error correction
failure and a resulting machine crash.
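To illustrate why a pseudo-HARD bit helps, the sketch below runs the retry end to end on a toy extended Hamming (8,4) SEC/DED code, which stands in for the real 40/33 code. Assumptions, all hypothetical: `encode`, `decode`, and `retry` are illustrative names; a HARD or pseudo-HARD bit is modeled as a cell held at its erroneous read value; and, as in the description, any bit that fails to invert is treated as a bit in error.

```python
from typing import Dict, List, Optional

# Toy extended Hamming (8,4) SEC/DED code standing in for the 40/33 code.
# Index 0 is the overall parity bit; 1, 2 and 4 are Hamming check bits.
DATA_POS = (3, 5, 6, 7)


def encode(data: List[int]) -> List[int]:
    """Place four data bits and fill in the check bits."""
    w = [0] * 8
    for pos, bit in zip(DATA_POS, data):
        w[pos] = bit
    for p in (1, 2, 4):  # check bit p covers every index with bit p set
        w[p] = sum(w[i] for i in range(1, 8) if i & p) % 2
    w[0] = sum(w) % 2    # overall parity lets double errors be detected
    return w


def decode(word: List[int]) -> Optional[List[int]]:
    """Return the data bits, or None for an uncorrectable double error."""
    w = list(word)
    syndrome = 0
    for i in range(1, 8):
        if w[i]:
            syndrome ^= i
    if sum(w) % 2:       # odd parity: single error at index `syndrome`
        w[syndrome] ^= 1
    elif syndrome:       # even parity, nonzero syndrome: double error
        return None
    return [w[i] for i in DATA_POS]


def retry(read_word: List[int], stuck: Dict[int, int]) -> Optional[List[int]]:
    """Complement-recomplement retry on one word.

    `stuck` (hypothetical) maps a bit position to the value that cell is
    held at; every other cell stores whatever is written to it.
    """
    written = [b ^ 1 for b in read_word]                        # write inverted
    fetched = [stuck.get(i, b) for i, b in enumerate(written)]  # fetch it back
    fixed = list(read_word)
    for i in range(len(read_word)):
        if fetched[i] == read_word[i]:  # did not invert: treat as a bit in error
            fixed[i] ^= 1
    return decode(fixed)


data = [1, 0, 1, 1]
word = encode(data)
word[3] ^= 1  # first error, e.g. one bit of a chip kill
word[6] ^= 1  # second error, e.g. a single-cell SOFT error

print(retry(word, stuck={}))            # None: SOFT-SOFT, machine check
print(retry(word, stuck={3: word[3]}))  # [1, 0, 1, 1]: bit 3 made pseudo-HARD
```

With no stuck cell the retry locates nothing and the double error stays uncorrectable; once bit 3 behaves as HARD, the retry flips it and the SEC/DED decoder corrects the remaining single error.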

      SOFT errors are turned into pseudo-HARD errors by the on-card
memory control logic and microcode working together.  The microcode
can assist in two ways: 1) during IPL (initial program load), or
2) during runtime.

      During IPL, microcode attempts to determine the number and
location of all errors within the DRAMs on the memory cards.  IPL
microcode looks for catastrophic errors (1/4-chip, 1/2-chip, or
full-chip kills).  It has been determined that the most likely cause
of an uncorrectable (SOFT-SOFT) error is a catastrophic error lining
up with a single-cell error.  Thus, by turning the catastrophic error
into a HARD error, the system can correct the data instead of taking
a machine check.
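A simplified sketch of the IPL scan follows. It assumes, hypothetically, that an IPL memory test has already produced the set of failing addresses for each DRAM, and it recognizes 1/4, 1/2, and full chip kills only by the fraction of addresses that fail; real detection would likely also check which address region fails. The names `classify_chip` and `find_catastrophic`, the chip labels, and the chip size are illustrative. The chips it reports are the candidates whose errors would be turned into HARD errors.

```python
from fractions import Fraction
from typing import Dict, List, Set, Tuple


def classify_chip(failing: Set[int], chip_size: int) -> str:
    """Classify one DRAM's failures by the fraction of addresses that fail.

    Simplified assumption: a 1/4, 1/2, or full chip kill is recognized purely
    by how much of the address space fails, not by which region fails.
    """
    if not failing:
        return "clean"
    if len(failing) == 1:
        return "single-cell"
    fraction = Fraction(len(failing), chip_size)
    if fraction >= 1:
        return "full chip kill"
    if fraction >= Fraction(1, 2):
        return "1/2 chip kill"
    if fraction >= Fraction(1, 4):
        return "1/4 chip kill"
    return "scattered cells"


def find_catastrophic(card: Dict[str, Set[int]], chip_size: int) -> List[Tuple[str, str]]:
    """Return (chip, kind) for every chip whose failures look catastrophic."""
    kills = []
    for chip, failing in sorted(card.items()):
        kind = classify_chip(failing, chip_size)
        if kind.endswith("chip kill"):
            kills.append((chip, kind))
    return kills


# Hypothetical IPL test result: failing addresses per DRAM on one card.
CHIP_SIZE = 1024
card = {
    "U1": set(),
    "U2": set(range(256)),        # a quarter of the chip fails
    "U3": {17},                   # one bad cell
    "U4": set(range(CHIP_SIZE)),  # the whole chip fails
}
print(find_catastrophic(card, CHIP_SIZE))
# [('U2', '1/4 chip kill'), ('U4', 'full chip kill')]
```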

      When IPL microcode determines that a catastrophic error exists,
it writes the syndrome pattern that identifies the bit in error into
a "STICK" bit register within the memory control chip.  The control logic will
attempt to fix an ECC error on a given I/O within an ECC word for
all addresses wi...