Nonvolatile Write Cache Error Correction

IP.com Disclosure Number: IPCOM000115322D
Original Publication Date: 1995-Apr-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 8 page(s) / 248K

Publishing Venue

IBM

Related People

Lemaire, CA: AUTHOR [+3]

Abstract

Described is a faster, low-cost method of error correction in a Direct Access Storage Device (DASD).

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 38% of the total text.

      Described is a faster, low-cost method of error correction in a
Direct Access Storage Device (DASD).

      Traditionally, data storage in a DASD subsystem has not used an
Error Correction Code (ECC) function, since ECC would have slowed
Direct Memory Access (DMA) storage access too much and cost too
much.  Instead, byte-wise parity was sometimes provided to detect
(but not correct) single-bit errors and prevent them from propagating
into the system.  Additionally, Length Run Check Codes (LRCC) were
sometimes used to enhance the error detection capabilities by
accumulating an XOR of each word in each sector as the words passed
successively through the subsystem.

      This invention provides several error correction mechanisms by
modifying the parity generation function, providing access to the
LRCC and parity, and providing a mechanism for combining these to
correct any single-bit error in a sector, as well as to correct for
any memory chip "package kill".  Many system storage designs provide
ECC functions.  Parity can detect errors, but cannot by itself
correct them.  This solution provides error correction at no more
cost than the parity and LRCC already provided on these subsystems.
It is faster than ECC in non-error situations, and costs less.

      The parity is generated across all chips, with no more than one
bit per chip participating in the generation or checking of each
parity bit.  The parity is stored in a separate chip.  For
instance, 8 four-bit chips would use 4 eight-way XOR functions, with
the resultant four XOR bits stored in a ninth 4-bit memory chip.
Alternatively, 4 eight-bit chips would use 8 four-way XOR functions,
with the resultant eight XOR bits stored in a fifth 8-bit memory chip.
In this type of system, once the processor discovers a single-bit
failure (or a chip kill) via the parity-check hardware, error
recovery XORs the remaining good chips (including the parity chip) to
reconstruct the data that should have been on the bad chip.  To do
this, the system must determine which chip was bad.  With parity
alone, this is impossible in the general case.  However, with LRCC,
the job becomes possible.
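
      The cross-chip parity arrangement can be illustrated with a
short sketch.  It is not the disclosed hardware; it assumes the 8
four-bit chip example, a 32-bit word with chip 0 holding the low four
bits, and hypothetical helper names (split_nibbles, parity_nibble,
rebuild_chip).  It also assumes the bad chip is already known.

```python
# Sketch only: models 8 four-bit data chips plus a ninth parity chip.
# The bit-to-chip layout (chip 0 = low nibble) is an assumption.
CHIPS = 8
BITS_PER_CHIP = 4
MASK = (1 << BITS_PER_CHIP) - 1


def split_nibbles(word):
    """Return the four-bit slice each data chip would hold for this word."""
    return [(word >> (BITS_PER_CHIP * i)) & MASK for i in range(CHIPS)]


def parity_nibble(word):
    """Four eight-way XORs: bit j of the result covers bit j of every chip."""
    parity = 0
    for nibble in split_nibbles(word):
        parity ^= nibble
    return parity


def rebuild_chip(word, parity, bad_chip):
    """Recompute the slice of a known-bad chip from the other chips and parity."""
    restored = parity
    for i, nibble in enumerate(split_nibbles(word)):
        if i != bad_chip:
            restored ^= nibble
    shift = BITS_PER_CHIP * bad_chip
    cleared = word & ~(MASK << shift)
    return cleared | (restored << shift)
```

      Because each parity bit sees at most one bit from any chip, the
loss of a whole chip disturbs each parity equation in at most one
position, which is what makes the rebuild possible once the chip is
identified.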

      The Length Run Check Code (LRCC), as we use it for this
invention, is well known and has been used before in DASD subsystems.
It XORs each word as it comes into the subsystem, starting again with
the first word of each sector.  The width of a "word" can be 1, 2,
4, or 8 bytes, depending on the needs of the subsystem.  This width
is not important to the concept of this invention.  The LRCC is
stored with the data (sequentially after each sector's data) in the
buffer storage on the card.
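
      The LRCC accumulation described above can be sketched as
follows.  This is a minimal model, assuming a sector is a list of
integer words; the function names (lrcc, store_sector, lrcc_syndrome)
are hypothetical, and the word width is left open as in the text.

```python
# Sketch only: LRCC as a running XOR over the words of one sector.
def lrcc(sector_words):
    """XOR every word of the sector together, starting from zero."""
    acc = 0
    for word in sector_words:
        acc ^= word
    return acc


def store_sector(sector_words):
    """Model the buffer layout: the sector's data followed by its LRCC."""
    return list(sector_words) + [lrcc(sector_words)]


def lrcc_syndrome(stored):
    """XOR of data plus stored LRCC: zero if intact, else the flipped bits."""
    return lrcc(stored)
```

      For a single flipped bit, the nonzero syndrome has exactly that
bit position set, although it does not say which word was hit.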

[Figure: Corrupted data (z is 0 but should have been 1; x is 1 but
should have been 0).  The syndrome says chip 0 is bad; parity says
data word 7 is bad.]
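
      One way to picture how parity and the LRCC might be combined
for a single-bit error is sketched below.  It is an illustration
under assumptions, not the disclosed mechanism: it reuses the 8
four-bit chip layout, ignores any offset folded into the LRCC, assumes
the parity chip and the stored LRCC are themselves intact, and uses
hypothetical names throughout.

```python
# Sketch only: parity locates the bad word, the LRCC syndrome locates the bit.
CHIPS = 8
BITS_PER_CHIP = 4
MASK = (1 << BITS_PER_CHIP) - 1


def parity_nibble(word):
    """Cross-chip parity: XOR of the four-bit slices held by each chip."""
    parity = 0
    for i in range(CHIPS):
        parity ^= (word >> (BITS_PER_CHIP * i)) & MASK
    return parity


def xor_all(words):
    acc = 0
    for word in words:
        acc ^= word
    return acc


def correct_single_bit(words, parities, stored_lrcc):
    """Return (word_index, chip_index, repaired_words), or None if clean."""
    syndrome = xor_all(words) ^ stored_lrcc
    bad_words = [i for i, w in enumerate(words)
                 if parity_nibble(w) != parities[i]]
    if syndrome == 0 and not bad_words:
        return None
    if len(bad_words) != 1 or bin(syndrome).count("1") != 1:
        raise ValueError("not a single-bit data error; outside this sketch")
    word_index = bad_words[0]
    chip_index = (syndrome.bit_length() - 1) // BITS_PER_CHIP
    repaired = list(words)
    repaired[word_index] ^= syndrome  # flip the one bad bit back
    return word_index, chip_index, repaired
```

      In this model the failing parity check names the word and the
syndrome names the bit (and therefore the chip), so neither check
alone would be enough to repair the data.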

      An additional offset is XORed into the LRCC sometime during the
XOR accumulation (usually at the begin...