
Method to Recover From an Uncorrectable Error Based on Correctable Error Log Data and Scrub Data

IP.com Disclosure Number: IPCOM000018581D
Original Publication Date: 2003-Jul-24
Included in the Prior Art Database: 2003-Jul-24
Document File: 1 page(s) / 43K

Publishing Venue

IBM

Abstract

Bad data bits identified during error correction code (ECC) correction of runtime memory access fails or during background scrub (soft error cleanup) can be used to correct a fail rather than just ending system processing. The identified bad bit(s) can also be used to improve field replaceable unit (FRU) callouts. This process/device can be used to correct fails on any interface where failing data bits can be identified and logged.

This is the abbreviated version, containing approximately 52% of the total text.



   Disclosed is a process/device in which uncorrectable errors (UE) in a system using error correction code (ECC) can be corrected based on previous correctable errors (CE). The CEs, identified by scrub or identified and logged during correction of runtime fails, are used to generate a best guess for correcting the fail. If the failing access contains at least one bit that has previously failed as a CE (there is an entry in the CE log or a scrub fail), a routine (implemented in software or hardware) is run in which the previously failing CE bit(s) are flipped, one at a time or in groups if more than one bit has been identified, and ECC is then checked to see whether the fail is now correctable. If the fail is corrected, processing continues. If it is not corrected, an error is posted, since this indicates that there are too many failing bits.

     The correction attempt is executed only when a known failing bit has been identified by scrub or identified and logged during correction of runtime fails. This restriction prevents the routine from injecting a third error, which could corrupt the data while making the error appear correctable.
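
     The routine and its guard can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the disclosure does not name an ECC scheme, so a small extended Hamming (8,4) SEC-DED code stands in for the real memory ECC, and the names encode, check, and recover_ue, as well as the representation of the CE log as a list of bit positions, are hypothetical.

```python
from itertools import combinations

DATA_POSITIONS = (3, 5, 6, 7)   # data bits; 1, 2, 4 are Hamming parity; 0 is overall parity


def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) codeword."""
    bits = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        bits[p] = sum(bits[j] for j in range(1, 8) if j & p) % 2
    bits[0] = sum(bits[1:]) % 2
    return bits


def check(bits):
    """Return ("ok" | "ce" | "ue", corrected codeword or None)."""
    syndrome = 0
    for j in range(1, 8):
        if bits[j]:
            syndrome ^= j
    odd_parity = sum(bits) % 2 == 1
    if syndrome == 0 and not odd_parity:
        return "ok", list(bits)
    if odd_parity:
        # Single-bit error: the syndrome is its position (0 = overall parity bit).
        fixed = list(bits)
        fixed[syndrome] ^= 1
        return "ce", fixed
    return "ue", None            # nonzero syndrome with even parity: double error


def recover_ue(bits, logged_bad_bits):
    """Try to turn a UE into a correctable fail by flipping known-bad bits.

    logged_bad_bits stands in for the CE log / scrub fail data. If it is
    empty, no guess is made, so no unknown bit is ever flipped.
    """
    status, fixed = check(bits)
    if status != "ue":
        return status, fixed
    known = sorted(set(logged_bad_bits))
    # Flip the logged bits one at a time, then in larger groups.
    for size in range(1, len(known) + 1):
        for group in combinations(known, size):
            trial = list(bits)
            for pos in group:
                trial[pos] ^= 1
            trial_status, trial_fixed = check(trial)
            if trial_status in ("ok", "ce"):
                return "recovered", trial_fixed
    return "ue", None            # too many failing bits: post the error


good = encode(0b1011)
bad = list(good)
bad[5] ^= 1                      # bit already present in the CE log
bad[2] ^= 1                      # new failing bit
result = recover_ue(bad, logged_bad_bits=[5])
```

     In this sketch, flipping logged bit 5 leaves a single-bit fail that the ECC check can correct, so result holds the status "recovered" and the original codeword. With an empty log the routine returns "ue" without flipping anything, which mirrors the restriction to known failing bits.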

     Correctable error (CE) logging and scrub fail data can be used to improve field replaceable unit (FRU) callouts. Currently, an uncorrectable error (UE) requires the replacement of an entire group of DIMMs or memory cards (usually two or four). The CE logging would be used to isolate to the faili...