Maintenance Strategy for Computer Storage Systems With ECC

IP.com Disclosure Number: IPCOM000042152D
Original Publication Date: 1984-Mar-01
Included in the Prior Art Database: 2005-Feb-03
Document File: 3 page(s) / 73K

Publishing Venue

IBM

Related People

Bannon, RD: AUTHOR [+6]

A service strategy is described that optimizes service cost and performance for storage systems with Error Checking/Correction (ECC).

Definitions

Soft Error (Single Bit Correctable Error). This error can be caused by an alpha particle hit or some other cause; however, it can be corrected by writing to the failing memory location.

Permanent Error (Single Bit Correctable Error). This error can be caused by a permanent failure in the memory. ECC corrects it in the data output, but the failing memory location itself is not corrected.

SEC-DED. Single error correct, double error detect ECC method. This error correcting code allows correction of single bit errors, and detection of all double bit errors and some multiple bit errors.

Uncorrectable Error. This error consists of two or more bit errors (any combination of soft or permanent errors) that cannot be corrected (and may not even be detected). It requires a service call before operation can proceed any further.

Conventional Method

There are a number of conventional methods for maintaining memory systems with ECC. One of them (Fig. 1) is to detect the single bit error with ECC, correct it, write the corrected data back to storage, and read it back to determine whether the error still exists. If the error no longer exists, it was a soft error; if it still exists, it is a permanent error, and a service call is initiated to fix the permanent memory error. This scheme has the advantage that the memory content is always correct. The disadvantages are performance degradation and spare parts cost. Performance is degraded by the additional memory cycles needed to verify the soft error. Spare parts costs are high because the capability of the ECC logic is not being utilized.

Proposed Method

The method outlined below and illustrated in Fig. 2 allows (a) on-line diagnostics (i.e., increased availability) and (b) reduced maintenance costs. These goals are achieved as follows:

Continue to operate the memory with one failing bit (soft or permanent) until the customer feels that the system performance is degraded. This approach usually results in reduced spare parts cost and allows for scheduled servicing. As long as the memory contains only one failing bit, there will be no performance impact. Even the performance impact of two bi...
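
The definitions and the conventional write-back-and-verify check (Fig. 1) can be illustrated with a small simulation. The following is a minimal sketch, not the circuitry of the disclosure: it assumes an extended Hamming (8,4) code as the SEC-DED code, and the names `encode`, `decode`, `Memory`, and `classify` are hypothetical helpers introduced only for this example.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SEC-DED) codeword.

    Assumed layout: index 0 is the overall parity bit, indices 1, 2, 4 are
    Hamming parity bits, and indices 3, 5, 6, 7 carry the data bits.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, data, corrected_codeword)."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos              # XOR of set positions is 0 if valid
    parity_bad = sum(code) % 2 == 1      # overall parity check
    fixed = list(code)
    if syndrome == 0 and not parity_bad:
        status = "ok"
    elif parity_bad:
        fixed[syndrome] ^= 1             # single bit error: flip it back
        status = "corrected"
    else:
        status = "uncorrectable"         # double bit error: detected only
    data = fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3
    return status, data, fixed


class Memory:
    """Toy one-word memory. stuck_at maps a bit position to a forced value
    and models a permanent failure; flip() models a soft error."""

    def __init__(self, stuck_at=None):
        self.stuck_at = stuck_at or {}
        self.cells = [0] * 8

    def write(self, code):
        self.cells = list(code)

    def flip(self, pos):
        self.cells[pos] ^= 1

    def read(self):
        word = list(self.cells)
        for pos, value in self.stuck_at.items():
            word[pos] = value            # permanent fault overrides the cell
        return word


def classify(mem):
    """Conventional check: correct, write back, then read again."""
    status, _, fixed = decode(mem.read())
    if status != "corrected":
        return status
    mem.write(fixed)                     # write corrected data back
    status_after, _, _ = decode(mem.read())   # extra cycle to verify
    return "soft" if status_after == "ok" else "permanent"


if __name__ == "__main__":
    soft = Memory()
    soft.write(encode(0b1011))
    soft.flip(5)
    print(classify(soft))                # soft

    hard = Memory(stuck_at={6: 1})
    hard.write(encode(0b1011))
    print(classify(hard))                # permanent

    both = Memory(stuck_at={6: 1})
    both.write(encode(0b1011))
    both.flip(3)
    print(classify(both))                # uncorrectable
```

In the permanent case, `decode` still returns the correct data on every read, which is the ECC capability that the conventional method leaves unused when it triggers a service call. The extra `write` and `read` inside `classify` correspond to the additional memory cycles cited as the performance cost.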