Main Memory Soft Error Correction

IP.com Disclosure Number: IPCOM000050825D
Original Publication Date: 1982-Dec-01
Included in the Prior Art Database: 2005-Feb-10
Document File: 3 page(s) / 47K

Publishing Venue

IBM

Related People

Gerchman, ET: AUTHOR [+2]

Abstract

This arrangement is applicable to any system that uses a single error correction, double error detection (SEC-DED) Hamming code in main storage. The technique is designed to handle a single hard failure and a single soft failure within 8 bytes (64 bits) of data. Other combinations, i.e., 2 hard failures, 2 soft failures, or more than 2 failures, are not correctable.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.


There are several ways of implementing soft error correction, and this scheme was chosen because it requires a minimal amount of hardware.
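The single error correction, double error detection behavior that this arrangement builds on can be illustrated with a small sketch. The code below is a textbook extended Hamming construction, not the check-bit matrix of the ECC hardware described here (which the text does not give); the function names `encode` and `check` are hypothetical. For 64 data bits this construction happens to need 8 check bits, matching the 8 bytes of data and 8 check bits discussed in the text.

```python
def encode(data_bits):
    """Encode 0/1 data bits into an extended Hamming codeword.

    Index 0 holds an overall parity bit. Indices 1..n hold a Hamming
    code with check bits at the power-of-two positions.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:      # smallest r with 2**r >= m + r + 1
        r += 1
    n = m + r
    word = [0] * (n + 1)
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            word[pos] = next(it)
    syndrome = 0
    for pos in range(1, n + 1):
        if word[pos]:
            syndrome ^= pos
    for i in range(r):               # set check bits so the syndrome is zero
        if syndrome & (1 << i):
            word[1 << i] = 1
    word[0] = sum(word) % 2          # make the overall parity even
    return word


def check(word):
    """Return (status, word), where status is 'ok', 'corrected',
    'double', or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, len(word)):
        if word[pos]:
            syndrome ^= pos
    overall = sum(word) % 2
    if syndrome == 0 and overall == 0:
        return "ok", list(word)
    if overall == 1:
        # Odd number of flipped bits: treat as a single error.
        if syndrome >= len(word):
            return "uncorrectable", list(word)
        fixed = list(word)
        fixed[syndrome] ^= 1         # syndrome 0 means the overall parity bit
        return "corrected", fixed
    # Even number of flipped bits with a nonzero syndrome: double error.
    return "double", list(word)
```

A single flipped bit is located by the syndrome and corrected, while two flipped bits leave the overall parity even and are only detected. That second case is the double bit error (DBE) this disclosure sets out to recover from when one of the two failures is hard.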

One present error checking and correcting (ECC) function comprises a total of 8 common usage chips.

Fig. 1 is a logic diagram for 1 data byte of the total ECC design. That part of the diagram within the heavy lines is the added circuitry (Raw Check Bit Latch) necessary to accomplish Hard/Soft Correction. This diagram can be referred to for the following explanation.

The correction scheme requires a double bit error (DBE) handling microroutine in which the system's service processor works in conjunction with the instruction processing unit (IPU). The following description, considered with relation to Figs. 2A and 2B, summarizes the chain of events which will occur within this routine.

A DBE results in a machine check, which will freeze the 64 data bits coming from memory in the error check register (ECR). The 8 check bits for this data will be frozen in the 8 latches added to the hardware. The Service Processor (SP) will load the contents of the ECR and of the check bit latches, together with the failing address of the data, into the IPU local store array. The SP will then scan '00000000' and '11111111' into the 8 test ECC and 8 inhibit ECC latches, respectively. This will invoke a diagnostic function in the ECC logic which will allow parity bits to be stored in the main memory check bit cells. Control of the system is then returned to the IPU.

The IPU sets CCER (Cache Control Extension Register) bit 5 to disable the normal machine check caused by a double bit error. The IPU will then proceed to store the pattern '10000000' in each of the 8 bytes at the failing address. These patterns will produce parity bits of '0', which will be stored in the check bit cells in memory.
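The parity value quoted above can be checked with a few lines. This sketch assumes odd byte parity, an inference from the statement that a byte containing a single one bit yields a parity bit of '0'; the function name `odd_parity_bit` is hypothetical and does not come from the source.

```python
def odd_parity_bit(byte):
    """Return the parity bit that makes the total number of one bits
    (8 data bits plus the parity bit) odd. Odd parity is an assumption."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


pattern = 0b10000000                      # the pattern stored in each byte
parity_bits = [odd_parity_bit(pattern) for _ in range(8)]
```

Under this assumption every one of the 8 bytes produces a parity bit of 0, so all 8 check bit cells at the failing address are written with zeros.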

Next, the Service Processor will scan '11111111' into the 8 test ECC and 8 inhibit ECC latches. This will invoke another diagnostic function which will allow check bits coming back from memory to be latched into the bit 0 positions of the MDR (Memory Data Register). The IPU will then read the data, causing the above action to occur. This data will be used to determine if any of the check bit cells at the failing address are stuck at '1'.
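The read-back test can be sketched as follows, reading the step above as a stuck-at-one check: zeros were written to the check bit cells, so any cell that reads back as one is suspect. The sketch models the MDR as 8 byte values and assumes that bit 0 is the most significant bit of each byte (the usual IBM numbering, but an assumption here). Both function names are hypothetical.

```python
def check_bits_from_mdr(mdr_bytes):
    """Extract bit 0 of each MDR byte, taking bit 0 as the most
    significant bit (an assumed numbering convention)."""
    return [(b >> 7) & 1 for b in mdr_bytes]


def stuck_at_one_cells(readback_bits):
    """Indices of check bit cells that read 1 after zeros were stored."""
    return [i for i, bit in enumerate(readback_bits) if bit == 1]


# Illustrative read-back in which the cell for byte 2 returns a one.
mdr = [0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00]
suspects = stuck_at_one_cells(check_bits_from_mdr(mdr))
```

In this made-up example only the check bit cell for byte 2 would be flagged as a candidate hard failure.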

The SP will again scan '00000000' and...