
Using a common Error Correcting Special Purpose Register for correcting errors in a register file

Disclosure Number: IPCOM000202463D
Publication Date: 2010-Dec-16
Document File: 2 page(s) / 25K

Publishing Venue

The Prior Art Database


A system and method is disclosed for recovering from parity errors using support in a limited instruction execution environment, such as may be provided on a service element, by means of RAM Mode operation and redefinition of Special Purpose Registers (SPRs).

This is the abbreviated version, containing approximately 51% of the total text.



Disclosed is a system and method for recovering from single-bit (bit flip) errors using common processor elements and redefinition of Special Purpose Registers (SPRs).

With the increasing rate of soft errors in SRAMs, register files and even latches, it has become necessary to provide a means to recover from bit flips. To do this, the hardware must detect the error, stop the processor core before the error corrupts any architected state, and then restart execution from the last known good architected state.

Instruction Retry Recovery (IRR) is used to recover from bit flips. The fundamental concept of IRR is to maintain an architectural checkpoint on hardware instruction boundaries. In the event of an error, the checkpoint can be restored so that processing resumes (is retried) from the last instruction checkpoint. Instruction checkpointing (ICP) has dependencies on the logic throughout the processor.

ICP requires the following:

A means of preserving the entire architected state of the processor in a hardened checkpoint.

A means of protecting the integrity of the checkpoint with robust error detection throughout the processor.

A means of resetting non-checkpointed logic to attempt to remove the error.

A means of restoring the checkpoint.
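
The four requirements above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the disclosed hardware: the ToyCore class, its four registers, the add-immediate instruction format, the scratch dictionary standing in for non-checkpointed logic, and the fault_at injection parameter are all hypothetical names introduced for illustration.

```python
class DetectedError(Exception):
    """Stands in for a hardware error-detection signal (hypothetical)."""


class ToyCore:
    """Hypothetical 4-register core with a separate checkpoint copy."""

    def __init__(self):
        self.regs = [0, 0, 0, 0]   # working register copy
        self.pc = 0                # index of the next instruction
        self.scratch = {}          # models non-checkpointed logic
        self.checkpoint = (list(self.regs), self.pc)

    def commit(self):
        # Preserve the architected state at an instruction boundary.
        self.checkpoint = (list(self.regs), self.pc)

    def recover(self):
        # Reset non-checkpointed logic to try to remove the error.
        self.scratch.clear()
        # Restore the last known good checkpoint.
        regs, pc = self.checkpoint
        self.regs, self.pc = list(regs), pc


def run(core, program, fault_at=None, max_retries=3):
    """Run (dst, src, imm) add-immediate instructions; return the retry count.

    fault_at is the pc at which one transient error is injected (assumption).
    """
    retries = 0
    while core.pc < len(program):
        dst, src, imm = program[core.pc]
        try:
            core.scratch["result"] = core.regs[src] + imm
            if core.pc == fault_at:
                fault_at = None                   # transient: fires only once
                core.scratch["result"] ^= 0x4     # simulated bit flip
                core.regs[dst] = core.scratch["result"]
                raise DetectedError(core.pc)      # detection stops the core
            core.regs[dst] = core.scratch["result"]
            core.pc += 1
            core.commit()                         # checkpoint only clean results
        except DetectedError:
            retries += 1
            if retries > max_retries:
                raise
            core.recover()                        # retry from the checkpoint
    return retries


program = [(0, 0, 5), (1, 0, 2), (2, 1, 10)]
core = ToyCore()
retries = run(core, program, fault_at=1)
print(core.regs, retries)   # [5, 7, 17, 0] 1
```

In this model the corrupted value reaches only the working copy. The checkpoint advances only after an instruction completes without a detected error, so the retry starts from a clean state.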

The fixed-point and floating-point data is stored in a memory element (register file) protected by an Error Correction Code (ECC). The Fixed Point Unit (FXU) and the Floating Point Unit (FPU) may detect bit flip errors while operating on data. Because instruction execution is subject to strict timing constraints, the normal data flow and timing do not support running an ECC algorithm in-line with a fixed- or floating-point computation to correct bit flip errors. Instead, the correction is performed offline during the ICP process using shared ECC correction hardware. This approach uses existing dataflows, enables ECC correction without impacting normal timing, and attains efficiencies by sharing hardware resources among units.
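
The split between in-line detection and offline correction can be sketched in software as follows. The disclosure does not state which code is used, so a 12-bit single-error-correcting Hamming code over 8-bit data is assumed here. The RegisterFile class, the BitFlipDetected exception and the shared_correct function are hypothetical names, and detection is modeled as a simple non-zero syndrome check.

```python
PARITY_POS = [1, 2, 4, 8]                                   # 1-based positions
DATA_POS = [p for p in range(1, 13) if p not in PARITY_POS]


def syndrome(code):
    """XOR of the 1-based positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 13):
        if (code >> (pos - 1)) & 1:
            s ^= pos
    return s


def encode(value):
    """Spread 8 data bits over the data positions and add 4 check bits."""
    code = 0
    for i, pos in enumerate(DATA_POS):
        if (value >> i) & 1:
            code |= 1 << (pos - 1)
    s = syndrome(code)
    for p in PARITY_POS:
        if s & p:
            code |= 1 << (p - 1)      # drives the overall syndrome to zero
    return code


def extract(code):
    """Recover the 8 data bits from a codeword."""
    value = 0
    for i, pos in enumerate(DATA_POS):
        if (code >> (pos - 1)) & 1:
            value |= 1 << i
    return value


class BitFlipDetected(Exception):
    def __init__(self, index):
        super().__init__(f"bit flip detected in register {index}")
        self.index = index


class RegisterFile:
    def __init__(self, size):
        self.words = [encode(0)] * size

    def write(self, index, value):
        self.words[index] = encode(value & 0xFF)

    def read(self, index):
        # Fast path used by an execution unit: detect only, never correct.
        code = self.words[index]
        if syndrome(code) != 0:
            raise BitFlipDetected(index)
        return extract(code)


def shared_correct(regfile, index):
    """Offline path: one corrector shared by all units (single-bit errors only)."""
    code = regfile.words[index]
    s = syndrome(code)
    if s > 12:
        raise ValueError("uncorrectable error")
    if s:
        regfile.words[index] = code ^ (1 << (s - 1))   # flip the bad bit back


rf = RegisterFile(4)
rf.write(2, 0xA7)
rf.words[2] ^= 1 << 5                 # inject a single bit flip
try:
    rf.read(2)
except BitFlipDetected as err:
    shared_correct(rf, err.index)     # correction happens off the fast path
recovered = rf.read(2)
```

The read path does no more than a check, which mirrors the timing argument above. The costlier locate-and-flip step lives in one routine that any unit can invoke during recovery.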

Typically, ECC support may already be applied and supported in the Load/Store Unit (LSU) where data is read from memory. If...