Error Injection Simulation in Logout Analysis of Central Processor Hardware Failures

IP.com Disclosure Number: IPCOM000079964D
Original Publication Date: 1973-Oct-01
Included in the Prior Art Database: 2005-Feb-26
Document File: 3 page(s) / 61K

Publishing Venue

IBM

Related People

Berzins, V: AUTHOR [+2]

Abstract

Logout Analysis is a program designed to facilitate rapid repair of CPU failures. This is accomplished by programmed analysis of machine data (logout data) recorded on a data set at the time of failure by the Recovery Management Support (RMS) or System Error Recorder (SER) programs. Logout Analysis is written to run as a problem program and may, therefore, be run concurrently with customer operations. The output from Logout Analysis attempts to relate the failure to a minimum set of Field Replacement Units (FRUs).

This is the abbreviated version, containing approximately 52% of the total text.



Upon occurrence of a machine check error, a logout record consisting of machine status (at the time of error) is saved for later analysis. On the IBM System/360 Model 195, where the analysis method was proven, this logout data record is quite extensive and reflects most of the machine status necessary to reconstruct the operations underway when the error occurred. The machine data is made up of error indicators, the state of most control triggers and control registers, and the contents of most data registers. The existence of this detailed machine status makes it possible to employ several techniques to locate the potential source of the failure.

These methods include parity generation of recorded fields for comparison against recorded parity, trigger tracing through known control states and, most importantly, software simulation of the hardware. Software simulation is particularly effective in analysis of arithmetic operations.

Method:

The method of failure analysis described here simulates a complete Floating-Point Multiply operation in software, both functionally and in terms of the hardware, as opposed to using a programmed logic simulator. A logic simulator would be too detailed and costly for this application, because the primary interest is in isolating the failure to a FRU rather than to a discrete circuit on the FRU.
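The functional half of such a simulation can be illustrated with a minimal Python sketch. It assumes the standard System/360 long hexadecimal floating-point format (1 sign bit, 7-bit excess-64 characteristic, 56-bit fraction) and a truncating multiply. The function names are hypothetical, exponent overflow and underflow are ignored, and nothing here models the Model 195 datapath or the original program.

```python
# Functional sketch of a System/360 long hexadecimal floating-point multiply.
# Assumptions (not from the disclosure): operands are 64-bit integers in the
# long format; the product fraction is truncated to 56 bits; characteristic
# overflow/underflow is not detected.

FRACTION_BITS = 56
FRACTION_MASK = (1 << FRACTION_BITS) - 1


def unpack(word):
    """Split a 64-bit word into (sign, characteristic, fraction)."""
    sign = (word >> 63) & 1
    characteristic = (word >> 56) & 0x7F
    fraction = word & FRACTION_MASK
    return sign, characteristic, fraction


def pack(sign, characteristic, fraction):
    """Reassemble a 64-bit word from its three fields."""
    return (sign << 63) | ((characteristic & 0x7F) << 56) | (fraction & FRACTION_MASK)


def prenormalize(characteristic, fraction):
    """Shift a nonzero fraction left by hex digits until its top digit is nonzero."""
    while fraction >> (FRACTION_BITS - 4) == 0:
        fraction <<= 4
        characteristic -= 1
    return characteristic, fraction


def simulate_multiply(a, b):
    """Return the product of two long-format operands as a 64-bit word."""
    sign_a, char_a, frac_a = unpack(a)
    sign_b, char_b, frac_b = unpack(b)
    if frac_a == 0 or frac_b == 0:
        return 0  # a zero fraction yields a true zero result
    char_a, frac_a = prenormalize(char_a, frac_a)
    char_b, frac_b = prenormalize(char_b, frac_b)
    product = frac_a * frac_b            # up to 112 bits
    characteristic = char_a + char_b - 64
    if product >> 108 == 0:              # top hex digit is zero: postnormalize
        product <<= 4
        characteristic -= 1
    fraction = product >> FRACTION_BITS  # keep the high-order 56 bits
    return pack(sign_a ^ sign_b, characteristic, fraction)
```

For example, `simulate_multiply(0x4120000000000000, 0x4130000000000000)` multiplies 2.0 by 3.0 in this format. Working at the level of whole fields rather than gates reflects the stated goal of isolating a replaceable unit, not a discrete circuit.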

First, the analysis program must determine that valid input data for the failing Multiply exists in the logout field. The valid inputs are used to generate the correct product via functional simulation, as if no hardware error had occurred. This simulated result is compared against the logged, bad result to determine the location of the first failing fraction bit within a 71-bit wide intermediate result field and/or 72-bit wide final r...
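The comparison of a simulated result against a logged result can be sketched in Python as below. This is an illustration only: the function name is hypothetical, the fields are treated as plain integers, and it assumes that "first" means the most significant differing bit, numbered from 0 at the high-order end in the usual IBM convention. The disclosure does not confirm either assumption.

```python
# Sketch: locate the first bit at which a logged result differs from the
# simulated (expected) result. Assumption: bit 0 is the most significant bit
# of the field, and "first" means the most significant mismatch.

def first_failing_bit(expected, logged, width):
    """Return the index of the first differing bit, or None if the fields agree."""
    mask = (1 << width) - 1
    diff = (expected ^ logged) & mask  # 1 bits mark every mismatch
    if diff == 0:
        return None
    # bit_length() gives the position of the highest set bit counted from the
    # low-order end; convert it to an index counted from the high-order end.
    return width - diff.bit_length()


# Hypothetical 72-bit values: the logged field has bit 17 wrongly set.
expected = 1 << 71
logged = expected | (1 << (71 - 17))
failing = first_failing_bit(expected, logged, 72)
```

Here `failing` is 17. A return value of `None` would mean the logged field matches the simulation, so the fault lies outside the compared field.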