Main Storage Error Mapping Mechanism

IP.com Disclosure Number: IPCOM000043231D
Original Publication Date: 1984-Aug-01
Included in the Prior Art Database: 2005-Feb-04
Document File: 2 page(s) / 37K

Publishing Venue

IBM

Related People

Flusche, FO: AUTHOR [+2]

Abstract

A method is disclosed that aids identification of faulty elements when a main store failure occurs by recording correctable and uncorrectable error occurrences. In a typical storage system, repair of hardware which is creating intermittent uncorrectable errors (UEs) requires reproduction of at least one of the correctable errors (CEs) by a diagnostic program. Experience has shown that a diagnostic program is not always able to reproduce the failure. The procedures in use relocate array cards, dispersing them across different error boundaries, so multiple errors must occur before a failing array card or component can be found or verified. Another solution has been to replace entire groups of storage cards, which is a very costly form of repair.

At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 61% of the total text.

A small amount of hardware can be added to main memory configurations to track all data bits, together with adequate address information, for both CEs and UEs so that the errors can be correlated. This error history is collected during normal system operation. It can also be examined during normal system operation, with the error data correlated by observation or by a utility program run in the processor or in a separate service system processor. The data collected allows a failure to be isolated to a failing card and its array component, or indicates that the failure may be peripheral (support logic, cables, etc.). The figure shows a two-dimensional array which records and t...
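
The correlation step described above can be illustrated with a rough software model. This is only a sketch under stated assumptions: the disclosure describes added hardware, and the layout of its two-dimensional array is cut off in this abbreviated text. The word width, the bits-per-card split, the bank selection by high address bits, the ErrorMap class, and the rule "errors confined to one card suggest that card, errors spread over several cards suggest a peripheral fault" are all hypothetical choices made for illustration.

```python
from collections import Counter

# All three constants are assumptions for illustration, not from the disclosure.
WORD_BITS = 72       # assumed ECC word: 64 data bits + 8 check bits
BITS_PER_CARD = 18   # assumed: each array card supplies 18 bits of the word
BANK_SHIFT = 20      # assumed: address bits above bit 19 select the bank


class ErrorMap:
    """Hypothetical two-dimensional error history keyed by (bank, bit)."""

    def __init__(self):
        # cells[(bank, bit)] counts 'CE' and 'UE' events seen at that position.
        self.cells = {}

    def record(self, kind, address, failing_bits):
        """Log one error event; kind is 'CE' or 'UE'."""
        if kind not in ("CE", "UE"):
            raise ValueError(f"unknown error kind: {kind!r}")
        bank = address >> BANK_SHIFT
        for bit in failing_bits:
            if not 0 <= bit < WORD_BITS:
                raise ValueError(f"bit {bit} is outside the {WORD_BITS}-bit word")
            self.cells.setdefault((bank, bit), Counter())[kind] += 1

    def correlate(self):
        """Per bank, name one suspect card or flag a possible peripheral fault."""
        bits_by_bank = {}
        for bank, bit in self.cells:
            bits_by_bank.setdefault(bank, set()).add(bit)
        verdicts = {}
        for bank, bits in sorted(bits_by_bank.items()):
            cards = {bit // BITS_PER_CARD for bit in bits}
            if len(cards) == 1:
                # Every failing bit sits on the same card: suspect that card.
                verdicts[bank] = ("card", cards.pop(), sorted(bits))
            else:
                # Failing bits span several cards: support logic or cabling
                # becomes a candidate (a heuristic assumed for this sketch).
                verdicts[bank] = ("peripheral", None, sorted(bits))
        return verdicts


# Example history: bank 1 fails repeatedly on bits of one card,
# bank 3 fails on bits that belong to two different cards.
history = ErrorMap()
history.record("CE", 0x00123456, [5])
history.record("CE", 0x00130000, [5])
history.record("UE", 0x00120000, [5, 7])
history.record("CE", 0x00300000, [2])
history.record("CE", 0x00300040, [40])
print(history.correlate())
```

Because the history is only appended to while errors occur, a model like this can be queried at any time without reproducing the failure, which mirrors the point that the error history can be examined during normal system operation.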