Method for FRU Isolation in Intel platforms without using the SMI Handler and Possibly Having Data Corruption

IP.com Disclosure Number: IPCOM000028727D
Original Publication Date: 2004-May-27
Included in the Prior Art Database: 2004-May-27
Document File: 1 page(s) / 10K

Publishing Venue

IBM

Abstract

In previous Intel-based designs, when the system detected an unrecoverable error, the SMI (System Management Interrupt) handler was invoked. Once invoked, the SMI handler code was executed by one of the system processors to determine the failing FRU (field-replaceable unit). SMI handlers have three deficiencies. 1. While the SMI code is executing to determine the failing FRU, the corrupted data that triggered the unrecoverable error could enter main system memory and no longer be detected. 2. The SMI handler may not execute at all if the unrecoverable error occurred, for example, in the first bank of memory (where the SMI handler code is executed from). 3. As systems become more complex, the SMI handler becomes more complex to code. In a multi-node system, error recovery using the SMI handler is more problematic.

This is the abbreviated version, containing approximately 69% of the total text.

     Enterprise X-Architecture systems use a different approach for handling unrecoverable errors. Upon detection of an unrecoverable error, the system freezes code execution within 3 to 4 cycles and begins the FRU identification process.

     Because the system freezes so quickly, the corrupt data cannot propagate through the system as it would if the system were using an SMI handler. Also, if the unrecoverable error occurs in the first bank of memory, the failing FRU is still detected.
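     The containment argument can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name reaches_memory, the 10-cycle path to memory, and the 10,000-cycle SMI handler latency are hypothetical values chosen for illustration. Only the 4-cycle freeze figure comes from the text.

```python
def reaches_memory(halt_latency_cycles: int, cycles_to_memory: int) -> bool:
    """Toy model: does corrupt data land in main memory before the halt?

    Assumption (not from the disclosure): after the error is detected,
    the corrupt data needs `cycles_to_memory` cycles to be committed to
    main memory, and it is committed only if the system is still running
    at that point. A tie goes to the freeze.
    """
    return cycles_to_memory < halt_latency_cycles


# Hypothetical distance, in cycles, from the error source to main memory.
CYCLES_TO_MEMORY = 10

# Hardware freeze within 3 to 4 cycles (figure taken from the text).
freeze_leaks = reaches_memory(halt_latency_cycles=4,
                              cycles_to_memory=CYCLES_TO_MEMORY)

# Hypothetical SMI handler that runs for thousands of cycles.
smi_leaks = reaches_memory(halt_latency_cycles=10_000,
                           cycles_to_memory=CYCLES_TO_MEMORY)

print("freeze lets corrupt data reach memory:", freeze_leaks)
print("SMI handler lets corrupt data reach memory:", smi_leaks)
```

     In this model the outcome depends only on whether the halt arrives before the corrupt data does, which is the reason a freeze measured in single-digit cycles is preferred over a software handler.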

     Once the system has frozen on an unrecoverable error, the chipset generates a SPINT (Service Processor Interrupt) signal indicating that an unrecoverable error has occurred. The integrated service processor sees this signal, scans specific error registers out of the chipset, and sends their contents to the Service Processor adapter. The adapter stores this data in its NVRAM, posts a message in the system error log that an unrecoverable error has occurred, and restarts the system. On the next boot, POST/BIOS can interrogate the NVRAM of the service processor to identify the error condition and the failing FRU. Alternatively, instead of POST/BIOS, IBM Director can examine the Service Processor NVRAM to determine the error condition and the failing FRU.
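     The sequence described above (freeze, SPINT, register scan-out, NVRAM store, restart, lookup on the next boot) might be modeled roughly as follows. This is a minimal simulation, not the actual firmware: the class names, the register names, and the register-to-FRU table are all hypothetical, and the integrated service processor and its adapter are collapsed into a single object for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Chipset:
    """Hypothetical chipset model with error registers and a SPINT line."""
    error_registers: dict = field(default_factory=dict)
    frozen: bool = False
    spint: bool = False

    def raise_unrecoverable(self, register: str, value: int) -> None:
        # Latch the error, freeze code execution, and assert SPINT.
        self.error_registers[register] = value
        self.frozen = True
        self.spint = True


@dataclass
class ServiceProcessor:
    """Hypothetical service processor plus adapter (merged for brevity)."""
    nvram: dict = field(default_factory=dict)
    error_log: list = field(default_factory=list)

    def handle_spint(self, chipset: Chipset) -> bool:
        """Scan out the registers; return True if a restart is requested."""
        if not chipset.spint:
            return False
        # Copy the error registers into NVRAM so they survive the restart.
        self.nvram = dict(chipset.error_registers)
        self.error_log.append("unrecoverable error occurred")
        return True


# Hypothetical mapping from error register to FRU (not from the disclosure).
FRU_MAP = {"MEM_BANK0_ERR": "DIMM 1", "CPU0_BUS_ERR": "Processor 1"}


def post_identify_fru(sp: ServiceProcessor) -> list:
    """On the next boot, map the non-zero registers in NVRAM to FRUs."""
    return sorted(FRU_MAP.get(name, "unknown FRU")
                  for name, value in sp.nvram.items() if value)


chipset = Chipset()
sp = ServiceProcessor()
chipset.raise_unrecoverable("MEM_BANK0_ERR", 0x1)
if sp.handle_spint(chipset):
    chipset = Chipset()  # restart: chipset state is lost, NVRAM is kept
print(post_identify_fru(sp))
```

     The point of the sketch is that FRU identification reads only from the service processor's NVRAM, so it does not depend on main memory or on the host processors having been able to run handler code at the time of the failure.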

     The logged out data identifying the failing FRU is always preserved in the service processor...