METHOD TO IDENTIFY CPU FAILURE OR SYSTEM FAILURE IN A SYSTEM

IP.com Disclosure Number: IPCOM000099010D
Original Publication Date: 2005-Mar-09
Included in the Prior Art Database: 2005-Mar-09
Document File: 3 page(s) / 26K

Publishing Venue

IBM

Abstract

Typically, in computer systems such as servers, blades, and personal computers, the IERR signal from the processor is monitored to check CPU health. If the service processor detects IERR asserted by a processor, it declares that processor bad. The problem with this inference is that the CPU may drive IERR because of internal as well as external faults. External events that can trigger IERR include, for example, an I/O device malfunction that causes a bus hang, or a bridge that does not respond to an I/O cycle. This article addresses this issue and provides a method to differentiate between a real CPU fault condition and a system-level problem. This method is applicable to Intel-based processors only.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 60% of the total text.

    This invention provides a means to verify whether the malfunctioning component is in fact the Intel processor or whether a system-level problem is causing the processor to fail. The main advantage of this method is that it identifies the failing subsystem correctly, for efficient debug and product support. It has been observed that the service processor, relying on IERR alone, wrongly identifies the processor as bad when the failure is actually in some downstream device.

The current known solution monitors only the IERR# signal and ignores the events leading up to its assertion. This invention accounts for all of the error signals, monitoring BINIT# and MCERR# prior to the assertion of IERR#, and makes an informed guess at the source of the failure.
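
The idea of looking at what preceded IERR# can be illustrated with a minimal Python sketch. The decision rule below is an assumption made for illustration, since the abbreviated text does not state the exact rule: IERR# preceded by BINIT# or MCERR# is treated as a suspected system-level problem, and IERR# with no preceding signal is treated as a suspected processor fault. The names `Verdict` and `classify` are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    NO_FAULT = "no IERR# observed"
    CPU_SUSPECT = "processor suspected"
    SYSTEM_SUSPECT = "system-level problem suspected"


def classify(events):
    """Classify a failure from signal names listed in order of assertion.

    ASSUMED rule (not stated in the source text): IERR# preceded by BINIT#
    or MCERR# points at a system-level problem; IERR# alone points at the
    processor.
    """
    if "IERR#" not in events:
        return Verdict.NO_FAULT
    # Only signals asserted before the first IERR# influence the verdict.
    before_ierr = events[: events.index("IERR#")]
    if "BINIT#" in before_ierr or "MCERR#" in before_ierr:
        return Verdict.SYSTEM_SUSPECT
    return Verdict.CPU_SUSPECT
```

For example, `classify(["MCERR#", "IERR#"])` yields `Verdict.SYSTEM_SUSPECT`, whereas `classify(["IERR#"])` yields `Verdict.CPU_SUSPECT`. A monitor that watches IERR# alone cannot tell these two cases apart.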

    On Intel processor-based systems, the processor asserts three error signals, BINIT#, MCERR#, and IERR#, depending on the error conditions encountered.
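
As a rough illustration of how monitoring logic might record the order in which these three signals assert, the Python sketch below walks a series of sampled pin levels. The sampling model, the `SIGNALS` tuple, and the `assertion_order` helper are assumptions for illustration and are not taken from the disclosure. The `#` suffix denotes an active-low signal, so a level of 0 is treated as asserted.

```python
SIGNALS = ("BINIT#", "MCERR#", "IERR#")


def assertion_order(samples):
    """Return signal names in the order they were first seen asserted.

    Each sample is a dict mapping a signal name to its pin level. The
    signals are active low, so level 0 means asserted. A signal missing
    from a sample is treated as deasserted (level 1).
    """
    order = []
    for sample in samples:
        # Signals that assert within the same sample are recorded in
        # SIGNALS order, since the sampling cannot resolve them further.
        for name in SIGNALS:
            if sample.get(name, 1) == 0 and name not in order:
                order.append(name)
    return order
```

Given three consecutive samples in which nothing is asserted, then MCERR# goes low, then IERR# goes low as well, the helper returns `["MCERR#", "IERR#"]`.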

This invention takes all of the above signals into account to decide whether the failure is related to the processor or is due to a malfunctioning downstream device creating a bus hang condition, which results in the assertion of these signals. External logic monitors the BINIT#, MCERR#, and IERR# signals on the front-side bus. BIOS enables the assertion of the BINIT# and MCERR# signals in the processor(s). The processor(s) will assert these signals followed with IER...