A method for better analysis of IPL time errors detected by a hypervisor.
Original Publication Date: 2005-Apr-21
Included in the Prior Art Database: 2005-Apr-21
Disclosed is a change in the way serviceable events are handled during the early IPL of a server. The change allows better isolation of failed hardware.
Many things happen as a server goes through the stages of initial program load (IPL). As components in the server hypervisor start up and discover problems, such as resources that are not working properly, they log errors. These errors are analyzed by Error Analysis in the hypervisor, and failing field replaceable units (FRUs) are added to the error log. During the IPL, however, the hypervisor also collects vital product data (VPD) for hardware and assigns location codes to hardware. Error analysis often starts before the VPD has been collected or the location code has been assigned for a given FRU. In these cases, the data describing the FRU is of lower quality and quantity than it would be for a runtime hardware failure. If error analysis were instead delayed until the hypervisor IPL is complete, errors that prevent the IPL from completing would never be seen. This invention corrects that problem while preserving the ability to analyze and report errors that may prevent the hypervisor portion of the IPL from continuing.
The main idea is to distinguish errors that will benefit from post-IPL analysis from those that may prevent the IPL from continuing. A query can be made to determine when VPD collection and location code assignments are complete. Errors that can benefit from delayed analysis, like most bus errors, are requeued in the "to be analyzed" queue if the components needed to properly anal...
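The requeue decision described above can be sketched roughly as follows. This is only an illustrative sketch, not the hypervisor's actual implementation: the class and field names (LoggedError, ErrorAnalyzer, may_block_ipl, fru_data_ready) are hypothetical, and the sketch assumes that "components needed" means the VPD and location code for the suspected FRU.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LoggedError:
    error_id: str
    fru: str              # hypothetical identifier of the suspected FRU
    may_block_ipl: bool   # True if the error could stop the IPL from continuing
    requeue_count: int = 0


class ErrorAnalyzer:
    """Hypothetical model of the "to be analyzed" queue."""

    def __init__(self):
        self.to_be_analyzed = deque()
        self.vpd_collected = set()       # FRUs whose VPD has been collected
        self.location_assigned = set()   # FRUs that have a location code
        self.reports = []                # (error_id, fru, had_full_fru_data)

    def fru_data_ready(self, fru):
        # Stand-in for the query on VPD collection and location code assignment.
        return fru in self.vpd_collected and fru in self.location_assigned

    def log(self, error):
        self.to_be_analyzed.append(error)

    def run_pass(self):
        # Visit each error queued at the start of the pass exactly once.
        for _ in range(len(self.to_be_analyzed)):
            err = self.to_be_analyzed.popleft()
            ready = self.fru_data_ready(err.fru)
            if err.may_block_ipl or ready:
                # Analyze now: either the IPL may not survive a delay,
                # or the FRU description is already complete.
                self.reports.append((err.error_id, err.fru, ready))
            else:
                # Defer: requeue until VPD and location code are available.
                err.requeue_count += 1
                self.to_be_analyzed.append(err)
```

In this sketch an error flagged as possibly blocking the IPL is reported on the first pass even with incomplete FRU data, while a bus-type error stays in the queue until a later pass finds both the VPD and the location code for its FRU.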