Method for implementing hardware workarounds or init improvements on resources persistently deconfigured from system.
Original Publication Date: 2010-Jan-28
Included in the Prior Art Database: 2010-Jan-28
When a system removes resources that have errors or that reduce system performance, the recovery action is not always hardware replacement. In some cases a hardware procedure implemented at the firmware or software level can bring the resource back to full function, full performance, or both. The method detects the new firmware or software being installed on the system and allows the affected resources to be brought back without replacement.
Prior art describes a condition where a resource is found to be unusable and is persistently removed from use, which makes the computer system more reliable. As computer system designs grow more complicated, chips can be configured in many ways through procedures and initialization data. When errors occur in the system, such as recoverable errors exceeding a threshold or an error when resetting the chip or a sub-chip resource, the resource is persistently deconfigured to keep the system from reaching an unstable or unusable condition. In the current implementation, the part is then serviced by replacing it with a new part. When the service action is done, the part designator (for example, the serial number) changes. The serial number change triggers removal of the persistent deconfiguration records.
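The serial-number trigger described above can be sketched as follows. This is only an illustration under stated assumptions: the record fields, the location codes, and the function name `prune_replaced_parts` are hypothetical and do not come from any actual firmware.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DeconfigRecord:
    """Hypothetical persistent deconfiguration record."""
    location: str       # assumed location code of the resource
    serial_number: str  # serial number of the part when it was deconfigured
    error_code: str     # assumed identifier of the triggering error


def prune_replaced_parts(
    records: List[DeconfigRecord],
    installed_serials: Dict[str, str],
) -> List[DeconfigRecord]:
    """Return the records that still apply.

    A record is kept only while the part at its location still reports the
    serial number stored in the record. A different serial number (the part
    was replaced) or a missing location causes the record to be dropped.
    """
    return [
        r for r in records
        if installed_serials.get(r.location) == r.serial_number
    ]
```

In this sketch, a record for a part that was swapped during service disappears on the next check, while a record for an untouched part stays in place regardless of the firmware level.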
Analysis is then done, whether through First Failure Data Capture (FFDC) or through failure analysis of the part, and it may determine that a firmware or software patch that adds or changes a procedure, or that changes initialization, can bring the resource back to a usable and reliable state.
The problem is that placing new firmware on the system does not change the persistent deconfiguration actions. A resource that was persistently deconfigured remains removed from system usage even with the new firmware. A manual procedure could have a person remove records from the system. Such a procedure is problematic because the person must link the resource to its persistent deconfiguration records, which could lead to human-initiated mistakes of removing too many records or not enough.
The solution is to indicate the new procedure or init change in the new firmware code level, for example in a Firmware Update Information (FUI) list. The new code carries a FUI list with data that indicates change rules for a specific Change, System, FRU and Error. Firmware containing this FUI list then checks the system for persistently deconfigured resources. When the firmware finds a resource that matches the FUI list and has a persistent deconfiguration record, it can remove that record. The system is then allowed to IPL with the resource configured and to run diagnostics that confirm the resource's quality and reliability. If the new procedure or...
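The FUI matching step can be sketched as below. This is a minimal illustration, not the disclosed implementation: the field names, the exact-match rule on System, FRU and Error, and the function `apply_fui_list` are all assumptions, since the disclosure names the four rule fields but does not define their format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FuiEntry:
    """Hypothetical FUI rule: one Change applying to a System, FRU and Error."""
    change_id: str
    system_type: str
    fru_type: str
    error_code: str


@dataclass(frozen=True)
class PersistentDeconfigRecord:
    """Hypothetical persistent deconfiguration record for one resource."""
    resource_id: str
    fru_type: str
    error_code: str


def apply_fui_list(
    fui_list: List[FuiEntry],
    records: List[PersistentDeconfigRecord],
    system_type: str,
) -> Tuple[List[PersistentDeconfigRecord], List[PersistentDeconfigRecord]]:
    """Split records into (kept, removed) according to the FUI list.

    A record is removed when some FUI entry matches this system type and
    the record's FRU type and error code. Removed records correspond to
    resources that may be configured again on the next IPL.
    """
    kept: List[PersistentDeconfigRecord] = []
    removed: List[PersistentDeconfigRecord] = []
    for record in records:
        matched = any(
            entry.system_type == system_type
            and entry.fru_type == record.fru_type
            and entry.error_code == record.error_code
            for entry in fui_list
        )
        (removed if matched else kept).append(record)
    return kept, removed
```

Matching on all three fields keeps the removal narrow: a record for a different error on the same FRU type, or for the same error on another system type, stays in place. In this sketch the resources in the removed list would still need to pass diagnostics after the IPL before being trusted.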