Server Controls Delayed Reset Host Adapters

IP.com Disclosure Number: IPCOM000032610D
Original Publication Date: 2004-Nov-09
Included in the Prior Art Database: 2004-Nov-09
Document File: 2 page(s) / 29K

Publishing Venue

IBM

Abstract

A program is disclosed that enhances the error recovery procedure for a failed adapter. The invention involves both the server microcode and the adapter microcode. During the error recovery procedure, if a server decides to reset an adapter, it notifies that adapter ahead of time (Delayed Reset). This allows the adapter to gracefully shut off all input/output activity, gather any important debugging information, and prepare itself to be reset later by the server. The server data is then safe, and the adapter data is preserved for further analysis.

This is the abbreviated version, containing approximately 51% of the total text.

In previous error recovery designs, whenever an error is detected by either a server or one of its connected host adapters, the server drives an error recovery procedure that consists of messages passed between the server and the host adapters. Both the server and the host adapters collect their important data and reinitialize hardware and software as instructed through the handshaking messages. If the server detects another error from an adapter during this recovery procedure, it signals another error recovery. This can lead to recursive recovery actions that end up crashing the whole box.
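The recursive behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the names `recover`, `BoxCrash`, `MAX_DEPTH`, and the `errors_remaining` field are hypothetical and do not come from the disclosure, and the depth limit merely stands in for whatever resource exhaustion crashes the real box.

```python
# Toy model of nested error recovery. All names here are illustrative
# assumptions, not part of the disclosed microcode.

MAX_DEPTH = 5  # assumed nesting limit standing in for "the whole box crashes"


class BoxCrash(Exception):
    """Raised when recovery has nested too deeply to continue."""


def recover(adapters, depth=0):
    """Run one recovery pass; start a nested pass if an adapter errors again.

    Each adapter is a dict with an 'errors_remaining' counter that models
    how many more times it will raise an error during a handshake.
    Returns the nesting depth at which recovery finally completed.
    """
    if depth >= MAX_DEPTH:
        raise BoxCrash(f"recovery nested {depth} levels deep")
    for adapter in adapters:
        if adapter["errors_remaining"] > 0:
            # The adapter fails again mid-recovery, so the server
            # signals another recovery on top of the current one.
            adapter["errors_remaining"] -= 1
            return recover(adapters, depth + 1)
    return depth
```

An adapter that fails once more during recovery costs one extra nested pass, while an adapter that keeps failing drives the model into `BoxCrash`.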

The previous error handling design has three major drawbacks. First, the server terminates all handshaking communication with the failed adapters, which eventually leads to these adapters being reset at the end of the recovery process. This eliminates the recursive error handling, prevents the system from crashing, and protects the server data. However, the data on the client side is not protected, because the bad adapter continues to write bogus data to the client side. Second, when the server blocks the communication paths from the failing adapters, the adapters run blindly on their own, and a hang timer error occurs because they cannot master the bus. The adapters end up in a dead loop. Third, the data collected from the adapter side at this point has already been modified, so it is not the correct data for analyzing the problem.

The current design provides a solution to the drawbacks mentioned above. Whenever the server decides that an adapter is not in a healthy functional state (meaning that the adapter may affect server and client side operations), the server performs a delayed reset recovery action. In the delayed reset process, the se...
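Because the remainder of the description is elided in this abbreviated version, the following is only a minimal sketch of the delayed reset sequence as summarized in the abstract: the server notifies the adapter ahead of time, the adapter shuts off I/O and gathers debugging information, and the server resets it later. All class, method, and field names are hypothetical. Having the server copy the dump before the reset is an assumption about how the adapter data might be preserved, not something the text states.

```python
from dataclasses import dataclass, field


@dataclass
class Adapter:
    """Hypothetical host adapter model; field names are assumptions."""
    name: str
    io_active: bool = True
    ready_for_reset: bool = False
    reset_count: int = 0
    debug_dump: dict = field(default_factory=dict)

    def on_delayed_reset_notice(self):
        # Gracefully stop all I/O so no further data is written
        # to the server or the client side.
        self.io_active = False
        # Capture debugging state before anything else can modify it.
        self.debug_dump = {"adapter": self.name, "state": "captured before reset"}
        self.ready_for_reset = True

    def hard_reset(self):
        # Placeholder for the actual reset; this model only counts it.
        self.ready_for_reset = False
        self.reset_count += 1


class Server:
    """Hypothetical server side of the delayed reset handshake."""

    def __init__(self):
        self.collected = []

    def delayed_reset(self, adapter):
        adapter.on_delayed_reset_notice()           # 1. notify ahead of time
        if adapter.ready_for_reset:
            # 2. keep the adapter data for later analysis (assumed step)
            self.collected.append(dict(adapter.debug_dump))
        adapter.hard_reset()                        # 3. reset later
```

In this model, calling `Server().delayed_reset(Adapter("ha0"))` leaves the adapter with its I/O stopped and one reset counted, with a copy of its dump held by the server.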