Autonomic On-Demand Adapter Dump
Original Publication Date: 2003-Jul-28
Included in the Prior Art Database: 2003-Jul-28
Disclosed is an improved data capture method to assist in the problem analysis and debugging of firmware that resides within an input/output (I/O) adapter. When an event of interest occurs, the firmware state is captured without disrupting adapter operation and copied into otherwise unused adapter memory. At a later time, the memory containing the copy of the firmware state is read by the host system and stored for further examination.
Disclosed is an improved data capture method to assist in the problem analysis and debugging of firmware that resides within an input/output (I/O) adapter. Modern adapters contain a significant amount of complex firmware, and it is desirable to examine the state of that firmware to help diagnose and fix microcode and hardware problems. One solution is to have the host system copy portions of the adapter memory into host memory for later analysis. This is sometimes referred to as "dumping" the adapter because the contents of the adapter memory are "dumped" into system memory. It can be a "smart dump" if the adapter is able to inform the host in some fashion which of its memory regions to dump. There are two primary methods of dumping the adapter, both of which have significant drawbacks:
1. Attempt to dump concurrently with normal operations on the adapter, performing multiple requests to fetch data from the adapter. The drawback is that this results in an inconsistent view of the firmware state because operations continue during the dump, and many firmware structures (such as linked lists and control blocks) will have an opportunity to change while the dump is in progress. A further drawback is that it becomes difficult to determine whether what appears to be an invalid state is actually the cause of the problem under investigation, or whether the invalid state arises solely because portions of the firmware state were retrieved from the adapter at different points in time.
2. Reset the adapter and then dump its state. This is extremely disruptive to normal operations. The resources provided by the adapter become unavailable, and the adapter will need to be restarted.
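The inconsistency that arises from the first method can be illustrated with a small model. The following is only a sketch in Python, not adapter firmware: the `Adapter` class, the two-word state layout, and the 4-byte fetch size are all assumptions made for illustration. The toy firmware always updates two sequence words together, yet a dump assembled from multiple fetch requests captures them at different points in time.

```python
import struct

CHUNK = 4  # bytes returned per host fetch request (illustrative assumption)


class Adapter:
    """Toy adapter: firmware keeps two 32-bit sequence words in step."""

    def __init__(self):
        self.memory = bytearray(8)  # word 0 and word 1, both initially 0

    def firmware_step(self):
        # The firmware updates both words together, so from its own point
        # of view they are always equal.
        seq = struct.unpack_from("<I", self.memory, 0)[0] + 1
        struct.pack_into("<II", self.memory, 0, seq, seq)

    def fetch(self, offset, length):
        # One host request: return a copy of part of adapter memory.
        return bytes(self.memory[offset:offset + length])


def concurrent_dump(adapter):
    """Dump adapter memory chunk by chunk while the firmware keeps running."""
    image = bytearray()
    for offset in range(0, len(adapter.memory), CHUNK):
        image += adapter.fetch(offset, CHUNK)
        adapter.firmware_step()  # normal operation continues between requests
    return bytes(image)


def is_consistent(image):
    """The invariant the firmware maintains: both words are equal."""
    first, second = struct.unpack_from("<II", image, 0)
    return first == second


adapter = Adapter()
dump = concurrent_dump(adapter)
print(struct.unpack("<II", dump))            # (0, 1)
print(is_consistent(dump))                   # False
print(is_consistent(bytes(adapter.memory)))  # True
```

The adapter memory itself never violates the invariant, but the dump does. A reader of the dump cannot tell whether the mismatch is a real firmware fault or merely a product of the dump procedure.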
In the past, adapter cards typically contained only the minimum amount of memory required to support base functionality, because memory was expensive and memory granularities were small. Today, memory is less expensive, and there may be a significant amount of memory on the adapter card that is not needed for the card's base function. This might be caused by large memory granularities: for example, the base function may require 18 MB while the available memory technology sizes force 32 or 64 MB to be installed on the card. It might also be caused by the need to use a certain number of memory chips to achieve acceptable performance (i.e., to make the memory bus wider), which may result in excess memory because of the available memory chip sizes.
This extra memory space can be used to hold a "snapshot" of the current firmware state. This snapshot would be built by the adapter card itself by temporarily quiescing activity on the card, taking the snapshot, and then resuming normal activity. This results in a coherent dump of the firmware state and does not disrupt operations because the data can all be transferred within the card in minimal time. At some later point, the host system can retrieve the captured dump data while...
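The quiesce, snapshot, and resume sequence described above can be sketched with a small Python model. This is a sketch under stated assumptions, not the disclosed implementation: the `Adapter` class, the constants `STATE_SIZE`, `MEMORY_SIZE`, and `SNAP_OFFSET`, and the chunked host fetch are all hypothetical, and the model assumes the captured firmware state fits within the spare memory region.

```python
import struct

STATE_SIZE = 8    # bytes of live firmware state (illustrative assumption)
MEMORY_SIZE = 32  # total adapter memory; the remainder is treated as spare
SNAP_OFFSET = 16  # hypothetical start of the spare region holding the snapshot


class Adapter:
    """Toy adapter with live state at offset 0 and spare memory above it."""

    def __init__(self):
        self.memory = bytearray(MEMORY_SIZE)
        self.quiesced = False

    def firmware_step(self):
        # Normal activity: advance two sequence words together.
        # Nothing changes while the adapter is quiesced.
        if self.quiesced:
            return
        seq = struct.unpack_from("<I", self.memory, 0)[0] + 1
        struct.pack_into("<II", self.memory, 0, seq, seq)

    def take_snapshot(self):
        # Quiesce, copy the live state into spare memory, then resume.
        # The copy stays entirely within adapter memory.
        self.quiesced = True
        try:
            end = SNAP_OFFSET + STATE_SIZE
            self.memory[SNAP_OFFSET:end] = self.memory[:STATE_SIZE]
        finally:
            self.quiesced = False

    def fetch(self, offset, length):
        # One host request: return a copy of part of adapter memory.
        return bytes(self.memory[offset:offset + length])


def host_retrieve_snapshot(adapter, chunk=4):
    """Host reads the snapshot region in chunks while firmware keeps running."""
    image = bytearray()
    for offset in range(SNAP_OFFSET, SNAP_OFFSET + STATE_SIZE, chunk):
        image += adapter.fetch(offset, chunk)
        adapter.firmware_step()  # live state changes; the snapshot does not
    return bytes(image)


adapter = Adapter()
for _ in range(3):
    adapter.firmware_step()
adapter.take_snapshot()                     # event of interest occurs here
image = host_retrieve_snapshot(adapter)
snapshot_words = struct.unpack("<II", image)
live_words = struct.unpack_from("<II", adapter.memory, 0)
print(snapshot_words)  # (3, 3)
print(live_words)      # (5, 5)
```

In this model the host still needs multiple requests to retrieve the data, but it reads a frozen copy rather than the live state, so the retrieved image is coherent even though the firmware continued to run during retrieval.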