Method for avoiding interface errors during controller recovery

IP.com Disclosure Number: IPCOM000015990D
Original Publication Date: 2002-Sep-01
Included in the Prior Art Database: 2003-Jun-21
Document File: 2 page(s) / 43K

Publishing Venue

IBM

Abstract

When a storage controller in a network performs internal recovery, there are cases where I/O (Input/Output) operations from hosts are purged. Disclosed is an algorithm that is designed to reduce or eliminate unnecessary error recovery and error reports by systems in a network when a storage controller device in that network is performing internal recovery.

This is the abbreviated version, containing approximately 51% of the total text.

In current implementations, operations that are active on the I/O interface are terminated using link-level facilities such as connection recovery on *ESCON interfaces and ABTS (Abort Sequence) on **FICON interfaces. When logged by the host system, these interface recoveries generate messages to operators and trigger further levels of recovery at the host. Disconnected operations are then terminated by sending unit check status with appropriate sense after the internal recovery is complete.
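
As a rough illustration of the current behavior described above, the following Python sketch maps an operation to the termination path the text names for it. The names `LinkType`, `Operation`, and `current_termination` are hypothetical and exist only for this example; real host adapter firmware would look quite different.

```python
from dataclasses import dataclass
from enum import Enum, auto


class LinkType(Enum):
    ESCON = auto()
    FICON = auto()


@dataclass
class Operation:
    op_id: int
    link: LinkType
    connected: bool  # True if the operation is active on the I/O interface


def current_termination(op: Operation) -> str:
    """Return the termination path used by current implementations."""
    if op.connected:
        # Active operations are ended with link-level facilities, which the
        # host logs as interface errors.
        if op.link is LinkType.ESCON:
            return "connection recovery"
        return "ABTS"
    # Disconnected operations are terminated only after internal recovery.
    return "unit check with sense after recovery"
```

The connected branch is the one the disclosed algorithm aims to avoid, since it is the source of the operator messages and host-side recovery.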

A preferred algorithm avoids these interface errors by gracefully terminating the connected operations with unit check status prior to performing the interface recovery. The problems which must be solved to do this are:
1) Internal Recovery must be performed quickly. Handshaking with the host to terminate the operations may take too long.
2) The host will attempt to perform more operations. The host must be signalled to cease starting new I/O operations while the internal controller recovery is taking place.
3) There are periods of time in an I/O operation where the controller cannot send status to terminate the operation. The operations which can be gracefully terminated must be identified.
4) Host system architectures (such as System/390) and ERPs (Error Recovery Procedures) require appropriate sense after unit check status is sent. The controller must be able to provide appropriate sense for selected operations that have been terminated.

In current implementations, host adapters in the controller typically perform the following steps during recovery:
1) Operations on the host interface are abnormally aborted.
2) Structures are reinitialized.
3) The normal work scan loop is entered.
4) Communication is reestablished with the controller's main processor.

The new recovery must be performed before step 1, where the operations are abnormally terminated. Handshakes with the host system to complete the unit check status presentation require a scan loop to dispatch functions that process the responses and new work requests from the host. The normal scan loop cannot be used during this recovery until after step 2 is performed. A new limited scan loop is therefore entered prior to step 1; it performs the following:

A) The operations which can be terminated with unit check status are identified by the host adapter. These are the operations of which the controller's main processor is aware, so that the main processor can build sense for those operations after the recovery is complete. The operations also must be at a point where there is an ou...
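
The ordering described above, with a limited scan loop running ahead of the four existing recovery steps, might be sketched in Python as follows. This is a simplified model under stated assumptions, not the disclosed implementation: all class, field, and method names are hypothetical, the host handshake is reduced to a log entry, and `can_accept_status` is only a placeholder for the second eligibility condition, whose description is cut off in this abbreviated text.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    op_id: int
    known_to_main_processor: bool  # criterion stated in the text
    can_accept_status: bool        # hypothetical stand-in for the truncated criterion


@dataclass
class HostAdapter:
    operations: list[Operation]
    log: list[str] = field(default_factory=list)
    pending_sense: list[int] = field(default_factory=list)

    def limited_scan_loop(self) -> None:
        """Present unit check status to each operation that qualifies."""
        remaining = []
        for op in self.operations:
            if op.known_to_main_processor and op.can_accept_status:
                self.log.append(f"unit check: op {op.op_id}")
                # The main processor builds sense for these after recovery.
                self.pending_sense.append(op.op_id)
            else:
                remaining.append(op)
        self.operations = remaining

    def recover(self) -> list[str]:
        self.limited_scan_loop()                        # new, before step 1
        for op in self.operations:                      # step 1
            self.log.append(f"abort: op {op.op_id}")
        self.operations = []
        self.log.append("reinitialize structures")      # step 2
        self.log.append("enter normal scan loop")       # step 3
        self.log.append("reconnect to main processor")  # step 4
        return self.log
```

In this model, only operations that fail the eligibility check fall through to the abnormal abort in step 1, and the recorded `pending_sense` identifiers stand for the operations that the main processor would later build sense for.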