
Data Driven Error Recovery Architecture for a Magnetic Tape Subsystem

IP.com Disclosure Number: IPCOM000109395D
Original Publication Date: 1992-Aug-01
Included in the Prior Art Database: 2005-Mar-24
Document File: 7 page(s) / 360K

Publishing Venue

IBM

Related People

Kiser, JM: AUTHOR [+2]

Abstract

This article describes a structured, maintainable mechanism for defining and controlling the handling of multiple, concurrent errors on a magnetic tape subsystem. The nature of the design and implementation provides a great deal of granularity in determining the correct means of handling concurrent errors. The Error Transition Data Grid approach (and its automated creation) assures that the specification will be complete and that proper termination will always occur. It makes understanding and maintaining a complex component of error recovery straightforward.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 22% of the total text.


       This article describes a structured, maintainable
mechanism for defining and controlling the handling of multiple,
concurrent errors on a magnetic tape subsystem.  The nature of the
design and implementation provides a great deal of granularity in
determining the correct means of handling concurrent errors.  The
Error Transition Data Grid approach (and its automated creation)
assures that the specification will be complete and that proper
termination will always occur.  It makes understanding and
maintaining a complex component of error recovery straightforward.

      The Error Transition Data Grid has several advantages:
*    It offers fine resolution across the full spectrum of errors
with little associated microcode cost.
*    Control flow can be directed based on the relationship of one
error code to all of its predecessors, rather than only the last
error code presented.  This is achieved by traversing the data grid,
with each transition guided by the previous one.
*    Control flow changes can be made by altering a fixed-length data
area instead of modifying lines of code.
*    Compound error handling is isolated in one place rather than
distributed throughout the control process.
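      The advantages above can be illustrated with a minimal sketch
in Python.  It assumes three hypothetical ERP codes (the disclosure
does not enumerate the real ones), a hypothetical TERMINATE byte
meaning "end internal recovery and present permanent error status",
and a row-major layout in which the row is the code currently being
recovered and the column is the newly presented code.  The grid is a
fixed-length bytes area, and the traversal routine does not change
when the policy does.

```python
from enum import IntEnum


class Erp(IntEnum):
    # Hypothetical ERP codes for illustration; not taken from the disclosure.
    DATA_CHECK = 0
    TRACKING = 1
    MOTION = 2


N = len(Erp)       # both grid axes span every ERP code
TERMINATE = 0xFF   # hypothetical marker: stop internal recovery, report permanent error

# Fixed-length N*N data area, row-major. GRID[current * N + new] is the code
# that recovery continues with when `new` is presented while `current` is active.
GRID = bytes([
    # new: DATA_CHECK   TRACKING         MOTION
    Erp.DATA_CHECK,     Erp.DATA_CHECK,  Erp.MOTION,   # current: DATA_CHECK
    Erp.DATA_CHECK,     Erp.TRACKING,    Erp.MOTION,   # current: TRACKING
    Erp.MOTION,         Erp.MOTION,      TERMINATE,    # current: MOTION
])
assert len(GRID) == N * N


def traverse(presented):
    """Walk the grid over codes in the order presented.

    Each lookup starts from the code the previous transition produced,
    so the result reflects the chain of earlier transitions. Returns the
    path of active codes, ending with None if the grid says to terminate.
    """
    current = presented[0]
    path = [current]
    for new in presented[1:]:
        cell = GRID[current * N + new]
        if cell == TERMINATE:
            path.append(None)
            break
        current = Erp(cell)
        path.append(current)
    return path
```

Under these assumptions, deciding that a tracking error during
data-check recovery should instead become the new primary error means
editing the single entry at GRID[0 * N + 1]; traverse() itself is not
modified.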

      Since a magnetic tape subsystem does not operate in a
controlled environment, errors can be compounded in the course of
error recovery.  Hardware faults (permanent or intermittent), media
defects (creases, debris), tape adhesion to the recording head, and
microcode logic errors can all combine to create an error scenario
with a set of symptoms.  These symptoms are detected by hardware
and/or microcode, and internal recovery is attempted before the error
is reported to the host system.  The assumption is that each detected
error is reported to the host system.  The inherent difficulty in
controlling recovery is the often-made assumption that each detected
error is independent of all others.  In actuality, errors can be
classified as either primary to the recovery task or secondary.
Secondary errors are random in the sense that they have no
relationship to the initial error or to the sequence of error
recovery actions taken.  As such, they should be treated
tangentially.  The difficulty, however, lies in assessing errors
dynamically as they occur.  Without the capability to distinguish
errors and to react accordingly, a change of error often causes
internal error recovery to terminate and permanent error status to be
presented.
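      The contrast between treating every change of error as fatal
and distinguishing primary from secondary errors can be sketched as
follows.  The error names, the RELATION table, and the choice to
simply record secondary errors while recovery continues are all
illustrative assumptions, not the disclosure's rules; a pair that the
table does not classify falls back to presenting permanent error
status.

```python
from enum import Enum


class Rel(Enum):
    SECONDARY = "secondary"  # unrelated to the recovery in progress
    PRIMARY = "primary"      # bears on the recovery task itself


# Hypothetical classification of (code being recovered, newly detected code).
RELATION = {
    ("data_check", "tracking"): Rel.SECONDARY,
    ("data_check", "motion"): Rel.PRIMARY,
    ("motion", "data_check"): Rel.SECONDARY,
}


def naive_recover(codes):
    """Baseline: any change of error terminates internal recovery."""
    for prev, new in zip(codes, codes[1:]):
        if new != prev:
            return "permanent error"
    return f"recovering {codes[-1]}"


def classified_recover(codes):
    """Follow primary errors; record secondary errors without abandoning recovery."""
    primary = codes[0]
    secondary = []
    for new in codes[1:]:
        if new == primary:
            continue  # same error recurring: keep recovering it
        rel = RELATION.get((primary, new))
        if rel is Rel.SECONDARY:
            secondary.append(new)  # treated tangentially: noted, recovery continues
        elif rel is Rel.PRIMARY:
            primary = new          # recovery now follows the new primary error
        else:
            return "permanent error", secondary  # unclassified pair: stop safely
    return f"recovering {primary}", secondary
```

With the sequence data_check, tracking, data_check, naive_recover()
gives up at the tracking error, while classified_recover() notes it
as secondary and keeps recovering the data check.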

      The data grid approach recognizes that coding, verifying, and
maintaining a complex structure of conditional rules and
relationships covering all permutations of errors is a massive
task.  Instead, multiple error handling is initiated and controlled
by a Supervisor using an Error Transition Data Grid.  This data grid
has as both axes all possible error recovery codes (ERP Codes) that
can be presented for internal error recove...
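      This extract ends before describing how the grid is generated,
so the following is only a sketch of one way automated creation
could deliver the two guarantees claimed for it.  It assumes that
designers supply sparse rules keyed by (current code, new code), that
every cell no rule covers defaults to a hypothetical TERMINATE value
(so the grid is always complete), and that traversal carries a
hypothetical transition budget, max_transitions, so recovery ends even
if errors keep arriving.  build_grid() and run() are illustrative
names.

```python
TERMINATE = "TERMINATE"  # hypothetical cell value: present permanent error status


def build_grid(codes, rules, default=TERMINATE):
    """Expand sparse rules into a complete len(codes) x len(codes) grid.

    rules maps (current, new) -> next code or TERMINATE. Any pair not
    covered by a rule receives `default`, so no cell is left unspecified.
    Rules naming unknown codes are rejected when the grid is built.
    """
    index = {code: i for i, code in enumerate(codes)}
    grid = [[default] * len(codes) for _ in codes]
    for (current, new), target in rules.items():
        if current not in index or new not in index:
            raise ValueError(f"rule uses an unknown ERP code: {(current, new)}")
        if target != TERMINATE and target not in index:
            raise ValueError(f"rule targets an unknown ERP code: {target}")
        grid[index[current]][index[new]] = target
    return index, grid


def run(index, grid, first, later, max_transitions):
    """Traverse the grid from `first` over the codes in `later`.

    Stops on a TERMINATE cell or once more than max_transitions
    transitions would be needed, so it returns even for an endless
    stream of errors. Returns the final active code or TERMINATE.
    """
    current = first
    for step, new in enumerate(later, start=1):
        if step > max_transitions:
            return TERMINATE  # budget spent: present permanent error status
        current = grid[index[current]][index[new]]
        if current == TERMINATE:
            return TERMINATE
    return current
```

Rejecting rules that name unknown codes at build time, rather than
during recovery, keeps specification mistakes out of the path that
runs on the subsystem.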