Tape Subsystem Dynamic Error Recovery Trace, Summary and History Buffers

IP.com Disclosure Number: IPCOM000109331D
Original Publication Date: 1992-Aug-01
Included in the Prior Art Database: 2005-Mar-23
Document File: 4 page(s) / 214K

Publishing Venue

IBM

Related People

Nylander-Hill, P: AUTHOR

Abstract

This article emphasizes a strategy based on problem capture rather than problem recreation to support more efficient field and engineering development problem resolution. This strategy is essential due to the potentially complex error recovery and multiple concurrent error handling inherent in a tape subsystem. The need to recreate a problem with instrumentation is significantly reduced or eliminated with the utilization of dynamic traces and histories buffered internal to the subsystem. The dynamic error recovery and trace facilities described in this article provide a means for easily determining the error recovery actions and paths actually taken in production operation of the machine, thereby facilitating rapid resolution of field problems with less customer involvement.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 35% of the total text.

       This article emphasizes a strategy based on problem
capture rather than problem recreation to make problem resolution
more efficient in both the field and engineering development.  This
strategy is essential because of the potentially complex error
recovery and multiple concurrent error handling inherent in a tape
subsystem.  Dynamic traces and histories buffered inside the
subsystem significantly reduce or eliminate the need to recreate a
problem with instrumentation.  The dynamic error recovery and trace
facilities described in this article make it easy to determine the
error recovery actions and paths actually taken in production
operation of the machine, so field problems can be resolved rapidly
with less customer involvement.  These facilities are provided for
device error recovery, as distinct from channel error recovery.

      A large (3 Kword) wrapping, lockable buffer has been provided
for error recovery procedure (ERP) trace information.  A smaller
buffer (468 words) is provided for ERP History information.  The
trace facility is automatically activated at IML time and can be made
to select only specific events of interest via the maintenance device
(MD) or microcode patch.  ERP summary information is inserted into
sense data returned to the host.  Conversely, sense data is also
posted to the error recovery trace buffer.  This dual posting to two
different trace buffers provides synchronization of internal and
external error information.  Transmitting ERP summary information to
the host provides substantially greater information on the reported
error.
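
      The wrapping, lockable buffer and the dual posting described
above can be illustrated with a minimal sketch.  It is not the 3490E
microcode: the names TraceBuffer, post, lock, and report_error, the
event labels, and the dictionary used for sense data are all
hypothetical, and capacity is counted in records rather than words to
keep the example short.

```python
class TraceBuffer:
    """Fixed-capacity trace buffer that wraps when full and can be locked.

    Hypothetical illustration only; capacity is in records, not words.
    """

    def __init__(self, capacity, enabled_events=None):
        self.capacity = capacity
        # None means every event type is traced; otherwise only the
        # listed event types are accepted (selective tracing).
        self.enabled_events = enabled_events
        self.locked = False
        self._slots = [None] * capacity
        self._next = 0    # slot the next record will occupy
        self._count = 0   # total records accepted so far

    def post(self, event, device, data):
        """Store one record; return False if it was rejected."""
        if self.locked:
            return False
        if self.enabled_events is not None and event not in self.enabled_events:
            return False
        self._slots[self._next] = (event, device, data)
        self._next = (self._next + 1) % self.capacity  # wrap around
        self._count += 1
        return True

    def lock(self):
        """Freeze the buffer so later events cannot overwrite the capture."""
        self.locked = True

    def records(self):
        """Return the stored records, oldest first."""
        if self._count < self.capacity:
            return self._slots[:self._count]
        return self._slots[self._next:] + self._slots[:self._next]


def report_error(trace, device, erp_summary, sense):
    """Dual posting: the ERP summary is added to the sense data for the
    host, and the sense data is also posted to the trace buffer."""
    sense_out = dict(sense, erp_summary=erp_summary)
    trace.post("SENSE", device, sense_out)
    return sense_out
```

      In this sketch, locking the buffer after an error of interest
preserves the records leading up to it, which is the point of problem
capture: the evidence is already in the buffer and the failure does
not need to be recreated.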

      Primary methods for problem determination and resolution for a
tape subsystem operating in a customer environment are as follows:
*    Subsystem Sense Bytes passed to Host at time of error (single
instance)
*    Aggregate of Sense Byte data
*    Control Unit Dump containing a snapshot of microcode control
store, data tables, and hardware registers
*    Online/Offline diagnostic tests

      With the exception of some buffered tables in the Control Unit
Dump, these are all static references which represent a snapshot of
the subsystem.  They do not reflect a dynamic relationship.

      A dynamic trace function was added to the Error Recovery
Procedure (ERP) microcode of the 3490E Tape Subsystem.  It has three
features that track which paths were executed, which blocks on tape
were involved, and which key data elements drove the decisions made
in the process of recovering from an error.
1.   ERP Trace Buffer
      The ERP Trace Buffer is intended to hold a wide variety of
Control Transfer information, messages, and data snapshots on the
basis of device address.  Trace records are of variable length and
content.  They can be used to reconstruct subsystem recovery and to
track compound error recovery path...