
Failure Handling and Serviceability in a Data Transmission System

IP.com Disclosure Number: IPCOM000035759D
Original Publication Date: 1989-Aug-01
Included in the Prior Art Database: 2005-Jan-28
Document File: 2 page(s) / 91K

Publishing Venue

IBM

Related People

Aldebert, JP: AUTHOR [+4]

Abstract

Disclosed is an on-line detection and fault isolation mechanism for a data transmission system (storage controller of any computer). It improves system availability by sorting errors according to their origin, priority, and weight, and by offering a degraded mode, where service is maintained in part of the transmission system (disabling and isolating the failing part).





When an error occurs, the mechanism automatically isolates the failing "Field Replaceable Unit" (FRU), which reduces both maintenance cost and maintenance time.

This method can be generalized to any subsystem where high availability is a requirement.
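The sorting of errors by origin, priority, and weight, followed by isolation of the failing unit, can be illustrated with a minimal software sketch. The disclosure describes a hardware mechanism and does not define the scales involved, so the `ErrorReport` fields, the numeric scales, the tie-breaking order, and the unit names below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ErrorReport:
    origin: str    # hypothetical name of the unit whose checker fired
    priority: int  # assumed scale: lower value is handled first
    weight: int    # assumed scale: higher value is more severe


def sort_errors(reports: List[ErrorReport]) -> List[ErrorReport]:
    """Order reports by priority, then by descending weight, then by origin."""
    return sorted(reports, key=lambda r: (r.priority, -r.weight, r.origin))


def failing_fru(reports: List[ErrorReport]) -> Optional[str]:
    """Return the origin of the highest-ranked report, or None if no error is pending."""
    ranked = sort_errors(reports)
    return ranked[0].origin if ranked else None


# Hypothetical reports collected after a failure.
reports = [
    ErrorReport("card-A", priority=2, weight=5),
    ErrorReport("card-B", priority=1, weight=3),
    ErrorReport("card-C", priority=1, weight=7),
]
suspect = failing_fru(reports)
```

With these hypothetical values, `card-C` is ranked first because it shares the best priority with `card-B` but carries the larger weight.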

The error detection system detects a failure as close as possible to its origin in order to prevent error propagation.

The mechanism uses hardware checkers (such as parity generators/checkers, time-out counters, and error priority encoders) carefully designed at all boundaries: not only physical boundaries, such as card I/Os, but also logical boundaries, such as interfaces between different functional islands. The figure shows the mechanism as implemented in the storage control (SC) of a system comprising a memory (MS), a central control unit (CCU), and adapters E, F, and G.
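Two of the checker types named above can be modeled in a few lines of software. This is only a behavioral sketch: the disclosure implements these checkers in hardware and does not state the parity convention or the encoding of error lines, so odd parity over one byte and "bit 0 is the highest priority" are assumptions.

```python
from typing import Optional


def parity_bit(byte: int) -> int:
    """Generate an odd-parity bit (assumed convention): the eight data bits
    plus the parity bit together contain an odd number of ones."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def parity_ok(byte: int, parity: int) -> bool:
    """Check a received byte against the parity bit that accompanied it."""
    return parity_bit(byte) == parity


def priority_encode(error_lines: int) -> Optional[int]:
    """Return the index of the lowest-numbered asserted error line
    (bit 0 is assumed to be the highest priority), or None if none is set."""
    if error_lines == 0:
        return None
    lowest_set = error_lines & -error_lines  # isolate the lowest set bit
    return lowest_set.bit_length() - 1


# A byte crosses a boundary with its parity bit; one data bit flips in transit.
sent = 0b00000011
p = parity_bit(sent)
received = sent ^ 0b00000001
detected = not parity_ok(received, p)
```

A single flipped bit changes the count of ones by one, so the checker on the receiving side of the boundary flags the error before the corrupted byte propagates further.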

The storage control comprises three functional islands: a main store (MS) controller 10, a direct memory access (DMA) controller 20, and a CCU controller 30.
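The degraded mode mentioned in the abstract, in which the failing part is disabled and isolated while the rest stays in service, can be sketched against these three islands. The class and method names are hypothetical, and this excerpt does not state which islands can actually be lost while service continues, so the sketch treats them uniformly.

```python
from typing import Dict, List


class StorageControl:
    """Toy model of the storage control with its three functional islands."""

    ISLANDS = ("MS controller", "DMA controller", "CCU controller")

    def __init__(self) -> None:
        # Every island starts enabled.
        self.enabled: Dict[str, bool] = {name: True for name in self.ISLANDS}

    def isolate(self, island: str) -> None:
        """Disable a failing island so its errors cannot propagate."""
        if island not in self.enabled:
            raise KeyError(f"unknown island: {island}")
        self.enabled[island] = False

    def in_service(self) -> List[str]:
        """List the islands that are still enabled."""
        return [name for name in self.ISLANDS if self.enabled[name]]

    def is_degraded(self) -> bool:
        """True when some, but not all, islands have been isolated."""
        active = len(self.in_service())
        return 0 < active < len(self.ISLANDS)


sc = StorageControl()
sc.isolate("DMA controller")
remaining = sc.in_service()
```

After the hypothetical DMA failure, the model reports the MS and CCU controllers as still in service and the storage control as degraded rather than down.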

The table below shows how the main failures are handled. Fo...