BUS Fault Identification Algorithm

IP.com Disclosure Number: IPCOM000036892D
Original Publication Date: 1989-Nov-01
Included in the Prior Art Database: 2005-Jan-29

Publishing Venue

IBM

Related People

Berglund, NC: AUTHOR (and 2 others)

Abstract

This algorithm uses hardware status collected at the time of a system bus failure to identify the cause of intermittent and stuck failures, independent of the number and location of I/O bus units (IOBU) on 1 to 8 boards. The algorithm is the basis for subsequent actions to provide uninterrupted bus operations by either recovering the failing operation and recording statistics for error thresholding or disabling the failing IOBU pending its repair. The problem of identifying the cause of bus operation failures is complicated by the wide range of configurations. The communicating bus units may be on the same board as the processor, separated from the processor by multiple boards, or may be separated from each other by multiple boards.

This is the abbreviated version, containing approximately 20% of the total text.

Since the boards are in physically different units, it is essential to identify the location of failures clearly, particularly intermittent failures, which cause system crashes and cannot be recreated by diagnostics. The algorithm identifies the cause of failures independent of the system bus configuration.
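To illustrate why the configuration matters, the sketch below assumes the boards form a simple linear chain (board 0 holding the processor, boards 1 to 7 added serially) with one BEU link between each adjacent pair of boards. Under that assumption, the set of BEU links a failing operation crosses, and therefore the set of places where it could have been corrupted in transit, depends only on the boards of the two communicating IOBUs. The board numbering and the function `beu_links_between` are hypothetical and are not taken from the disclosure.

```python
# Hypothetical model: boards 0..7 in a linear chain, one BEU link per adjacent pair.
MAX_BOARDS = 8  # the base board plus up to 7 extension boards


def beu_links_between(board_a: int, board_b: int) -> list[tuple[int, int]]:
    """Return the BEU links, as (board, board + 1) pairs, crossed between two boards."""
    for board in (board_a, board_b):
        if not 0 <= board < MAX_BOARDS:
            raise ValueError(f"board {board} is outside 0..{MAX_BOARDS - 1}")
    low, high = sorted((board_a, board_b))
    # Every link between the lower-numbered and higher-numbered board lies on the path.
    return [(board, board + 1) for board in range(low, high)]


print(beu_links_between(0, 0))  # [] (master and slave share a board)
print(beu_links_between(0, 3))  # [(0, 1), (1, 2), (2, 3)]
print(beu_links_between(5, 2))  # [(2, 3), (3, 4), (4, 5)]
```

In this model a same-board operation crosses no BEU, so only the master and slave are candidates, whereas an operation between two remote boards, neither of them the processor board, adds every intervening link to the list.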

A system I/O bus consists of a 32-bit data bus, an 8-bit command/status bus, a 5-bit origin/destination bus, and control and arbitration signals. The bus operates asynchronously and uses priority serial arbitration. This permits the bus to accommodate controllers of differing performance and to be extended serially to as many as 7 additional boards. A special type of IOBU, the Bus Extension Unit (BEU), transfers the bus from one board to another. The extension boards may reside in separate power domains, which makes concurrent repair of I/O controllers on remote boards possible.
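Priority serial arbitration is commonly built as a daisy chain: the grant enters at the highest-priority position, and each unit either keeps it, if it is requesting the bus, or passes it to the next unit. The sketch below models only that selection rule. The chain ordering and the name `serial_arbitrate` are assumptions, since the disclosure does not describe the arbitration signals themselves.

```python
from typing import Optional, Sequence


def serial_arbitrate(requests: Sequence[bool]) -> Optional[int]:
    """Return the winning chain position, or None if no unit is requesting.

    requests[i] is True when the unit at position i wants the bus. Position 0
    receives the grant first and so has the highest priority (assumed ordering).
    """
    for position, requesting in enumerate(requests):
        if requesting:
            # A requesting unit keeps the grant; units further down never see it.
            return position
        # A unit that is not requesting passes the grant to the next position.
    return None


# Serve pending requests one at a time, withdrawing each after its operation.
pending = [False, True, False, True]
order = []
while (winner := serial_arbitrate(pending)) is not None:
    order.append(winner)
    pending[winner] = False
print(order)  # [1, 3]
```

Because priority here is fixed by position, a unit near the head of the chain that requests continuously could starve the units behind it; this sketch does not model any fairness mechanism the actual bus may use.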

Each bus contains up to 32 IOBUs. One IOBU, typically the processor, is designated the Bus Control Unit (BCU). Although the arbitration function is distributed across the IOBUs, the BCU has master control over arbitration for error recovery and initial program load (IPL).
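The limit of 32 IOBUs per bus is consistent with the 5-bit origin/destination bus, which carries 2^5 = 32 distinct values. The sketch below checks a bus configuration against the rules stated so far: at most 32 units, a unique 5-bit value per unit, boards numbered within the 1-to-8-board range, and exactly one unit designated BCU. The `Iobu` record, its fields, and the assumption that the origin/destination value serves as the unit's address are hypothetical.

```python
from dataclasses import dataclass

ADDRESS_BITS = 5               # width of the origin/destination bus
MAX_IOBUS = 1 << ADDRESS_BITS  # 32 distinct values
MAX_BOARDS = 8                 # base board plus up to 7 extension boards


@dataclass(frozen=True)
class Iobu:
    """Hypothetical record for one bus unit."""
    address: int         # assumed: the unit's origin/destination value
    board: int           # board the unit sits on, 0..7
    is_bcu: bool = False


def validate_bus(units: list[Iobu]) -> list[str]:
    """Return the configuration problems found; an empty list means valid."""
    problems = []
    if len(units) > MAX_IOBUS:
        problems.append(f"{len(units)} IOBUs exceed the limit of {MAX_IOBUS}")
    seen = set()
    for unit in units:
        if not 0 <= unit.address < MAX_IOBUS:
            problems.append(f"address {unit.address} does not fit in {ADDRESS_BITS} bits")
        elif unit.address in seen:
            problems.append(f"address {unit.address} is used more than once")
        seen.add(unit.address)
        if not 0 <= unit.board < MAX_BOARDS:
            problems.append(f"board {unit.board} is outside 0..{MAX_BOARDS - 1}")
    # Exactly one unit must hold master control of arbitration.
    bcu_count = sum(unit.is_bcu for unit in units)
    if bcu_count != 1:
        problems.append(f"expected exactly one BCU, found {bcu_count}")
    return problems


bus = [Iobu(0, board=0, is_bcu=True), Iobu(1, board=0), Iobu(9, board=3)]
print(validate_bus(bus))  # []
print(validate_bus(bus + [Iobu(9, board=4)]))  # ['address 9 is used more than once']
```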

The primitive communication protocol consists of fixed-length messages and variable-length packet direct memory access (DMA) operations. Bus operations fail when the master or slave provides information with incorrect parity, when the master or slave incorrectly receives information from the bus, or when the information is corrupted while passing through a BEU. An operation may also fail if the interlocked tag signal sequence is suspended, either because of a failure in the master, slave, or BCU or because a BEU fails to propagate the signal properly. The master and slave are responsible for checking parity on all information transferred between them. The slave, for example, checks parity on the command, address, and data received from the master, and the master checks parity on the data and status received from the slave. Whenever a parity check is detected, the detecting IOBU is responsible for intentionally suspending the interlocked handshake sequence, thereby preventing completion of th...
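The division of checking duties described above can be sketched as follows. The disclosure does not state the parity convention, so this sketch assumes odd parity with one parity bit per byte of the 32-bit data bus; a receiver that finds a mismatch "suspends" the handshake, modeled here simply as `parity_ok` returning False so the operation cannot complete. The table `CHECKED_BY` and all function names are hypothetical.

```python
# Which fields each receiving role checks, per the text above.
CHECKED_BY = {
    "slave": ("command", "address", "data"),  # received from the master
    "master": ("data", "status"),             # received from the slave
}


def checker_for(field: str, sender: str) -> str:
    """Return the role that checks parity on `field` sent by `sender`."""
    receiver = "slave" if sender == "master" else "master"
    if field not in CHECKED_BY[receiver]:
        raise ValueError(f"the {receiver} does not receive {field!r} from the {sender}")
    return receiver


def odd_parity_bits(word: int) -> int:
    """Assumed scheme: one odd-parity bit per byte of a 32-bit word."""
    bits = 0
    for byte_index in range(4):
        byte = (word >> (8 * byte_index)) & 0xFF
        # Set the parity bit when the byte alone has an even number of ones.
        if bin(byte).count("1") % 2 == 0:
            bits |= 1 << byte_index
    return bits


def parity_ok(word: int, parity: int) -> bool:
    """False means the receiver would suspend the interlocked handshake."""
    return odd_parity_bits(word) == parity


word = 0x12345678
sent = odd_parity_bits(word)
print(checker_for("status", "slave"))  # master
print(parity_ok(word, sent))           # True: handshake proceeds
print(parity_ok(word ^ 0x100, sent))   # False: bit flipped in byte 1
```

Per-byte parity detects any odd number of flipped bits within a byte but misses an even number, which is one reason a parity check alone cannot say where along the path the corruption occurred.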