
Fault Tolerant Architecture for Communication Adapters and Systems

IP.com Disclosure Number: IPCOM000110599D
Original Publication Date: 1992-Dec-01
Included in the Prior Art Database: 2005-Mar-25
Document File: 4 page(s) / 172K

Publishing Venue

IBM

Related People

Serpanos, DN: AUTHOR

Abstract

Fault-tolerant communication systems are important in high-speed networks. Although many protocols can compensate for failing devices by establishing alternate routes for packets, the delay and the overhead associated with this recovery can be significant, leading to a high loss of useful bandwidth. A fault-tolerant device can avoid this bandwidth waste by a fast response to the failure.

This is the abbreviated version, containing approximately 45% of the total text.


      This invention discloses a fault-tolerant communication system
architecture for systems that employ the same adapter organization as
the Generic High Bandwidth Adapter.  The architecture uses
fault-tolerant adapters with multiple network interface modules to
overcome failures at the interface level, and software memory
management support to overcome failures of the hardware memory
manager.  Redundant adapters are used to overcome permanent failures
of an adapter's processor subsystem.

      The disclosed architecture achieves high system availability by
using redundant hardware resources and software support to compensate
for failures at various adapter modules: network interface, memory
manager, and processing subsystem.  Since the architecture applies to
systems using the adapter design of the Generic High Bandwidth
Adapter (GHBA) architecture, we use the GHBA terminology.  The GHBA
architecture and the terms used are shown in Fig. 1.  The disclosed
architecture uses:
1.  Multiple PMIs per adapter to overcome PMI failures;
2.  Backup adapters to overcome processor failures on adapters;
3.  Software support to overcome failures of the memory management
module (i.e., the GAM).
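
      The mapping from failed module to recovery mechanism in the
list above can be illustrated with a minimal sketch.  It is not taken
from the disclosure: the names Fault, Adapter, and handle_fault are
hypothetical, and raising an error when no PMI or no backup adapter
remains is an assumption, since the text does not describe that case.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class Fault(Enum):
    """The three failure classes named in the list above."""
    PMI = auto()        # network interface module failure
    PROCESSOR = auto()  # permanent failure of the processing subsystem
    GAM = auto()        # failure of the hardware memory manager


@dataclass
class Adapter:
    """Hypothetical model of one adapter; field names are illustrative."""
    name: str
    healthy_pmis: List[str]
    backup: Optional["Adapter"] = None
    software_memory_mgmt: bool = False


def handle_fault(adapter: Adapter, fault: Fault,
                 failed_pmi: Optional[str] = None) -> Adapter:
    """Apply the recovery for `fault`; return the adapter now carrying traffic."""
    if fault is Fault.PMI:
        # 1. Multiple PMIs per adapter: drop the failed one, keep the rest.
        adapter.healthy_pmis.remove(failed_pmi)
        if not adapter.healthy_pmis:
            # Assumption: the source does not say what happens here.
            raise RuntimeError("no working PMI left on " + adapter.name)
        return adapter
    if fault is Fault.PROCESSOR:
        # 2. Backup adapter takes over from the failed processor subsystem.
        if adapter.backup is None:
            raise RuntimeError("no backup adapter for " + adapter.name)
        return adapter.backup
    if fault is Fault.GAM:
        # 3. Software support replaces the hardware memory manager.
        adapter.software_memory_mgmt = True
        return adapter
    raise ValueError(fault)
```

      In this sketch a PMI or GAM failure is absorbed inside the
adapter, while only a processor failure moves traffic to a different
adapter.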

      A typical bridge/router is organized as a set of adapters that
attach to networks.  The adapters communicate through the adapter
interconnection, which can range from simple or hierarchical buses to
switches.  There are two levels where fault tolerance can be
incorporated in the design: the adapter level and the interconnection
level.

      At the adapter level, the use of multiple PMIs allows the
adapter to overcome a PMI failure.  A configuration with two PMIs
used to attach to a common network link is shown in Fig. 2.  The
system can use the PMIs either in parallel or in standby mode.  When
the two PMIs are used in parallel, the incoming (outgoing) packets
are divided into two streams targeting (originating from) the memory.
This can improve the PMI flushing rate and can lead to a
significant performance increase, especially for short packets.  The
use of parallel PMIs requires a solution to the sequencing problem
[1] (sequencing is not a universal requirement for internetworking
devices, but it should be addressed in the context of a system that
is intended to attach to multiple networks).  This
configuration provides multiple significant advantages: high
performance, load balancing, and graceful performance degradation in
the...
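
      The parallel use of two PMIs and the resulting sequencing
problem can be sketched as follows.  This is an illustration under
stated assumptions, not the solution of reference [1]: round-robin
striping, per-packet sequence numbers, and a reorder buffer are
techniques chosen here for the example, and the function names
stripe and resequence are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def stripe(packets: Iterable[str],
           pmi_up: List[bool]) -> List[Tuple[int, int, str]]:
    """Tag each packet with a sequence number and assign it to a working PMI.

    Returns (sequence number, PMI index, packet) triples.  Packets are
    spread round-robin over the PMIs that are still up, so a single
    PMI failure leaves one stream instead of two.
    """
    live = [i for i, up in enumerate(pmi_up) if up]
    if not live:
        # Assumption: the source does not cover loss of every PMI.
        raise RuntimeError("all PMIs have failed")
    return [(seq, live[seq % len(live)], pkt)
            for seq, pkt in enumerate(packets)]


def resequence(arrivals: Iterable[Tuple[int, str]]) -> Iterator[str]:
    """Yield packets in sequence order, whatever order they arrived in."""
    heap: List[Tuple[int, str]] = []
    expected = 0
    for seq, pkt in arrivals:
        heapq.heappush(heap, (seq, pkt))
        # Release every packet that is now next in sequence.
        while heap and heap[0][0] == expected:
            yield heapq.heappop(heap)[1]
            expected += 1
```

      With both PMIs up, stripe alternates packets between PMI 0 and
PMI 1; if the two streams reach memory at different rates, resequence
holds early packets until the missing ones arrive.  With one PMI
marked down, every packet goes through the remaining PMI, which
corresponds to reduced throughput rather than loss of the link.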