Fault isolation and recovery (RFC0816) Disclosure Number: IPCOM000003865D
Original Publication Date: 1982-Jul-01
Included in the Prior Art Database: 2000-Sep-13
Document File: 11 page(s) / 20K

Publishing Venue

Internet Society Requests For Comment (RFCs)

Related People

D.D. Clark: AUTHOR


This text was extracted from an ASCII document.
This is the abbreviated version, containing approximately 14% of the total text.

RFC: 816


David D. Clark
MIT Laboratory for Computer Science
Computer Systems and Communications Group
July, 1982

1. Introduction

Occasionally, a network or a gateway will go down, and the sequence
of hops which the packet takes from source to destination must change.
Fault isolation is that action which hosts and gateways collectively
take to determine that something is wrong; fault recovery is the
identification and selection of an alternative route which will serve to
reconnect the source to the destination. In fact, the gateways perform
most of the functions of fault isolation and recovery. There are,
however, a few actions which hosts must take if they wish to provide a
reasonable level of service. This document describes the portion of
fault isolation and recovery which is the responsibility of the host.

2. What Gateways Do

Gateways collectively implement an algorithm which identifies the
best route between all pairs of networks. They do this by exchanging
packets which contain each gateway's latest opinion about the
operational status of its neighbor networks and gateways. Assuming that
this algorithm is operating properly, one can expect the gateways to go
through a period of confusion immediately after some network or gateway
has failed, but one can assume that once a period of negotiation has
passed, the gateways are equipped with a consistent and correct model of
the connectivity of the internet. At present this period of negotiation
may actually take several minutes, and many TCP implementations time out
within that period, but it is a design goal of the eventual algorithm
that the gateway should be able to reconstruct the topology quickly
enough that a TCP connection should be able to survive a failure of the


3. Host Algorithm for Fault Recovery

Since the gateways always attempt to have a consistent and correct
model of the internetwork topology, the host strategy for fault recovery
is very simple. Whenever the host feels that something is wrong, it
asks the gateway for advice, and, assuming the advice is forthcoming, it
believes the advice completely. The advice will be wrong only during
the transient period of negotiation, which immediately follows an
outage, but will otherwise be reliably correct.

In fact, it is never necessary for a host to explicitly ask a
gateway for advice, because the gateway will provide it as appropriate.
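
The host strategy described above can be illustrated with a minimal
sketch. It is not taken from the original text: the class name Host, the
method names next_hop and on_gateway_advice, and the gateway and network
labels are all hypothetical, and the form in which advice actually
arrives from a gateway is not specified in this excerpt. The sketch only
shows the policy itself: the host keeps a first-hop choice for each
destination and, when a gateway volunteers advice, adopts it without
second-guessing.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Host:
    """Hypothetical host that believes gateway advice completely."""

    # Gateway used for any destination with no specific advice (assumed).
    default_gateway: str
    # First-hop gateway per destination, learned only from gateway advice.
    routes: Dict[str, str] = field(default_factory=dict)

    def next_hop(self, destination: str) -> str:
        """Return the gateway to which packets for `destination` are sent."""
        return self.routes.get(destination, self.default_gateway)

    def on_gateway_advice(self, destination: str, better_gateway: str) -> None:
        """Record unsolicited advice from a gateway.

        The host does not validate the advice; it simply overwrites any
        earlier choice, since the gateways are assumed to hold the
        correct model of the topology once negotiation has settled.
        """
        self.routes[destination] = better_gateway


if __name__ == "__main__":
    host = Host(default_gateway="gw-a")
    print(host.next_hop("net-10"))          # uses the default gateway
    host.on_gateway_advice("net-10", "gw-b")
    print(host.next_hop("net-10"))          # now follows the advice
```

Because the host never asks for advice in this sketch, there is no
request method at all; the only input is on_gateway_advice, which matches
the observation that the gateway provides advice as appropriate. During
the transient period after an outage the recorded advice may be wrong,
and a later call simply replaces it.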