Software checksumming in the IMP and network reliability (RFC0528)

IP.com Disclosure Number: IPCOM000005908D
Original Publication Date: 1973-Jun-20
Included in the Prior Art Database: 2001-Nov-15
Document File: 10 page(s) / 23K

Publishing Venue

Internet Society Requests For Comment (RFCs)

Related People

J.M. McQuillan: AUTHOR

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 15% of the total text.

Network Working Group                                      J.  McQuillan
Request for Comments: 528                                        BBN-NET
NIC: 17164                                                  20 June 1973

        SOFTWARE CHECKSUMMING IN THE IMP AND NETWORK RELIABILITY

   As the ARPA Network has developed over the last few years, and our
   experience with operating the IMP subnetwork has grown, the issue of
   reliability has assumed greater importance and greater complexity.
   This note describes some modifications that have recently been made
   to the IMP and TIP programs in this regard.  These changes are
   mechanically minor and do not affect Host operation at all, but they
   are logically noteworthy, and for this reason we have explained the
   workings of the new IMP and TIP programs in some detail.  Host
   personnel are advised to note particularly the modifications
   described in sections 4 and 5, as they may wish to change their own
   programs or operating procedures.

1. A Changing View of Network Reliability

   Our idea of the Network has evolved as the Network itself has grown.
   Initially, it was thought that the only components in the network
   design that were prone to errors were the communications circuits,
   and the modem interfaces in the IMPs are equipped with a CRC checksum
   to detect "almost all" such errors.  The rest of the system,
   including Host interfaces, IMP processors, memories, and interfaces,
   was considered to be error-free.  We have had to re-evaluate this
   position in the light of our experience.  In operating the network
   we are faced with the problem of having to perform remote diagnosis
   on failures that cannot easily be classified or understood.
   Examples of such problems include reports from Host personnel of
   lost RFNMs and lost Host-Host protocol allocate messages,
   inexplicable transient behavior in the IMP, and, finally, the
   problem of crashes -- the total failure of an IMP, perhaps affecting
   adjacent IMPs.  These circumstances are infrequent and are therefore
   difficult to correlate with other failures or with particular
   attempted remedies.  Indeed, it is often impossible to distinguish a
   software failure from a hardware failure.

   In attempting post-mortem analysis of crashes, we have sometimes
   found that instructions in the IMP program were incorrect, sometimes
   with just one or two bits picked or dropped.  Clearly, memory errors
   can account for almost any failure, not only program crashes but
   also data errors which can lead to many other syndromes.  For
   instance, if the address of a message is changed in transit, then
   one Host thinks the message was lost, and another Host may receive
   an extra message.  Errors of this kind fall into two general
   classes: errors in Host messages, whether in the control information
   or the data, and errors in inter-IMP messages, primarily routing
   update messages.  In the course of the last few years, it has become
   increasingly clear that such errors were occurring, though it was
   difficult to speculate as to where, why, and how often.
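
   The address example above can be sketched in a few lines of Python.
   This is an illustration only, not the IMP code: the one-byte
   destination field, the helper names (flip_bit, make_message,
   send_over_circuit, circuit_check_passes), and the use of zlib.crc32
   as a stand-in for the modem-interface CRC are all assumptions made
   for the sketch.  It shows why a check that covers only the circuit
   catches a bit damaged on the line but not the same bit damaged in
   memory before the CRC is computed.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (a picked or dropped bit)."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def make_message(dest_host: int, payload: bytes) -> bytes:
    """Hypothetical layout: one destination-address byte, then the data."""
    return bytes([dest_host]) + payload


def send_over_circuit(packet: bytes) -> tuple:
    """Sending modem interface: attach a CRC computed over the packet.

    zlib.crc32 is only a stand-in; it is not the CRC used by the IMP hardware.
    """
    return packet, zlib.crc32(packet)


def circuit_check_passes(packet: bytes, crc: int) -> bool:
    """Receiving modem interface: recompute the CRC and compare."""
    return zlib.crc32(packet) == crc


original = make_message(5, b"hello")

# Case 1: the circuit damages one bit after the CRC was attached.
# The receiving interface recomputes the CRC and sees a mismatch.
packet, crc = send_over_circuit(original)
damaged_on_line = flip_bit(packet, 1)
circuit_error_detected = not circuit_check_passes(damaged_on_line, crc)

# Case 2: memory damages the same bit before the CRC is computed.
# The CRC now covers the already-wrong packet, so the check passes.
corrupted_in_memory = flip_bit(original, 1)
packet, crc = send_over_circuit(corrupted_in_memory)
memory_error_passes_crc = circuit_check_passes(packet, crc)
delivered_to = packet[0]

print("circuit error detected:", circuit_error_detected)
print("memory error passes circuit CRC:", memory_error_passes_crc)
print("intended host:", original[0], "delivered to host:", delivered_to)
```

   In the second case the message addressed to host 5 arrives intact,
   as far as the circuit check can tell, at host 7: one Host sees a
   lost message and another sees an extra one.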

   One of the earliest problems of ...