
Communication Support for Reliable Distributed Computing
Disclosure Number: IPCOM000148373D
Original Publication Date: 1986-May-31
Included in the Prior Art Database: 2007-Mar-29
Document File: 18 page(s) / 1M

Publishing Venue

Software Patent Institute

Related People

Birman, Kenneth P.: AUTHOR


Communication Support for Reliable Distributed Computing*

This is the abbreviated version, containing approximately 10% of the total text.


Communication Support for Reliable Distributed Computing*

Kenneth P. Birman

Thomas A. Joseph

TR 86-753

May 1986

Department of Computer Science, Cornell University
Ithaca, NY 14853

* This work was supported by the Defense Advanced Research Projects Agency (DoD) under ARPA order 5378, Contract MDA903-85-C-0124, and by the National Science Foundation under grant DCR-8412582.

The views, opinions and findings contained in this report are those of the authors and should not be construed as an official Department of Defense position, policy, or decision.



COMMUNICATION SUPPORT FOR RELIABLE DISTRIBUTED COMPUTING

Kenneth P. Birman and Thomas A. Joseph

Department of Computer Science, Cornell University, Ithaca, New York

   We describe a collection of communication primitives integrated with a mechanism for handling process failure and recovery. These primitives facilitate the implementation of fault-tolerant process groups, which can be used to provide distributed services in an environment subject to non-malicious crash failures.

1. Introduction

   At Cornell, we recently completed a prototype of the ISIS system, which transforms abstract type specifications into fault-tolerant distributed implementations while insulating users from the mechanisms by which fault-tolerance is achieved [...].

   A wide range of reliable communication primitives have been proposed in the literature, and we became convinced that by using such primitives when building the ISIS system, complexity could be avoided. Unfortunately, the existing protocols, which range from reliable and atomic broadcast [...] [Cristian] [Schneider] to Byzantine agreement [Strong], either do not satisfy the ordering constraints required for many fault-tolerant applications or satisfy a stronger constraint than necessary at too high a cost. In particular, these protocols have not attempted to minimize the latency (delay) incurred before message delivery can occur.

In ISIS, latency appears to be a major factor limiting performance.
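The trade-off between the ordering a protocol guarantees and the delay it imposes before delivery can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not one of the report's protocols: each message is assumed to carry a per-sender sequence number and a global sequence number (as a total-order, atomic-broadcast-style protocol might assign), and a single receiver sees a hypothetical arrival order. The function name `delivery_steps` and the message tagging are illustrative inventions. The function records the arrival step at which each message could be delivered under a per-sender FIFO rule versus a total-order rule.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical message tag: (sender, per-sender sequence number, global sequence number).
Msg = Tuple[str, int, int]


def delivery_steps(arrivals: List[Msg], rule: str) -> Dict[Msg, int]:
    """Simulate one receiver; return the arrival step at which each message is delivered.

    rule == "fifo":  deliver once all earlier messages from the same sender were delivered.
    rule == "total": deliver once all messages with a smaller global number were delivered.
    """
    if rule not in ("fifo", "total"):
        raise ValueError(f"unknown rule: {rule}")
    next_fifo: Dict[str, int] = defaultdict(int)  # next expected sequence number per sender
    next_global = 0                               # next expected global sequence number
    pending: List[Msg] = []                       # arrived but not yet deliverable
    delivered: Dict[Msg, int] = {}

    for step, msg in enumerate(arrivals):
        pending.append(msg)
        progress = True
        # Keep sweeping the buffer: one delivery may unblock others.
        while progress:
            progress = False
            for m in list(pending):
                sender, seq, gseq = m
                if rule == "fifo" and seq == next_fifo[sender]:
                    next_fifo[sender] += 1
                elif rule == "total" and gseq == next_global:
                    next_global += 1
                else:
                    continue  # still blocked; leave it buffered
                pending.remove(m)
                delivered[m] = step
                progress = True
    return delivered


if __name__ == "__main__":
    # b's messages arrive before a's, although a's carry smaller global numbers.
    arrivals = [("b", 0, 1), ("a", 0, 0), ("b", 1, 3), ("a", 1, 2)]
    for rule in ("fifo", "total"):
        steps = delivery_steps(arrivals, rule)
        waited = {m: steps[m] - i for i, m in enumerate(arrivals)}
        print(rule, waited)
```

With this sample arrival order, the FIFO rule delivers every message as soon as it arrives, while the total-order rule holds back each of b's messages for one step until a's lower-numbered message shows up. The stronger guarantee is paid for in delivery latency, which matters whenever an application does not actually need it.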

Fault-tolerant distributed systems also need a way to detect failures and recoveries consistently, and we found that this could be integrated into the communication layer in a manner that reduces the synchronization burden on higher level algorithms. These observations motivated the development of a new collection of primitives, which we present below.
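The idea of reporting failures and recoveries through the communication layer itself, so that every process observes them consistently, can be sketched as follows. This is a deliberately simplified, single-process illustration and not the report's mechanism. A shared Python list stands in for a delivery order that a real system would have to establish with a distributed protocol, and the names `GroupChannel`, `Member`, `report_failure`, `report_recovery` and `poll` are hypothetical. Because failure and recovery notices occupy positions in the same ordered stream as application messages, each member associates every message with the same group membership, no matter when it happens to read the stream.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple, Union


@dataclass(frozen=True)
class Message:
    sender: str
    body: str


@dataclass(frozen=True)
class ViewChange:
    kind: str      # "fail" or "recover"
    process: str


Event = Union[Message, ViewChange]


class GroupChannel:
    """Hypothetical communication layer. A single in-memory log stands in for an
    agreed-upon delivery order; failures and recoveries are appended to the same
    log as messages, so all members see them at the same position."""

    def __init__(self, members: Iterable[str]):
        self.initial: FrozenSet[str] = frozenset(members)
        self.log: List[Event] = []

    def send(self, sender: str, body: str) -> None:
        self.log.append(Message(sender, body))

    def report_failure(self, process: str) -> None:
        self.log.append(ViewChange("fail", process))

    def report_recovery(self, process: str) -> None:
        self.log.append(ViewChange("recover", process))


class Member:
    """A group member that reads the shared log at its own pace."""

    def __init__(self, name: str, channel: GroupChannel):
        self.name = name
        self.channel = channel
        self.view = set(channel.initial)  # current membership as seen by this member
        self.next_index = 0               # next log position to process
        # Each delivered message is recorded with the membership in force at delivery.
        self.history: List[Tuple[str, FrozenSet[str]]] = []

    def poll(self) -> None:
        log = self.channel.log
        while self.next_index < len(log):
            event = log[self.next_index]
            self.next_index += 1
            if isinstance(event, ViewChange):
                if event.kind == "fail":
                    self.view.discard(event.process)
                else:
                    self.view.add(event.process)
            else:
                self.history.append((event.body, frozenset(self.view)))


if __name__ == "__main__":
    ch = GroupChannel(["p", "q", "r"])
    p, q = Member("p", ch), Member("q", ch)
    ch.send("p", "m1")
    p.poll()            # p delivers m1 before r's failure is reported
    ch.report_failure("r")
    ch.send("q", "m2")
    p.poll()
    q.poll()            # q reads m1 only now, after the failure notice exists
    assert p.history == q.history
    print(p.history)
```

In the usage at the end of the block, p delivers m1 before the failure of r is reported and q delivers it afterwards, yet both record m1 with membership {p, q, r} and m2 with {p, q}. This kind of agreement is what lets higher-level algorithms act on membership information with less synchronization of their own.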

