Browse Prior Art Database

Early Stopping Protocol Resilient to a Wide Class of Failures

IP.com Disclosure Number: IPCOM000105028D
Original Publication Date: 1993-Jun-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 4 page(s) / 66K

Publishing Venue

IBM

Related People

Garay, JA: AUTHOR [+2]

Abstract

A common technique for constructing computer systems that are able to tolerate faults is to replicate the computation on several processors. In order to apply this technique,two tools are necessary: a fault-tolerant Broadcast method for broadcasting data to all processors (such data may originate either from one of the processorsor as the output of a sensor) and a fault- tolerant Consensus method allowing the processors to reach consensus on the computation's result,so that they may take a common action (e.g., activating an actuator). These two methods have been studied in the abstract and are known respectively as the Byzantine Generals Problem and the Consensus Problem.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 83% of the total text.

Early Stopping Protocol Resilient to a Wide Class of Failures

      A common technique for constructing computer systems that are
able to tolerate faults is to replicate the computation on several
processors.  In order to apply this technique,two tools are
necessary: a fault-tolerant Broadcast method for broadcasting data to
all processors (such data may originate either from one of the
processorsor as the output of a sensor) and a fault- tolerant
Consensus method allowing the processors to reach consensus on the
computation's result,so that they may take a common action (e.g.,
activating an actuator).  These two methods have been studied in the
abstract and are known respectively as the Byzantine Generals Problem
and the Consensus Problem.

      Previously there has been disclosed methods for Broadcast and
Consensus, as well as a means for combining the methods to implement
a fault-tolerant computing system.  The salient feature of these
methods was the ability of the n processors to overcome up to t
processor failures where processors may fail either by crashing or
acting arbitrarily, subject to no more than b processors acting
arbitrarily.  The running time of these protocols was 3 times (t +
1).  In the sequel let f denote the actual number (versus t, the
upperbound) of processor failures during an execution of the
protocol.  We now disclose protocols for the same problem whose
running time is 5 times (f+ 1) when f <b and 5 times(t + 1)
otherwise.  Thus,...