Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Fast Clock Synchronization in a System Resilient to a Wide Class of Failures

IP.com Disclosure Number: IPCOM000105734D
Original Publication Date: 1993-Sep-01
Included in the Prior Art Database: 2005-Mar-20
Document File: 2 page(s) / 78K

Publishing Venue

IBM

Related People

Garay, JA: AUTHOR [+2]

Abstract

A common technique for constructing computer systems that are able to tolerate faults is to replicate the computation on several processors. In order to apply this technique, two tools are necessary: a fault-tolerant Broadcast method for broadcasting data to all processors (such data may originate either from one of the processors or as the output of a sensor) and a fault-tolerant Consensus method allowing the processors to reach consensus on the computation's result, so that they may take a common action (e.g., activating an actuator). In prior work, methods were disclosed for Broadcast and Consensus, as well as a means for combining the methods to implement a fault-tolerant computing system.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Fast Clock Synchronization in a System Resilient to a Wide Class of Failures

      A common technique for constructing computer systems that are
able to tolerate faults is to replicate the computation on several
processors.  In order to apply this technique, two tools are
necessary:  a fault-tolerant Broadcast method for broadcasting data
to all processors (such data may originate either from one of the
processors or as the output of a sensor) and a fault-tolerant
Consensus method allowing the processors to reach consensus on the
computation's result, so that they may take a common action (e.g.,
activating an actuator).  In prior work, methods were disclosed for
Broadcast and Consensus, as well as a means for combining the methods
to implement a fault-tolerant computing system.  The salient feature
of these methods was the ability of the n processors to overcome up
to t processor failures where processors may fail either by crashing
or acting arbitrarily, subject to no more than b processors acting
arbitrarily.  These methods worked best when the hardware clocks of
all processors were perfectly synchronized.

      Hardware clocks can drift, e.g., they advance approximately 1
second after one second of true time.  For example, a clock may have
a drift rate of plus or minus one percent, meaning that after one
second of true time, a fast clock can advance 1.01 seconds whereas a
slow clock may only advance .99 seconds.  Thus, after one second of
true time, clocks that are initially perfectly synchronized may
differ in their readings by as much as .02.  The difference in
readings gets worse over time:  after an additional second of true
time, the fast clock can read 2.02 while the slow clock reads 1.98
After enough real time has passed, the readings may differ by so much
as to render the clocks useless.  Therefore, a software method is
necessary to re-synchronize the clocks.  Prior work disclosed a
method tolerant of up to t processor failures, of which no more than
b processors (and their clocks) can be arbitrary.  The method,
however, requires each processor to send messages to all other
processors 3 times per synchronization instance.  Disclosed is a
method for the same problem that requires only 2 message exchanges,
at the...