
Jury: Enhancing Fault Tolerance of Transaction Management

IP.com Disclosure Number: IPCOM000035694D
Original Publication Date: 1989-Aug-01
Included in the Prior Art Database: 2005-Jan-28
Document File: 3 page(s) / 18K

Publishing Venue

IBM

Related People

Dolev, D: AUTHOR (and 3 others)

Abstract

A method is described for increasing the fault tolerance of standard two-phase distributed commit protocols, so that a small number of simple component failures cannot cause blocking or hold locks on otherwise accessible data for indefinitely long periods of time. When a larger number of failures occurs, the method blocks rather than permitting inconsistent results, provided that communication is authenticated in some way. The method can also be applied in a straightforward way to other distributed coordination problems, such as atomic broadcast and consensus.

This is the abbreviated version, containing approximately 31% of the total text.


This invention rests on two fundamental ideas. The first is the addition of a set of new processes, called jurors, to each transaction. The second is the combination of timeouts with unilateral requests for more time, which are propagated throughout the communication structure associated with a given transaction.
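The second idea (timeouts combined with unilateral requests for more time that spread through the transaction's communication structure) can be illustrated with a small Python sketch. It assumes a tree of processes in which every node holds an absolute deadline. Such a deadline is meaningful at every site only because the clocks are approximately synchronized. A process that needs more time floods a new, later deadline across the tree. The names `Node`, `request_more_time`, and the fixed `EXTENSION` interval are hypothetical and are not taken from the disclosure.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical extension granted by one "more time" request; the
# disclosure's actual timing rule is not part of the abbreviated text.
EXTENSION = 10.0


@dataclass
class Node:
    """A process in the transaction's communication tree (illustrative)."""
    name: str
    deadline: float
    # repr/compare disabled: bidirectional links would otherwise recurse.
    neighbors: list[Node] = field(default_factory=list, repr=False, compare=False)

    def link(self, other: Node) -> None:
        # Tree edges are bidirectional so requests can flow either way.
        self.neighbors.append(other)
        other.neighbors.append(self)

    def request_more_time(self, now: float) -> None:
        """Unilaterally ask for more time and flood the request over the tree."""
        self._extend(now + EXTENSION, sender=None)

    def _extend(self, new_deadline: float, sender: Node | None) -> None:
        if new_deadline <= self.deadline:
            return  # already at least this late, so stop propagating
        self.deadline = new_deadline
        for n in self.neighbors:
            if n is not sender:  # do not echo back along the incoming edge
                n._extend(new_deadline, sender=self)

    def timed_out(self, now: float) -> bool:
        # Expiry now suggests an inability to communicate, not merely slow work,
        # because any process still working could have asked for more time.
        return now > self.deadline
```

For example, in a chain `root - a - b` where every deadline is 10.0, a call to `b.request_more_time(now=5.0)` moves all three deadlines to 15.0. None of the processes needed to know in advance how long the transaction would run.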

The set of jurors replicates the commit-decision role that is usually reserved for a single transaction coordinator process. The combination of timeouts and time requests detects when a process is unable to communicate, without requiring any prior knowledge of how long the transaction will actually take to complete its processing. Together, these two mechanisms prevent a small number of faults from causing blocking.
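The abbreviated text does not say how the jurors reach their replicated decision. One rule that has the properties claimed in the abstract is a strict-majority quorum, sketched below. Under that rule, a minority of failed jurors cannot block a decision. With more failures, no majority forms and the outcome blocks instead of diverging. This is an assumed, swapped-in technique for illustration only, not the disclosure's stated method. `juror_decision` and its vote encoding are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def juror_decision(votes: dict[str, Optional[str]], n_jurors: int) -> Optional[str]:
    """Assumed majority rule over juror votes (illustrative, not from the source).

    votes maps juror name -> "commit", "abort", or None if no answer arrived.
    Returns the outcome backed by a strict majority, or None (block).
    """
    counts = Counter(v for v in votes.values() if v is not None)
    quorum = n_jurors // 2 + 1
    for outcome in ("commit", "abort"):
        if counts[outcome] >= quorum:
            return outcome
    return None  # too few matching answers: block rather than guess
```

Any two strict majorities of the same juror set share at least one member. So, if each juror votes only once, the rule can never yield both "commit" and "abort" for one transaction. With three jurors, one failure is tolerated and two failures cause blocking.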

The method assumes the following environment. A transaction processing system allows transactions to be initiated at any site and at any time. Each transaction dynamically invokes a set of processes at various sites to do work on its behalf, and which processes are invoked may depend on the data at those sites. A dynamically growing communication structure (typically a tree) allows all processes doing work on behalf of the transaction to communicate with one another, provided there are no faults in the underlying communication media. Each process has a timer and access to a clock, and the clocks are approximately synchronized to within some known precision. Each process knows when it has completed its work on behalf of a given transaction. Until it enters a "prepared" state, any process can unilaterally decide (for any reason) that the work must be aborted. Once a process has entered the "prepared" state, the base transaction processing requires it to wait for instructions from a coordinator on whether to commit or abort the work.
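The participant rules of the base environment can be written as a small state machine. The sketch below is a minimal illustration under the stated assumptions, and the class and method names are hypothetical. It makes the blocking window visible. After `prepare()` the process can no longer abort on its own, so it stays in `PREPARED` until a decision arrives. If the single coordinator fails, that wait can last indefinitely.

```python
from enum import Enum, auto


class State(Enum):
    WORKING = auto()
    PREPARED = auto()
    COMMITTED = auto()
    ABORTED = auto()


class Participant:
    """One process doing work for a transaction in the base protocol."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.WORKING

    def abort_unilaterally(self) -> bool:
        """Abort for any reason; permitted only before entering PREPARED."""
        if self.state is State.WORKING:
            self.state = State.ABORTED
            return True
        return False  # prepared (or finished): must defer to the coordinator

    def prepare(self) -> None:
        """Called once the process knows its work is complete."""
        if self.state is not State.WORKING:
            raise RuntimeError(f"{self.name}: cannot prepare from {self.state.name}")
        self.state = State.PREPARED

    def on_decision(self, commit: bool) -> None:
        """Apply the coordinator's commit/abort instruction while PREPARED."""
        if self.state is not State.PREPARED:
            raise RuntimeError(f"{self.name}: unexpected decision in {self.state.name}")
        self.state = State.COMMITTED if commit else State.ABORTED
```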

The method of this invention modifies this environment as follows. When a transaction is initiated, a small set of processes called jurors is chosen, and their names are conveyed to each process invoked to do work on the transaction. (The juror processes are invoked before any nonjuror processes.) As each juror process is invoked, it sets a timer. The initiation time of the transaction is conveyed to all processes that work on it. Each juror sets its timer for a known constant t...
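The setup steps above can be sketched as the context that is passed to every invoked process, plus the required invocation order. The juror timer rule is cut off in the abbreviated text. The sketch therefore assumes a timer that expires a fixed interval after the conveyed initiation time. Because the clocks are approximately synchronized, every process could compute such a deadline. `TxnContext`, `invocation_order`, `juror_deadline`, and `JUROR_TIMEOUT` are hypothetical names, and the interval value is illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical constant; the disclosure's actual value and rule are cut off.
JUROR_TIMEOUT = 30.0


@dataclass(frozen=True)
class TxnContext:
    """Information conveyed to every process invoked for a transaction."""
    txn_id: str
    initiation_time: float   # read from the approximately synchronized clock
    jurors: tuple[str, ...]  # names of the jurors chosen at initiation


def invocation_order(ctx: TxnContext, workers: list[str]) -> list[str]:
    """Invoke all jurors first, then the remaining (non-juror) processes."""
    return list(ctx.jurors) + [w for w in workers if w not in ctx.jurors]


def juror_deadline(ctx: TxnContext) -> float:
    """Assumed rule: a juror's timer expires a fixed interval after initiation."""
    return ctx.initiation_time + JUROR_TIMEOUT
```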