
Two-Phase Commit Resynchronization

IP.com Disclosure Number: IPCOM000117177D
Original Publication Date: 1996-Jan-01
Included in the Prior Art Database: 2005-Mar-31
Document File: 2 page(s) / 104K

Publishing Venue

IBM

Related People

Banks, TW: AUTHOR [+2]

Abstract

In order to keep communication systems in synchronization, message exchange protocols have been developed. One of these, the Two-Phase Commit (2PC) protocol, exhibits the following problems: 1. Existing techniques have inadequate mechanisms for the identification of the partner system during the resynchronization phase of the 2PC protocol; the result can be invalid completion of the protocol. 2. The practical operation of a complex of systems connected using a 2PC protocol demands that occasionally one or more of the systems be reset to an initial (cold-started) state. This procedure is needed to install program maintenance or an upgrade, or as a last resort in an emergency.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 51% of the total text.

      In order to keep communication systems in synchronization,
message exchange protocols have been developed.  One of these, the
Two-Phase Commit (2PC) protocol, exhibits the following problems:
  1.  Existing techniques have inadequate mechanisms for the
       identification of the partner system during the
       resynchronization phase of the 2PC protocol; the result can
       be invalid completion of the protocol.
  2.  The practical operation of a complex of systems connected using
       a 2PC protocol demands that occasionally one or more of the
       systems be reset to an initial (cold-started) state.  This
       procedure is needed to install program maintenance or an
       upgrade, or as a last resort in an emergency.  Partner systems
       which have been in communication with the cold-started system
       may have information dependent on the checkpoint data which
       have been lost as a result of the cold start; existing
       techniques demand an uncomfortable choice between:
      o  Keeping data which describes how to maintain data integrity
          in the system.
      o  Losing connectivity (i.e., loss of service) until all such
          data is investigated and removed.

      Problem (1) arises from the premise that each system can
identify a unique instance of a partner system with which it is
communicating.  Since there are multiple parallel communication
channels between the systems, this premise is false: some channels
may be receiving messages from an old instance of the partner, which
has in the meantime been cold-started.  Such messages could have been
delayed in transmission.

      Enhancing the mechanisms to cope with multiple
partner-instances cures this problem and makes the systems more
manageable than in (2) by allowing the execution of new work in
parallel with the investigation of data affected by a cold-started
partner system.
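
      The idea of tracking more than one instance of a partner can be
sketched as follows.  This is an illustration only, not the mechanism
of the disclosure (the available text is abbreviated and does not
give it).  It assumes that every message carries a hypothetical
integer instance_epoch that increases each time the sender is
cold-started (for example, derived from a start-up timestamp); the
names Message, PartnerTable and classify are likewise invented for
the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    partner_name: str    # name of the sending system
    instance_epoch: int  # ASSUMPTION: grows at every cold start of the sender
    payload: str


@dataclass
class PartnerTable:
    # Newest known epoch for each partner.
    current: dict = field(default_factory=dict)
    # Older epochs whose outstanding work still awaits investigation.
    superseded: dict = field(default_factory=dict)

    def classify(self, msg: Message) -> str:
        known = self.current.get(msg.partner_name)
        if known is None:
            self.current[msg.partner_name] = msg.instance_epoch
            return "new-partner"
        if msg.instance_epoch == known:
            return "current"
        if msg.instance_epoch < known:
            # Delayed message from an instance that has since been
            # cold-started; it must not be treated as current work.
            return "stale"
        # A newer epoch: the partner was cold-started.  Remember the old
        # instance for later investigation and accept new work at once.
        self.superseded.setdefault(msg.partner_name, set()).add(known)
        self.current[msg.partner_name] = msg.instance_epoch
        return "cold-started"
```

      In this sketch a message classified as "stale" is set aside
for investigation, while messages from the current instance continue
to be processed, so new work is not blocked by the cold start.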

      During the exchange of messages which comprises the protocol,
information is stored on non-volatile media (the log) so that the
state of the system can be reconstructed in case of failure (except,
of course, a failure of the log).  Sometimes during the protocol both
systems have a record of the execution, and store correlation tokens
(unit of work identifiers, and conversation correlators) to match up
the data later, but at the beginning or the end of the protocol only
one of the systems has a record.
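
      A minimal sketch of such a log is given below.  The record
layout, the state names and the JSON-lines file format are
assumptions made for the example; the text only states that the
correlation tokens are stored on non-volatile media so that the state
can be reconstructed.

```python
import json
import os
from dataclasses import asdict, dataclass


@dataclass
class LogRecord:
    unit_of_work_id: str          # correlation token named in the text
    conversation_correlator: str  # correlation token named in the text
    state: str                    # ASSUMPTION: e.g. "prepared", "committed"


class Log:
    """Append-only log kept in a file, one JSON object per line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, record: LogRecord) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(record)) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to non-volatile media

    def replay(self) -> dict:
        """Rebuild the latest known state of every unit of work."""
        latest = {}
        if not os.path.exists(self.path):
            return latest
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    rec = LogRecord(**json.loads(line))
                    latest[rec.unit_of_work_id] = rec
        return latest
```

      Calling replay() after a failure returns the last recorded
state of each unit of work, keyed by its identifier, which is the
information needed to match up the data with the partner later.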

      If communications fail in such circumstances, the system which
restarts and finds remnants of the protocol on its log must start
re-execution of the protocol (this re-execution is known as
resynchronization).
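
      The restart step can be illustrated as below: the log is
scanned, and every unit of work whose last record is not a final
outcome is selected for resynchronization.  The state names and the
function name units_to_resynchronize are hypothetical; the text does
not define which log states count as remnants.

```python
# ASSUMPTION: states after which no further protocol exchange is needed.
FINAL_STATES = {"committed", "backed-out"}


def units_to_resynchronize(records):
    """Return the units of work left unfinished on the log.

    `records` is an iterable of (unit_of_work_id, state) pairs in the
    order in which they were written to the log.
    """
    latest = {}
    for unit_of_work_id, state in records:
        latest[unit_of_work_id] = state  # later records override earlier ones
    return [uow for uow, state in latest.items() if state not in FINAL_STATES]
```

      For example, with the records ("uow-1", "prepared"),
("uow-2", "prepared"), ("uow-1", "committed"), only "uow-2" is
returned, and the restarted system would re-execute the protocol for
that unit of work alone.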

      To preserve data integrity, the system receiving a
resynchronization message must not reply 'no matching data found'
simply on the basis that it has no record of the correlators...