Browse Prior Art Database

A Mechanism to Ensure Exactly-Once with a Non-Deterministic Replicated Client Invoking a Replicated Server

IP.com Disclosure Number: IPCOM000016191D
Original Publication Date: 2002-Aug-03
Included in the Prior Art Database: 2003-Jun-21
Document File: 5 page(s) / 85K

Publishing Venue

IBM

Abstract

Modern applications are generally composed of several components. These components may run in different processes, potentially on different machines. Assume, for instance, that client C sends a request to server component S, which in turn sends a request to another server component T in order to fullfill C's request (S is taking the role of a "client" from T's perspective). Component S then receives the result from T, performs its computations, and returns its result to client C. Any crash failure of component S or T leads to blocking in the client request (reduced availability). Replication allows to achieve fault tolerance and thus prevents blocking. More specifically, the service of S (and T) is provided by several replicas S_0, S_1, and S_2 (T_0, T_1, and T_2). Figure 1 illustrates this scenario. Although a replica may fail (e.g., S_0), other replicas can take over the execution of the client request (S_1 and S_2).

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 37% of the total text.

Page 1 of 5

  A Mechanism to Ensure Exactly-Once with a Non-Deterministic Replicated Client Invoking a Replicated Server

  Modern applications are generally composed of several components. These components may run in different processes, potentially on different machines. Assume, for instance, that client C sends a request to server component S, which in turn sends a request to another server component T in order to fullfill C's request (S is taking the role of a "client" from T's perspective). Component S then receives the result from T, performs its computations, and returns its result to client C. Any crash failure of component S or T leads to blocking in the client request (reduced availability). Replication allows to achieve fault tolerance and thus prevents blocking. More specifically, the service of S (and T) is provided by several replicas S_0, S_1, and S_2 (T_0, T_1, and T_2). Figure 1 illustrates this scenario. Although a replica may fail (e.g., S_0), other replicas can take over the execution of the client request (S_1 and S_2).

In an asynchronous system (i.e., no bounds on transmission delays and relative process speeds exist), multiple replicas executing the client request may lead to multiple requests send to T and thus multiple executions on T. Indeed, assume that S_0 executes the request of C and sends request inv_0 to T, but then crashes. When S_1 detects the failure, it takes over the execution and also sends a request, say inv_1, to T. Hence, T receives two requests from S, namely inv_0 and inv_1. If T can detect that

1

[This page contains 4 pictures or other non-text objects]

Page 2 of 5

inv_0 and inv_1 emanate from the same logical component S (but differnt replicas), it can simply ignore the duplicate request inv_1. However, this is only possible if inv_0 and inv_1 use the same ID, which, in turn, generally requires that S_0 and S_1 execute deterministically. Non-determinism occurs if the components use multi-threading, non-deterministic system call such as local time or randomized numbers, or asynchronous events (e.g., software interrrupts, system exceptions). Non-deterministic component S may lead to different IDs for inv_0 and inv_1, which are then considered to be requests from different clients by T. Worse, inv_0 and inv_1 may even be sent to different servers. Clearly, this leads to multiple executions and thus to an incorrect system state.

Currently, this problem is solved by enforcing determinism in the execution of the replicas S_i. This generally requires to use single-threaded applications or implement deterministic thread scheduling and not to use any other sources of non-determinism. This is a limitation for general applications. In particular, it prevents that the passive replication approach is used to its full advantage. Indeed, passive replication (in contrary to active replication) allows non-deterministic execution. More specifically, with passive replication (also called primary-backup) only one replica,...