A Mechanism to Ensure Exactly-Once with a Non-Deterministic Replicated Client Invoking a Replicated Server
Original Publication Date: 2002-Aug-03
Included in the Prior Art Database: 2003-Jun-21
Modern applications are generally composed of several components. These components may run in different processes, potentially on different machines. Assume, for instance, that client C sends a request to server component S, which in turn sends a request to another server component T in order to fullfill C's request (S is taking the role of a "client" from T's perspective). Component S then receives the result from T, performs its computations, and returns its result to client C. Any crash failure of component S or T leads to blocking in the client request (reduced availability). Replication allows to achieve fault tolerance and thus prevents blocking. More specifically, the service of S (and T) is provided by several replicas S_0, S_1, and S_2 (T_0, T_1, and T_2). Figure 1 illustrates this scenario. Although a replica may fail (e.g., S_0), other replicas can take over the execution of the client request (S_1 and S_2). In an asynchronous system (i.e., no bounds on transmission delays and relative process speeds exist), multiple replicas executing the client request may lead to multiple requests send to T and thus multiple executions on T. Indeed, assume that S_0 executes the request of C and sends request inv_0 to T, but then crashes. When S_1 detects the failure, it takes over the execution and also sends a request, say inv_1, to T. Hence, T receives two requests from S, namely inv_0 and inv_1. If T can detect that 1 inv_0 and inv_1 emanate from the same logical component S (but differnt replicas), it can simply ignore the duplicate request inv_1. However, this is only possible if inv_0 and inv_1 use the same ID, which, in turn, generally requires that S_0 and S_1 execute deterministically. Non-determinism occurs if the components use multi-threading, non-deterministic system call such as local time or randomized numbers, or asynchronous events (e.g., software interrrupts, system exceptions). Non-deterministic component S may lead to different IDs for inv_0 and inv_1, which are then considered to be requests from different clients by T. Worse, inv_0 and inv_1 may even be sent to different servers. Clearly, this leads to multiple executions and thus to an incorrect system state.