Method and System for Safe and Efficient Checkpointing Using Multiple Signatures
Original Publication Date: 1999-Nov-01
Included in the Prior Art Database: 2003-Jun-19
Disclosed herein is a method for performing safe and efficient process checkpointing. The novelty lies in using several signature functions to detect the changes that occur to a process's state between consecutive checkpoints. The resulting benefits include a reduction in the amount of state that must be saved at each checkpoint, independence from hardware and operating systems, and efficiency.

Rollback-recovery is an established method for achieving high availability and reliability in database systems and other applications. The principal idea in this style of fault tolerance is to periodically save on stable storage a checkpoint that includes the state of a process, a set of cooperating processes, or a database, depending on the application at hand. If a failure occurs, the system restarts from a saved checkpoint and resumes computation. Checkpointing to stable storage incurs performance and storage overheads, so reducing these overheads is important in any implementation.
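
The idea of detecting state changes with several signature functions can be illustrated with a minimal sketch. It is not the disclosed implementation; it rests on several assumptions that do not come from the text above: the state is treated as a byte string split into fixed-size blocks, CRC-32 and Adler-32 (from Python's zlib module) stand in for the signature functions, and a dictionary stands in for stable storage. The names IncrementalCheckpointer, BLOCK_SIZE, and SIGNATURES are hypothetical. A block is written only when at least one of its signatures differs from the value recorded at the previous checkpoint. Using more than one function lowers the chance that a change goes undetected because of a signature collision.

```python
import zlib

# Assumption: block granularity is not specified in the disclosure.
BLOCK_SIZE = 4096

# Assumption: CRC-32 and Adler-32 stand in for the "several signature functions".
SIGNATURES = (zlib.crc32, zlib.adler32)


def block_signatures(block: bytes) -> tuple:
    """Compute every signature function over one block of state."""
    return tuple(sig(block) for sig in SIGNATURES)


class IncrementalCheckpointer:
    """Hypothetical helper that saves only blocks whose signatures changed."""

    def __init__(self, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.signatures = []  # signature tuples recorded at the last checkpoint
        self.stable = {}      # stand-in for stable storage: block index -> bytes

    def checkpoint(self, state: bytes) -> list:
        """Save changed blocks and return the indices that were written."""
        blocks = [state[i:i + self.block_size]
                  for i in range(0, len(state), self.block_size)]
        changed = []
        new_signatures = []
        for idx, block in enumerate(blocks):
            sigs = block_signatures(block)
            new_signatures.append(sigs)
            # A block is new, or at least one signature differs: write it.
            if idx >= len(self.signatures) or self.signatures[idx] != sigs:
                self.stable[idx] = block
                changed.append(idx)
        # Discard saved blocks that lie beyond the end of a shrunken state.
        for idx in range(len(blocks), len(self.signatures)):
            self.stable.pop(idx, None)
        self.signatures = new_signatures
        return changed

    def restore(self) -> bytes:
        """Rebuild the state as of the most recent checkpoint."""
        return b"".join(self.stable[i] for i in range(len(self.signatures)))
```

In this sketch, a first call to checkpoint writes every block, while a later call after a small modification writes only the blocks that contain the modified bytes. The signatures themselves are kept in memory here; a real system would also have to decide where they are stored and how they survive a failure.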