Method and apparatus for maintaining and recording changed states of data between checkpointing/recovery events
Publication Date: 2014-Jan-24
The IP.com Prior Art Database
Disclosed is per-block check-bit hardware that tracks whether a fixed-size block in memory has been modified between two checkpoints.
High-performance computing often uses checkpointing as a fault-tolerance mechanism.
At runtime, the application, checkpoint library, or operating system (OS) periodically takes checkpoints by copying the state of the application data to safer, less- or non-volatile storage that is outside the current failure domain (e.g., from Dynamic Random Access Memory (DRAM) to disk or flash storage, or to DRAM on other nodes). Upon an error, the application state can be reconstructed and recovered from the previous checkpoints, sometimes by combining several full and incremental checkpoints. An incremental checkpoint copies only the states that differ from prior checkpoints and are needed for future use.
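The relationship between full checkpoints, incremental checkpoints, and recovery can be illustrated with a minimal software sketch. It is not the disclosed hardware: the 4-byte block size, the function names, and the in-memory byte strings standing in for application state and checkpoint storage are all assumptions made for illustration, and modified blocks are found here by simply comparing contents.

```python
BLOCK_SIZE = 4  # hypothetical block size in bytes, chosen only for illustration


def split_blocks(data: bytes) -> list[bytes]:
    """Split data into BLOCK_SIZE-byte blocks (the last block may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def incremental_checkpoint(previous: bytes, current: bytes) -> dict[int, bytes]:
    """Return only the blocks of `current` that differ from `previous`.

    Assumption: the state has a fixed size, so both inputs have equal length.
    """
    if len(previous) != len(current):
        raise ValueError("this sketch assumes a fixed-size state")
    prev_blocks = split_blocks(previous)
    cur_blocks = split_blocks(current)
    return {
        index: block
        for index, block in enumerate(cur_blocks)
        if block != prev_blocks[index]
    }


def recover(full: bytes, increments: list[dict[int, bytes]]) -> bytes:
    """Rebuild the latest state from a full checkpoint plus increments, oldest first."""
    blocks = split_blocks(full)
    for delta in increments:
        for index, block in delta.items():
            blocks[index] = block  # a newer copy of a block replaces the older one
    return b"".join(blocks)


# Usage: one full checkpoint followed by two incremental checkpoints.
state0 = b"AAAABBBBCCCC"
state1 = b"AAAAXXXXCCCC"  # block 1 modified
state2 = b"AAAAXXXXYYYY"  # block 2 modified

inc1 = incremental_checkpoint(state0, state1)
inc2 = incremental_checkpoint(state1, state2)
restored = recover(state0, [inc1, inc2])
```

In this sketch each incremental checkpoint holds a single block rather than the whole state, and recovery applies the increments in order on top of the full checkpoint. Comparing contents requires keeping the previous state around and scanning all of the data, which is the cost that a mechanism for tracking modifications aims to avoid.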
Conventional methods either checkpoint all of the data or, for better efficiency, checkpoint only user-specified data and/or only data modified since the last checkpoint (i.e., incremental checkpointing).
The novel contribution is per-block check-bit hardware that tracks whether a fixed-size block in memory has been modified between two checkpoints. Each check bit corresponds to a fixed- or variable-sized block of memory, and the sizing is not restricted to virtual memory page sizes or cache line sizes (it can be smaller or larger). The check bits are located in main or dedicated memory and can be set and unset individually or en masse by hardware or...