Method and apparatus for maintaining and recording changed states of data between checkpointing/recovery events

IP.com Disclosure Number: IPCOM000234646D
Publication Date: 2014-Jan-24
Document File: 2 page(s) / 50K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is per-block check-bit hardware that tracks whether a fixed-size block of memory has been modified between two checkpoints.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 56% of the total text.

High-performance computing often uses checkpointing as a fault-tolerance mechanism.

At runtime, the application, checkpoint library, or operating system (OS) periodically takes checkpoints by copying the state of the application data to safer, less volatile or non-volatile storage that is outside the current failure domain (e.g., from Dynamic Random Access Memory (DRAM) to disk or flash storage, or to DRAM on other nodes). Upon an error, the application state can be reconstructed and recovered from the previous checkpoints, sometimes by combining several full and incremental checkpoints. An incremental checkpoint copies only the state that differs from prior checkpoints and is necessary for future use.
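The relationship between full checkpoints, incremental checkpoints, and recovery can be illustrated with a minimal Python sketch. It is not part of the disclosure: the function names are hypothetical, memory is modeled as a list of blocks, and changes are detected here by comparing against the previously saved state in software, which is only one possible way to find modified blocks.

```python
def take_full_checkpoint(memory):
    """Copy every block to 'safe storage' (modeled as a dict: index -> block)."""
    return dict(enumerate(memory))


def take_incremental_checkpoint(memory, previous):
    """Copy only the blocks whose contents differ from the previously saved state.

    `previous` maps block index -> block contents as of the last checkpoint.
    Assumption: change detection by comparison, chosen for simplicity.
    """
    return {i: blk for i, blk in enumerate(memory) if previous[i] != blk}


def recover(full, increments):
    """Rebuild the state from a full checkpoint plus incremental ones.

    Increments are applied oldest first, so later changes override earlier ones.
    """
    state = dict(full)
    for inc in increments:
        state.update(inc)
    return [state[i] for i in range(len(state))]


# Usage: one full checkpoint followed by one incremental checkpoint.
memory = [b"aa", b"bb", b"cc"]
full = take_full_checkpoint(memory)
memory[1] = b"BB"                                  # application modifies block 1
inc = take_incremental_checkpoint(memory, full)    # only block 1 is copied
restored = recover(full, [inc])                    # matches the current memory
```

The incremental checkpoint holds a single block, yet recovery still yields the complete state because it is layered on top of the full checkpoint.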

Conventional methods either checkpoint all of the data or, for better efficiency, only user-specified data and/or only data modified since the last checkpoint (i.e., incremental checkpointing).

The novel contribution is per-block check-bit hardware that tracks whether a fixed-size block of memory has been modified between two checkpoints. Each check bit corresponds to a fixed or variable-sized block of memory, and the block size is not restricted to virtual memory page sizes or cache line sizes (it can be smaller or larger). The check bits are located in main or dedicated memory and can be set and unset individually or en masse by hardware or...
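As a rough software model of the check-bit idea, the following Python sketch keeps one bit per block, sets it on every write that touches the block, and uses the bits to choose which blocks an incremental checkpoint copies. It is an illustration under stated assumptions, not the disclosed hardware: the class and method names are hypothetical, the block size is fixed, and the bits are cleared en masse after each checkpoint.

```python
class CheckBitMemory:
    """Software model (hypothetical) of memory with one check bit per block."""

    def __init__(self, size, block_size):
        self.data = bytearray(size)
        self.block_size = block_size          # need not match page or cache-line size
        n_blocks = (size + block_size - 1) // block_size
        self.check_bits = [False] * n_blocks  # False = unmodified since last clear

    def write(self, addr, payload):
        """Store payload at addr and set the check bit of every block it touches."""
        if addr < 0 or addr + len(payload) > len(self.data):
            raise IndexError("write outside modeled memory")
        self.data[addr:addr + len(payload)] = payload
        first = addr // self.block_size
        last = (addr + len(payload) - 1) // self.block_size
        for block in range(first, last + 1):
            self.check_bits[block] = True

    def clear(self, block):
        """Unset a single check bit."""
        self.check_bits[block] = False

    def clear_all(self):
        """Unset all check bits en masse."""
        self.check_bits = [False] * len(self.check_bits)

    def incremental_checkpoint(self):
        """Copy only blocks whose check bit is set, then clear all bits."""
        bs = self.block_size
        snapshot = {
            block: bytes(self.data[block * bs:(block + 1) * bs])
            for block, modified in enumerate(self.check_bits)
            if modified
        }
        self.clear_all()
        return snapshot


# Usage: a 4-byte write that straddles the boundary between blocks 0 and 1.
mem = CheckBitMemory(size=64, block_size=16)
mem.write(14, b"abcd")
ckpt = mem.incremental_checkpoint()   # contains blocks 0 and 1 only
```

Because the bits record modifications as they happen, the checkpoint step does not have to compare memory against an earlier copy to find what changed.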