
Methods for Application Checkpointing using Application Dependence Analysis

IP.com Disclosure Number: IPCOM000222538D
Publication Date: 2012-Oct-16
Document File: 2 page(s) / 34K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method that uses program dependence analysis, traditionally performed by compilers for parallelizing optimizations, to reduce the size and bandwidth of incremental checkpointing.

This is the abbreviated version, containing approximately 52% of the total text.


High-performance computing often uses checkpointing as a fault-tolerance mechanism. At runtime, the application, a checkpoint library, or the operating system (OS) periodically takes checkpoints by copying application state to safer, less volatile or non-volatile storage outside the current failure domain (e.g., from Dynamic Random Access Memory (DRAM) to disk or flash storage, or to DRAM on other nodes). After an error, application state can be reconstructed from previous checkpoints, sometimes by combining several checkpoints and incremental checkpoints. An incremental checkpoint copies only the state that differs from prior checkpoints and is necessary for future use.
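The recovery scheme described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that application state is a dictionary of named values; the function names `full_checkpoint`, `incremental_checkpoint`, and `recover` are hypothetical and do not come from the disclosure or from any checkpoint library.

```python
import copy

def full_checkpoint(state):
    """Copy the whole application state (assumed: a dict of name -> value)."""
    return copy.deepcopy(state)

def incremental_checkpoint(state, last_saved):
    """Copy only the entries that differ from what has already been saved."""
    return {name: copy.deepcopy(value)
            for name, value in state.items()
            if name not in last_saved or last_saved[name] != value}

def recover(full, increments):
    """Rebuild state from a full checkpoint plus increments, oldest first."""
    state = copy.deepcopy(full)
    for inc in increments:
        state.update(copy.deepcopy(inc))
    return state

# Usage: one full checkpoint followed by two incremental ones.
state = {"a": 1, "b": [0, 0], "c": "x"}
base = full_checkpoint(state)
saved = copy.deepcopy(base)          # what the checkpoint store currently holds

state["a"] = 2
inc1 = incremental_checkpoint(state, saved)
saved.update(copy.deepcopy(inc1))

state["b"][1] = 5
inc2 = incremental_checkpoint(state, saved)
saved.update(copy.deepcopy(inc2))

restored = recover(base, [inc1, inc2])
```

In this sketch each incremental checkpoint holds a single entry, and replaying both over the full checkpoint reproduces the live state. Deleted entries are not tracked, which a real implementation would need to handle.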

Conventional methods checkpoint all of the data, or attempt to improve efficiency by checkpointing only user-specified data and/or data modified since the last checkpoint (i.e., incremental checkpointing). The disclosed algorithm checkpoints only data that has been written and may be used in the future; checkpointing is therefore more efficient because it copies a smaller subset of data than the prior solutions.

The invention uses program dependence analysis, traditionally performed by compilers for parallelizing optimizations, to reduce the size and bandwidth of incremental checkpointing. By computing the read set and the write set at chosen points in a given loop, the algorithm computes the checkpoint set from the intersection of the read and write sets. Because prior solutions checkpoint the write set and the disclosed...
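The abbreviated text does not say how the read and write sets are obtained, so the following Python sketch is only one plausible reading of the intersection idea, not the disclosed algorithm itself. It assumes a straight-line loop body over scalar variables, treats every write as definite, and takes the read set to be the variables that may be read after the checkpoint before being overwritten. The names `Stmt`, `write_set`, `future_read_set`, and `checkpoint_set` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stmt:
    """One statement of a straight-line loop body (hypothetical model)."""
    reads: frozenset
    writes: frozenset

def write_set(body):
    """Variables written somewhere in one trip around the loop."""
    out = set()
    for s in body:
        out |= s.writes
    return out

def future_read_set(body, point, live_after_loop=frozenset()):
    """Variables that may be read after the checkpoint before being overwritten.

    The checkpoint sits right after body[point]; execution continues with the
    rest of this iteration, then wraps around to the start of the next one.
    """
    future = body[point + 1:] + body[:point + 1]
    exposed, overwritten = set(), set()
    for s in future:
        exposed |= s.reads - overwritten   # a statement reads before it writes
        overwritten |= s.writes
    # Assumption: the loop may exit at any iteration, so anything read after
    # the loop is conservatively treated as a future read.
    return exposed | set(live_after_loop)

def checkpoint_set(body, point, live_after_loop=frozenset()):
    """Written data that may still be used: write set intersected with read set."""
    return write_set(body) & future_read_set(body, point, live_after_loop)

# Loop body:  t = a[i] * 2 ; total = total + t ; i = i + 1
body = [
    Stmt(reads=frozenset({"a", "i"}), writes=frozenset({"t"})),
    Stmt(reads=frozenset({"total", "t"}), writes=frozenset({"total"})),
    Stmt(reads=frozenset({"i"}), writes=frozenset({"i"})),
]
written = write_set(body)
to_save = checkpoint_set(body, point=2, live_after_loop=frozenset({"total"}))
```

Here a checkpoint of the write set alone would copy `t`, `total`, and `i`. With the checkpoint placed at the end of the iteration, the intersection drops the temporary `t`, because the next iteration overwrites it before reading it. The read-only array `a` is never copied in either case. Moving the checkpoint to just after the first statement (`point=0`) brings `t` back into the set, which is why the choice of point matters.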