
Methods to Minimize Recovery Processing time in a Cluster Computing System
Disclosure Number: IPCOM000218278D
Publication Date: 2012-May-31
Document File: 4 page(s) / 29K

Publishing Venue

The Prior Art Database


Disclosed are methods to minimize recovery processing time in a cluster computing system. In this approach, each host machine in the cluster computing system acts as a ‘home host’ for typically one cluster member, namely the member that uses most of the memory resources on that host computer.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 32% of the total text.


Methods to Minimize Recovery Processing time in a Cluster Computing System

A 'cluster' refers to a group of compute elements that work cooperatively to achieve some computational goal, such as providing a distributed Database Management System (DBMS).

Compute elements cooperating in such a cluster can be hardware (e.g., individual computer systems, host machines), virtual machines (VMs), or software components.

Compute elements in such systems are commonly referred to as nodes.

One embodiment of such a cluster is a shared data DBMS where each node is a member of a DBMS instance. Each member typically runs on its own host computer/operating system (OS) image/virtual machine (referred to as 'host computer' for the rest of this document). Shared storage access is provided through a Cluster File System or Network File System with the physical disk devices accessible on a Storage Controller, which is accessible through a Storage Area Network (SAN) to which each host machine has access.

When one or more host computers in a cluster fail, recovery needs to be performed on behalf of the members of the cluster that were running on the failed host computer. In one embodiment, such recovery consists of log-based redo and undo. Log records written by the member on the failed host computer are applied to bring changes to the shared resources that had not been flushed to persistent storage up to currency as of the point of failure, and to undo any uncommitted transactions. Such recovery is typically performed on another (surviving) host machine in the cluster.
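The redo and undo passes described above can be illustrated with a minimal sketch. It is not the disclosed method: the `LogRecord` layout, the `recover` function, and the choice to reapply every logged update (rather than comparing page and record sequence numbers, as a production DBMS typically would) are simplifying assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log record layout, assumed for this sketch."""
    lsn: int            # log sequence number, increasing in write order
    txn_id: int         # transaction that wrote the record
    kind: str           # "update" or "commit"
    page_id: str = ""   # page touched by an update
    before: str = ""    # page value before the update (used by undo)
    after: str = ""     # page value after the update (used by redo)


def recover(log, disk_pages):
    """Return the page contents after a redo pass followed by an undo pass."""
    pages = dict(disk_pages)  # state found on persistent storage after the failure
    committed = {rec.txn_id for rec in log if rec.kind == "commit"}

    # Redo: reapply every logged update in log order, so changes that were
    # never flushed to persistent storage are brought up to currency.
    for rec in log:
        if rec.kind == "update":
            pages[rec.page_id] = rec.after

    # Undo: walk the log backwards and roll back updates that belong to
    # transactions with no commit record.
    for rec in reversed(log):
        if rec.kind == "update" and rec.txn_id not in committed:
            pages[rec.page_id] = rec.before

    return pages


# Transaction 1 committed but its change to page A was not flushed;
# transaction 2 never committed but its change to page B reached disk.
log = [
    LogRecord(lsn=1, txn_id=1, kind="update", page_id="A", before="a0", after="a1"),
    LogRecord(lsn=2, txn_id=2, kind="update", page_id="B", before="b0", after="b1"),
    LogRecord(lsn=3, txn_id=1, kind="commit"),
]
recovered = recover(log, {"A": "a0", "B": "b1"})
```

In this example the committed change to page A is redone and the uncommitted change to page B is undone, so `recovered` holds `a1` for page A and `b0` for page B.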

In one embodiment, redoing log records requires that the database pages to which the log records apply be read from disk into main memory on the host computer where the recovery is being performed. Thus, some amount of the software infrastructure required for recovery needs to be initialized. This includes creating threads to redo/undo log records, allocating memory to hold database pages and log records, etc.
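As a rough illustration of the infrastructure mentioned above, the following sketch creates a pool of worker threads and an in-memory page buffer before redoing log records. The function name `redo_in_parallel`, the `read_page` callback, and the one-task-per-page partitioning are all assumptions for this example; the disclosure does not specify how the threads or memory are organized.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def redo_in_parallel(updates, read_page, workers=4):
    """Redo logged updates using a pool of worker threads.

    updates   -- list of (page_id, new_value) pairs in log order
    read_page -- callable that reads one page from disk (hypothetical)
    """
    # Group updates by page so that a single task applies each page's
    # records in their original log order.
    by_page = defaultdict(list)
    for page_id, new_value in updates:
        by_page[page_id].append(new_value)

    def redo_page(page_id):
        value = read_page(page_id)  # bring the page from disk into memory
        for new_value in by_page[page_id]:
            value = new_value       # apply each logged change in order
        return page_id, value

    # Creating the thread pool and the page buffer is the setup cost that
    # has to be paid before any log record can be redone.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        buffer_pool = dict(pool.map(redo_page, by_page))
    return buffer_pool


disk = {"A": "a0", "B": "b0", "C": "c0"}
updates = [("A", "a1"), ("B", "b1"), ("A", "a2")]
buffered = redo_in_parallel(updates, disk.__getitem__)
```

Only the pages named in the log are read into the buffer, which is why page C does not appear in `buffered`.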

Prior art includes:

 Designs where the process or thread responsible for recovering a member whose host computer has failed belongs to another (healthy) member; that is, one that is resident on the surviving host computer where the recovery operations for the failed member are to be performed. While this allows recovery to be performed on a host computer where some of the required infrastructure (e.g., memory, threads) is pre-allocated, it does not allow for isolation of cluster members. This could result in a software bug recurring when certain actions (which are logged) are performed in a certain order. Such a bug may manifest itself when log records from the member being recovered are redone/undone, and could cause all of the members of the cluster computing system to fail, one after the other, as each tries to replay the log records of the originally failed member.
 Designs where a s...