
Intelligent selection of logs required during recovery processing

IP.com Disclosure Number: IPCOM000016114D
Original Publication Date: 2002-Sep-16
Included in the Prior Art Database: 2003-Jun-21
Document File: 3 page(s) / 48K

Publishing Venue

IBM

Abstract

Disclosed is a process for improving the speed of restoring a shared database by replaying only the logs of those systems that have updated the database since the previous backup. The IBM* WebSphere* MQ* product (MQ), database managers and other programs increasingly allow concurrent shared access to recoverable resources from multiple instances of queue/database managers. This is done to provide increased capacity, reliability and availability. If the underlying data storage mechanism fails, then to recover the data it is necessary to restore a backup copy of the data and to replay the logs of each of the queue/database managers, restoring the data to its state immediately prior to the failure. Performance studies have shown that log replay volume is the most significant factor in achieving a fast recovery from failure, so reducing the amount of log data that needs to be replayed is desirable. Although in a data-sharing environment every instance of a queue/database manager has the potential to make recoverable updates that need to be replayed from the log, in practice some instances may have been inactive, whilst others may have had only read-only or non-recoverable access to the data, or may have made all of their updates to portions of the data that have not failed. What is needed is a means to identify which queue/database manager instances are likely to have made updates to the failed portion of the data, so that when a recovery is needed only the appropriate subset of the logs has to be replayed. However, it is important to realize that storage media failure is a rare event, and therefore the overhead of collecting the information about which logs will be needed during a potential replay should not impose a significant burden on the normal operation of the queue/database manager.

This is the abbreviated version, containing approximately 45% of the total text.

Disclosed is a process for improving the speed of restoring a shared database by replaying only the logs of those systems that have updated the database since the previous backup.

     The IBM* WebSphere* MQ* product (MQ), database managers and other programs increasingly allow concurrent shared access to recoverable resources from multiple instances of queue/database managers. This is done to provide increased capacity, reliability and availability.

     If the underlying data storage mechanism fails, then to recover the data it is necessary to restore a backup copy of the data and to replay the logs of each of the queue/database managers, restoring the data to its state immediately prior to the failure.

     Performance studies have shown that log replay volume is the most significant factor in achieving a fast recovery from failure, so reducing the amount of log data that needs to be replayed is desirable. Although in a data-sharing environment every instance of a queue/database manager has the potential to make recoverable updates that need to be replayed from the log, in practice some instances may have been inactive, whilst others may have had only read-only or non-recoverable access to the data, or may have made all of their updates to portions of the data that have not failed. What is needed is a means to identify which queue/database manager instances are likely to have made updates to the failed portion of the data, so that when a recovery is needed only the appropriate subset of the logs has to be replayed. However, it is important to realize that storage media failure is a rare event, and therefore the overhead of collecting the information about which logs will be needed during a potential replay should not impose a significant burden on the normal operation of the queue/database manager.
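     The selection problem stated above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the class InstanceActivity, the function logs_to_replay, the portion names and the log sizes are all hypothetical, chosen only to show how excluding inactive, read-only and unaffected instances reduces the replay volume.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceActivity:
    """Hypothetical summary of one queue/database manager since the last backup."""
    name: str
    log_bytes: int  # volume of log written since the backup (made-up units)
    # Portions of the shared data that received recoverable updates.
    updated_portions: set = field(default_factory=set)


def logs_to_replay(instances, failed_portion):
    """Return the names of instances whose logs may hold updates to the failed portion."""
    return [i.name for i in instances if failed_portion in i.updated_portions]


instances = [
    InstanceActivity("QM1", 500, {"STRUCT_A"}),  # updated the failed portion
    InstanceActivity("QM2", 300),                # inactive or read-only access
    InstanceActivity("QM3", 800, {"STRUCT_B"}),  # updates only to portions that did not fail
]

selected = logs_to_replay(instances, "STRUCT_A")
full_volume = sum(i.log_bytes for i in instances)
reduced_volume = sum(i.log_bytes for i in instances if i.name in selected)
print(selected, full_volume, reduced_volume)
```

     In this made-up case only one of the three logs has to be replayed. The open question, which the remainder of the disclosure addresses, is how to record such activity cheaply during normal operation.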

     This disclosure describes a technique that is applicable to IBM WebSphere MQ for z/OS* shared queue support using Coupling Facility (CF) structures and DB2*, but which could be adapted for use by other queue/database managers. MQ has a number of storage mechanisms available to it, which differ in their persistence and update cost attributes:

Application CF structure: An application structure is the data repository that we wish to be able to recover should it fail. Obviously, we cannot use information stored within the application structure to control its own restore. Within the application structure there is a List Header Interest Map (LHIM) for each queue manager (QMGR), showing which list headers (queues) that QMGR is currently using. We will add a new recoverable list header map to the LHIM record. A queue manager sets a flag in the map when it first writes a log record for a list header, and resets the flag when it stops using the list header. Thus, if the entire map is empty, the QMGR has no current interest in the structure for structure recovery purposes.
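The flag handling described for the recoverable list header map might be sketched as follows. This is an illustrative Python model only, not MQ code: the class RecoverableListHeaderMap, its method names, the map_updates counter and the helper qmgrs_with_interest are assumptions introduced here, and the sketch covers only the behaviour stated above (set on the first logged write, reset when the list header is no longer in use, empty map means no interest).

```python
class RecoverableListHeaderMap:
    """Hypothetical per-QMGR map holding one flag per list header (queue)."""

    def __init__(self, list_header_count):
        self.flags = [False] * list_header_count
        self.map_updates = 0  # number of times the map itself was changed

    def on_log_write(self, list_header):
        # Only the first logged write for a list header touches the map,
        # so later writes add no extra cost during normal operation.
        if not self.flags[list_header]:
            self.flags[list_header] = True
            self.map_updates += 1

    def on_stop_using(self, list_header):
        # Reset the flag when the queue manager stops using the list header.
        if self.flags[list_header]:
            self.flags[list_header] = False
            self.map_updates += 1

    def is_empty(self):
        # An empty map means no current interest for structure recovery.
        return not any(self.flags)


def qmgrs_with_interest(maps):
    """Return the queue managers whose map is not empty."""
    return sorted(name for name, m in maps.items() if not m.is_empty())


maps = {name: RecoverableListHeaderMap(4) for name in ("QM1", "QM2", "QM3")}
maps["QM1"].on_log_write(2)
maps["QM1"].on_log_write(2)   # repeated write: the map is not touched again
maps["QM3"].on_log_write(0)
maps["QM3"].on_stop_using(0)  # QM3 no longer uses list header 0
needed = qmgrs_with_interest(maps)
print(needed, maps["QM1"].map_updates)
```

Setting the flag only on the first logged write keeps the bookkeeping cost low, which matches the stated requirement that collecting this information should not burden normal operation.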

Administration CF...