
Management of Logical and Physical Data Failures

IP.com Disclosure Number: IPCOM000050717D
Original Publication Date: 1982-Dec-01
Included in the Prior Art Database: 2005-Feb-10
Document File: 3 page(s) / 15K

Publishing Venue

IBM

Related People

Crus, RA: AUTHOR [+3]

Abstract

This article describes a method for persistently maintaining knowledge of write errors and of read errors during Data Base processing in order to assure data integrity where a physical error is intermittent. Logically inconsistent data is detected, and further normal access is inhibited until the inconsistent data is repaired. Information stored in a Data Base can become unavailable because of logical inconsistencies in the stored data format (e.g., bad pointers) or because of physical failures of the supporting media. It is desirable to limit the amount of data which becomes unavailable for these reasons.

Management of Logical and Physical Data Failures

This article describes a method for persistently maintaining knowledge of write errors and of read errors during Data Base processing in order to assure data integrity where a physical error is intermittent. Logically inconsistent data is detected, and further normal access is inhibited until the inconsistent data is repaired. Information stored in a Data Base can become unavailable because of logical inconsistencies in the stored data format (e.g., bad pointers) or because of physical failures of the supporting media. It is desirable to limit the amount of data which becomes unavailable for these reasons. Herein, we describe techniques which make it possible to reduce the granularity of unavailable data to the (logical) page level, where a page is a logically contiguous, fixed-length (for instance, 4K) string supported by one or more physical blocks in secondary storage. This requires definition of mechanisms for:

1. detecting data failures,

2. preventing access to pages affected by failures,

3. tracking pages affected by failures, and

4. recovering pages affected by failures.
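
As a rough illustration of the page organization assumed above, the C sketch below shows one possible layout for a 4K page whose header carries the status flag (the "Broken Bit") used by the mechanisms described in the following sections. The field names, sizes, and flag value are illustrative assumptions, not the original implementation.

    #include <stdint.h>

    #define PAGE_SIZE 4096               /* "for instance, 4K", as in the text        */
    #define PAGE_FLAG_BROKEN 0x01        /* the Broken Bit: page contents unavailable */

    /* Hypothetical page header; only the Broken Bit is taken from the article,
     * the remaining fields are placeholders for whatever a DBMS would keep here. */
    typedef struct {
        uint32_t page_number;            /* logical page identifier                   */
        uint16_t free_space;             /* bytes still available in the page         */
        uint8_t  flags;                  /* status bits, including PAGE_FLAG_BROKEN   */
        uint8_t  reserved;
    } page_header_t;

    typedef struct {
        page_header_t header;
        uint8_t       data[PAGE_SIZE - sizeof(page_header_t)];
    } page_t;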

DETECTING LOGICAL DATA FAILURES

Logical failures can originate in two ways:

(1) Incorrect data may be introduced within the data base management subsystem (DBMS). To detect such data, checks on the format of stored data are incorporated in the DBMS code. Whenever one of these checks fails, a flag (the Broken Bit) is raised in the header(s) of the page(s) containing the data found to be incorrect, and a Transaction Abort is started. (A code sketch of both detection paths follows item (2) below.)

(2) An exception can occur while the data stored in a page buffer is temporarily inconsistent because of internal data manipulations which are not logged. For instance, an exception may occur during free space recovery within a page. Since free space recovery is not logged, the normal "undo" techniques of Transaction Abort cannot be applied to clean up the page. The buffer containing the page also cannot simply be discarded, since it may contain committed updates from previous transactions. To solve this problem, the Broken Bit in the page header is raised by the DBMS code whenever it enters a "critical section" of code, that is, a sequence of instructions during which a page in the buffers is made temporarily inconsistent. The Broken Bit is reset when leaving the critical section. Should an exception occur while the page is inconsistent, the Broken Bit is left on.
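
Building on the page_t layout sketched earlier, the following C fragment illustrates, under the same assumptions, how both detection paths might be expressed: a failed format check raises the Broken Bit and starts a Transaction Abort (path 1), and a critical section raises the bit on entry and clears it on exit, so that an exception taken while the page is inconsistent leaves the bit on (path 2). The helper names (check_page_format, start_transaction_abort, reclaim_free_space) are hypothetical stand-ins, not routines named in the article.

    /* Hypothetical stand-ins so the sketch is self-contained. */
    static int  check_page_format(const page_t *p) { (void)p; return 1; }  /* format checks in the DBMS code */
    static void start_transaction_abort(void)      { /* begin Transaction Abort */ }
    static void reclaim_free_space(page_t *p)      { (void)p; /* unlogged free space recovery */ }

    /* Path (1): a stored-format check fails, so the Broken Bit is raised in the
     * page header and a Transaction Abort is started. */
    static int validate_page(page_t *page)
    {
        if (!check_page_format(page)) {
            page->header.flags |= PAGE_FLAG_BROKEN;  /* mark the page broken */
            start_transaction_abort();
            return -1;
        }
        return 0;
    }

    /* Path (2): the Broken Bit brackets a critical section during which the
     * buffered page is temporarily inconsistent.  If an exception interrupts
     * reclaim_free_space(), the bit is never cleared and the page stays broken. */
    static void compact_page(page_t *page)
    {
        page->header.flags |= PAGE_FLAG_BROKEN;      /* entering the critical section  */
        reclaim_free_space(page);                    /* unlogged manipulation          */
        page->header.flags &= ~PAGE_FLAG_BROKEN;     /* reached only without exception */
    }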

PREVENTING ACCESS TO LOGICALLY FAILED PAGES

The Broken Bit in the page header is always checked by the DBMS routines before accessing data contained in the page. If a "Normal Processing" (as opposed to Recovery) routine detects that the Broken Bit is on, then an error code of "data unavailable" is returned to the DBMS user. If a DBMS recovery (Abort or Restart) routine detects that the Broken Bit is on for a page, then it does not attempt to apply logged changes...