Detecting and Preventing Partial Block Writes by Leveraging Storage Solutions

IP.com Disclosure Number: IPCOM000033306D
Original Publication Date: 2004-Dec-06
Included in the Prior Art Database: 2004-Dec-06
Document File: 2 page(s) / 55K

Publishing Venue

IBM

Abstract

The core idea is for storage controllers and replication appliances to use page size information to check page integrity before writing a page to local storage or propagating it to a remote site.

Some applications, such as databases and file systems, transfer their data to and from devices (disks) in discrete blocks called pages. The pages are buffered in memory. IBM* DB2**, for instance, supports several page sizes, including 4K, 8K, 16K, and 32K. The default size is 4K, and an administrator can change this value.

    In general, applications write each data page to disk as a single, atomic write operation. The underlying layers (e.g., file system, operating system, or device driver) may "break" this page write into multiple I/O requests suited to their respective operations. If a disaster occurs while only some of these smaller writes have completed, the result is inconsistent data on storage. This problem is known as the partial page write problem (also referred to as the incomplete write problem). From the application's perspective, a partial write means that inconsistent data has been hardened.

    Some applications, such as databases, cannot recover from the partial page write problem, and the entire database is marked corrupted. To understand why, consider a specific example of a partial write in a DB2 database: the database manager was writing an 8K page to storage when a server crash occurred, and only the first 5K of that page had been written.
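    Purely as an illustration (not part of the original disclosure; the 1K chunk size is an assumption), the following Python sketch simulates this scenario: an 8K page write that the lower layers issue as smaller chunks, interrupted after 5K, leaving a torn page on disk that mixes new and old content.

PAGE_SIZE = 8 * 1024   # application page size (8K, as in the example above)
CHUNK_SIZE = 1 * 1024  # size of the smaller I/O requests issued by lower layers (assumed)

def write_page_with_crash(disk: bytearray, offset: int, page: bytes, crash_after: int) -> None:
    """Write one page as several chunk-sized I/Os, 'crashing' once crash_after bytes are down."""
    assert len(page) == PAGE_SIZE
    for i in range(0, PAGE_SIZE, CHUNK_SIZE):
        if i >= crash_after:
            return  # simulated power failure / server crash mid-write
        disk[offset + i:offset + i + CHUNK_SIZE] = page[i:i + CHUNK_SIZE]

disk = bytearray(b"O" * PAGE_SIZE)   # old page image already hardened on disk
new_page = b"N" * PAGE_SIZE          # new page image the database manager is writing
write_page_with_crash(disk, 0, new_page, crash_after=5 * 1024)

# The page on disk is now torn: the first 5K is new, the remaining 3K is old.
print(bytes(disk[:4]), b"...", bytes(disk[-4:]))   # b'NNNN' b'...' b'OOOO'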

    At restart time, the DB2 database manager knows the write did not complete, but it does not know whether the entire write buffer, none of it, or some portion of it is on storage. The database manager reads the sectors from disk and receives a page that contains parts of the new page and parts of the old page. This is a dangerous situation, because the header of the page, which describes where the records are located inside the page, no longer provides a valid description. For example, the header contains a log sequence number (LSN) that indicates to the database manager which log records have yet to be applied. Since the LSN is in the header of the page, the page appears up to date, and the database manager believes that no changes are needed. However, this is not the case, since the latter portion of the page does indeed require changes.
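    One widely used way to at least detect such a torn page at read time is to protect each page with a checksum computed over the entire page, so that a mix of old and new content fails verification. The sketch below is illustrative only; the trailing-CRC32 layout is an assumption, not the disclosure's mechanism or DB2's actual page format.

import zlib

PAGE_SIZE = 8 * 1024
CHECKSUM_SIZE = 4  # last 4 bytes of each page hold a CRC32 over the rest (hypothetical layout)

def seal_page(payload: bytes) -> bytes:
    """Append a CRC32 over the payload so the page carries its own integrity check."""
    assert len(payload) == PAGE_SIZE - CHECKSUM_SIZE
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def page_is_consistent(page: bytes) -> bool:
    """A page mixing old and new content will (almost certainly) fail this check."""
    payload, stored = page[:-CHECKSUM_SIZE], page[-CHECKSUM_SIZE:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored

old_page = seal_page(b"O" * (PAGE_SIZE - CHECKSUM_SIZE))
new_page = seal_page(b"N" * (PAGE_SIZE - CHECKSUM_SIZE))

# Reproduce the torn write from the example: first 5K new, the rest still old.
torn = new_page[:5 * 1024] + old_page[5 * 1024:]
print(page_is_consistent(new_page))  # True  -- a fully written page verifies
print(page_is_consistent(torn))      # False -- the partial write is detected at read time

    Detection alone does not make the torn page recoverable; it only keeps the database manager from silently trusting it.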

    In another example, the page is a leaf node in a B+Tree index, and the records may be out of order. If the database manager had the entire page as it was before the write, it would be able to apply the necessary log records. If the database manager had the entire page as it would be after the write, it would know to apply the necessary undo log records. A partial page, however, is an unrecoverable situation. In the best case, this leads to the entire database being marked as damaged, and a restore from backup is required. In the worst case, this leads to undetected data corruption.
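    Connecting this back to the core idea in the abstract, the sketch below shows one way a storage controller or replication appliance that knows the application's page size could verify full-page integrity before hardening a write locally or propagating it to a remote site. All names, interfaces, and the checksum layout here are illustrative assumptions, not the disclosure's defined mechanism.

import zlib

PAGE_SIZE = 8 * 1024  # page size communicated to the storage layer (assumption)

def page_checksum_ok(page: bytes) -> bool:
    """Same hypothetical layout as above: each page ends with a CRC32 over its payload."""
    return zlib.crc32(page[:-4]).to_bytes(4, "big") == page[-4:]

def accept_write(offset: int, data: bytes) -> bool:
    """Accept a write only if it is page-aligned, a whole number of pages, and every page verifies."""
    if not data or offset % PAGE_SIZE != 0 or len(data) % PAGE_SIZE != 0:
        return False  # sub-page or misaligned write: do not harden or replicate it yet
    return all(page_checksum_ok(data[i:i + PAGE_SIZE])
               for i in range(0, len(data), PAGE_SIZE))

    A write that fails such a check could be rejected or held back until the remainder of the page arrives, so that only complete, internally consistent pages reach the local disks or the remote replica.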

    The partial page write data corruption problem is very rare, but when it happens, it can have a catastrophic effe...