
Detection of missing log files in a merged log stream

IP.com Disclosure Number: IPCOM000218216D
Publication Date: 2012-May-28
Document File: 4 page(s) / 29K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a solution that guarantees to the user that the system can detect any missing log file condition that would otherwise result in data corruption during a log merge. The solution comprises three mechanisms that work together: log file linking, a log file registration sync log record, and the Global File Array (GFA).



Detection of missing log files in a merged log stream

The problem is most likely to appear in a database system; however, it could apply in general to other types of systems with similar characteristics.

Consider a database setup using shared data, similar to DB2 pureScale. Each member logs the actions it performs on the data to a separate physical location (also known as a log stream). Each log stream consists of any number of log records divided among multiple log files. After the database is restored from a backup, the transaction logs must be replayed so that the database can be returned to the state it was in at the time the restore operation was performed. The replaying of the transaction logs following a database restore is referred to as a rollforward operation. In order to perform a rollforward operation and apply all actions against the database, the physical log streams must be merged into one stream (logical or physical) so that the actions can be replayed against the database in the correct order. Some method of uniquely ordering the log records among the streams must already exist; in our case this is an LFS value. An LFS value is unique across the cluster and, when assigned at run time, is assigned at each log flush on each log stream.
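For illustration, a minimal sketch of this merge step, assuming each stream yields records already ordered by their LFS value; the LogRecord layout and function names below are hypothetical and not the product's actual interfaces:

import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class LogRecord:
    # Hypothetical record layout: the cluster-wide LFS ordering key,
    # the member (log stream) that wrote it, and the logged action.
    lfs: int
    member: int
    payload: bytes

def merge_log_streams(streams: Iterable[Iterator[LogRecord]]) -> Iterator[LogRecord]:
    """K-way merge of the per-member log streams into one logical stream,
    ordered by the LFS value assigned at each log flush."""
    return heapq.merge(*streams, key=lambda rec: rec.lfs)

# Rollforward would then replay the merged stream in order, e.g.:
#   for rec in merge_log_streams(open_stream(m) for m in members):
#       redo(rec)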

Another assumption is that there is no mechanism in place to track all log files created and maintained over time. Instead, there is a predefined naming scheme for the log files (in this case they are numbered incrementally, S#######.LOG, on each member), so when the database reads the logs it knows how to find the files. When reading a log stream, the reader knows it has reached the end of the stream when it either finds an empty file or no more files exist in the sequence.
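A minimal sketch of this end-of-stream convention, assuming the S#######.LOG naming scheme described above; the paths and helper names are illustrative only:

import os
from typing import Iterator

def log_file_name(seq: int) -> str:
    # Incrementally numbered S#######.LOG naming scheme on each member.
    return "S%07d.LOG" % seq

def read_log_stream(log_path: str, start_seq: int = 0) -> Iterator[str]:
    """Yield log file paths in sequence until the end-of-stream condition is
    hit: a file in the sequence is missing or empty.

    This is exactly the fragility described below: if S0000003.LOG is lost
    while S0000004.LOG still exists, the scan stops at 3 and silently treats
    the stream as ended, dropping every later file from the merge.
    """
    seq = start_seq
    while True:
        path = os.path.join(log_path, log_file_name(seq))
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            return  # interpreted as the end of this log stream
        yield path
        seq += 1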

The problem that arises in this type of environment is being able to reliably merge logs from all the log streams while ensuring that the data is not corrupted in the event that some log files go missing. Log files might go missing for any number of reasons, including user error (such as moving log files around or setting up the log path incorrectly), file system error, or some other software error that prevents a log file from being found. If a log file goes missing on a stream, the reader is likely to conclude that the log stream has ended and continue merging data from the other log streams. Left undetected, this affects the integrity of the database: data will be lost and go unreported, and corruption can occur because some actions against pages in the database will be redone while others will not (e.g., if two log streams touch a given data page and only one stream is replayed, the page can be left in an inconsistent or corrupted state).

Previous solutions to this problem include:


- Have a file containing a history of all log files in the database. The drawback of this solution is a single point of failure: if the history file is lost, corrupted, or does...