SMARTFLUSH: Method to Increase Update Concurrency in a Disk-Based Database System Which Uses Checkpoints by Avoiding Multiple Flushing of Hot Dirty Data Pages to Disk.
Original Publication Date: 2005-Oct-21
Included in the Prior Art Database: 2005-Oct-21
Disclosed is a method to increase update concurrency in a disk-based database system that uses hard checkpoints by reducing the number of times an updated database page is flushed from memory to disk between checkpoints.
Some disk-based database systems use periodic synchronization points, called "hard checkpoints", to support fast recovery of updated data when the system must be restarted, for example, following a system failure. A hard checkpoint stops all data updates for a short time while data that has been updated only in volatile memory is written to disk, thus capturing on persistent storage a consistent set of data as of a particular point in time. Typically the data is organized into "pages" in memory and on disk. We refer to pages that have been updated only in memory as "dirty pages", and to pages that are updated with high frequency as "hot pages".
Once checkpoint processing is underway, database systems that use checkpoints as described above tend to block update access to all data pages for as long as it takes to write all the dirty pages to disk. A database system may reduce this blocking time by splitting the checkpoint operation into two phases: a "soft flush" followed by a "hard flush". A soft flush attempts to write as many dirty pages as possible to disk without blocking updates for the entire duration of the flush; it blocks updates to each dirty page only while that page is being written. A hard flush blocks updates to all pages for the entire duration of the flush. This prior-art optimization relies on the fact that most of the dirty-page flushing can be completed in the soft flush, leaving relatively few dirty pages for the hard flush. Typically, however, hot pages are updated again after the soft flush has written them and must then be flushed a second time. The first write of such a page is wasted work: it hurts concurrency in the first phase by consuming limited system resources and by blocking updates to the hot page while it is being written.
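The two-phase checkpoint described above, and the way a hot page ends up being flushed twice, can be illustrated with a minimal sketch. This is not code from the disclosure: the names Page, BufferPool and two_phase_checkpoint are hypothetical, the disk write is replaced by a counter, and updates that would arrive concurrently with the soft flush are simulated by applying them after it completes.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    page_id: int
    dirty: bool = False
    flush_count: int = 0  # number of times this page has been written to "disk"


@dataclass
class BufferPool:
    # Hypothetical in-memory page cache, keyed by page id.
    pages: dict = field(default_factory=dict)

    def update(self, page_id):
        # An update touches the in-memory copy only, so the page becomes dirty.
        page = self.pages.setdefault(page_id, Page(page_id))
        page.dirty = True

    def flush_page(self, page):
        # Stand-in for a disk write. In a real system, updates to this
        # one page would be blocked while the write is in progress.
        page.dirty = False
        page.flush_count += 1

    def dirty_pages(self):
        return [p for p in self.pages.values() if p.dirty]


def two_phase_checkpoint(pool, updates_during_soft_flush):
    """Run a soft flush, then a hard flush.

    Returns the page ids written in each phase. The second argument is an
    assumption of this sketch: it lists the pages updated while the soft
    flush runs, applied here after the soft flush for simplicity.
    """
    # Phase 1: soft flush. Each dirty page is written individually.
    soft = pool.dirty_pages()
    for page in soft:
        pool.flush_page(page)

    # Updates that were not blocked by the soft flush make pages dirty again.
    for page_id in updates_during_soft_flush:
        pool.update(page_id)

    # Phase 2: hard flush. All updates are blocked until every
    # remaining dirty page has been written.
    hard = pool.dirty_pages()
    for page in hard:
        pool.flush_page(page)

    return [p.page_id for p in soft], [p.page_id for p in hard]


pool = BufferPool()
for pid in (1, 2, 3):
    pool.update(pid)

# Page 2 plays the role of a hot page: it is updated again during the soft flush.
soft_ids, hard_ids = two_phase_checkpoint(pool, updates_during_soft_flush=[2])
print("soft flush wrote:", soft_ids)
print("hard flush wrote:", hard_ids)
print("flush counts:", {pid: p.flush_count for pid, p in pool.pages.items()})
```

In this run the hot page is written in both phases while the other pages are written once, which is the redundant first write that the disclosed method aims to avoid or reduce.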
This disclosure describes a method that uses a learning algorithm to adjust the operation of the soft flush so as to avoid or reduce repeated flushing of hot, dirty pages during checkpoint processing.
The method is as follows. In the first phase a soft flush is performed. The soft flush can consist of...