Hybrid Tunable Flush Lag for High Availability and Performance

IP.com Disclosure Number: IPCOM000012425D
Original Publication Date: 2003-May-07
Included in the Prior Art Database: 2003-May-07

Publishing Venue

IBM

Abstract

An algorithm is disclosed which accomplishes the following for a database system (e.g., a relational database):

1. Allow the user or system to restrict the amount of time needed for the database to become available again after a crash, without significantly degrading performance.

2. Allow easy tuning of the system, via a single parameter, along a tradeoff continuum from maximum data availability (fastest restart after a crash) to maximum performance (greatest transactional throughput).

3. Greatly reduce or eliminate the throughput variance caused by checkpointing activity, such that checkpoint processing has little or no impact on throughput.

    The first problem this invention solves is that of keeping small the amount of log that must be processed during crash recovery (so that database restart is fast -- a high-availability (HA) requirement) while at the same time avoiding flushing frequently dirtied pages to disk over and over (so that transaction processing is fast -- a performance requirement).

    The second problem this invention solves is that of allowing a database system to be easily configured, via a single parameter, for optimal high availability, optimal performance, or any point in between along an availability-performance tradeoff continuum. A single product can thus meet the diverse needs of high-availability customers, heavy OLTP customers, and performance benchmarks.

    The third problem solved by this invention is the reduction or elimination of performance (throughput) degradation during checkpoint activity: the invention amortizes the writing of dirty pages that would otherwise have to be done during checkpoints over all transaction processing time.

    The invention solves these problems by using a hybrid dirty page flushing algorithm consisting of LRU-based fine-grained flushing and FIFO-based coarse-grained flushing, along with two user-configurable parameters:

INTERVAL_FREQUENCY -- the frequency of coarse-grained flush intervals (e.g., in seconds per interval, or in log records generated per interval). This determines the grain size of control. (INTERVAL_FREQUENCY could optionally be eliminated entirely by hard-coding it to, e.g., 30 seconds.)

MAX_REPLAY_INTERVALS (or M for short) -- the maximum number of coarse-grained flush intervals' worth of log that may be replayed/rolled back as part of a crash recovery. This trades transaction performance off against recovery time, at the granularity determined by INTERVAL_FREQUENCY; see the sketch below.
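
As an illustration, the following is a minimal sketch in Python of the two tunables and the worst-case recovery bound they jointly imply. The names, the choice of Python, and the use of seconds as the interval unit are assumptions for illustration; the disclosure does not prescribe an interface.

    INTERVAL_FREQUENCY = 30   # seconds per coarse-grained flush interval
    MAX_REPLAY_INTERVALS = 4  # "M": max intervals of log replayed/rolled back

    # Crash recovery never has to process more than M intervals' worth of
    # log, so the worst-case replay window is bounded by the product:
    worst_case_replay_seconds = MAX_REPLAY_INTERVALS * INTERVAL_FREQUENCY

    # Tuning: a small M favors fast restart (high availability); a large M
    # favors fewer forced page writes (high transactional throughput).
    print(f"Recovery replays at most {worst_case_replay_seconds} seconds of log")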

    For purposes of the discussion below, let "N" stand for the number of a particular coarse-grained flush interval; intervals are numbered sequentially from 1.

    The flushing approach is as follows: during transaction processing, dirty pages are flushed at a fine grain (i.e., one page at a time, as needed to satisfy page faults) based on an LRU (least recently used) algorithm. This means that, most of the time, only pages that are not being dirtied frequently are flushed, greatly reducing the number of write I/Os and maximizing write-cache hit ratios. The result is high transaction performance.
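
A minimal sketch in Python of this fine-grained flush path (the buffer pool structure and the disk-I/O stubs are hypothetical, not taken from the disclosure): a page is written out only when it falls off the cold end of the LRU chain to satisfy a page fault, so frequently dirtied pages stay in memory and are not flushed repeatedly.

    from collections import OrderedDict

    def read_page_from_disk(page_id):
        return bytearray(4096)            # hypothetical stub for a disk read

    def write_page_to_disk(page_id, data):
        pass                              # hypothetical stub for a disk write

    class BufferPool:
        def __init__(self, capacity):
            self.capacity = capacity
            self.pages = OrderedDict()    # page_id -> (data, dirty), coldest first

        def access(self, page_id, write=False):
            if page_id in self.pages:
                self.pages.move_to_end(page_id)      # now most recently used
            else:
                if len(self.pages) >= self.capacity: # page fault with a full pool
                    victim, (data, dirty) = self.pages.popitem(last=False)
                    if dirty:
                        write_page_to_disk(victim, data)  # fine-grained flush
                self.pages[page_id] = (read_page_from_disk(page_id), False)
            if write:
                data, _ = self.pages[page_id]
                self.pages[page_id] = (data, True)   # mark the page dirty
            return self.pages[page_id][0]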

    At a coarse grain, database work is conceptually divided into numbered flush intervals according to the INTERVAL_FREQUENCY parameter. At the end of each interval N, all dirty buffers that were first dirtied in the historical interval numbered (N - M + 2) are collected and submitted to flusher processes to be flushed in the background during the next interval. The earliest such a page could have been dirtied was at the beginning of interval (N - M + 2),...
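
A minimal sketch in Python of this interval-end step (structures and names are hypothetical; the disclosure does not prescribe an implementation): at the end of interval N, every buffer first dirtied in interval (N - M + 2) is handed to background flushers to be written during interval N + 1, so no unflushed page is ever more than M intervals' worth of log old.

    import queue

    M = 4                        # MAX_REPLAY_INTERVALS

    first_dirtied = {}           # page_id -> interval in which it first became dirty
    flush_queue = queue.Queue()  # drained by background flusher processes

    def mark_dirty(page_id, current_interval):
        # Record only the FIRST interval in which a page became dirty;
        # re-dirtying it in a later interval must not reset the record.
        first_dirtied.setdefault(page_id, current_interval)

    def end_of_interval(n):
        # At the end of interval N, submit every buffer first dirtied in the
        # historical interval (N - M + 2) for background flushing during
        # interval N + 1.  (Pages already written by the fine-grained LRU
        # path are assumed to have been removed from first_dirtied.)
        target = n - M + 2
        for page_id in [p for p, i in first_dirtied.items() if i == target]:
            flush_queue.put(page_id)
            del first_dirtied[page_id]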