Browse Prior Art Database

Recursive Binary Partitioning Algorithm for Database Access Skew Characterization

IP.com Disclosure Number: IPCOM000105655D
Original Publication Date: 1993-Aug-01
Included in the Prior Art Database: 2005-Mar-20
Document File: 6 page(s) / 233K

Publishing Venue

IBM

Related People

Chung, JY: AUTHOR [+3]

Abstract

Disclosed is an efficient recursive binary partitioning algorithm for database access skew characterization from a database access trace. The access skew characterization is useful for analytical prediction of buffer hit probability.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 20% of the total text.

Recursive Binary Partitioning Algorithm for Database Access Skew Characterization

      Disclosed is an efficient recursive binary partitioning
algorithm for database access skew characterization from a database
access trace.  The access skew characterization is useful for
analytical prediction of buffer hit probability.

      The characterization of database accesses from reference traces
for the purpose of buffer hit prediction is extremely useful for
workload management and system capacity planning in various ways.
The knowledge can be helpful for the assignment of various database
relations to buffer pools as well as the allocation of buffer space
among the multiple buffer pools.  In this disclosure, we propose a
concise way to characterize the access skew across the randomly
accessed pages.  A recursive binary partitioning algorithm is
developed that can infer the access skew characterization from the
buffer hit probabilities for a subset of the buffer sizes.

      Three types of access pattern can be distinguished from a
trace:  1) locality within a transaction, 2) random accesses by
transactions and 3) sequential accesses by long queries.  In this
disclosure, we will describe an algorithm that infers the access
skew, given the random access component of database access trace.
The random access component can be obtained by filtering out the
sequential access component based on run length and the rereferencing
within a transaction from a reference trace.

      First, obtain the buffer hit probability for the random access
component vs buffer size curve under the LRU policy through trace
driven simulation.  Let B sub j , j=1 ellipsis N be the set of buffer
sizes for which buffer hit probabilities, h sup sim sub j , j=1
ellipsis N, are evaluated.  This can be done in a single pass of the
trace [2].  The buffer sizes selected are somewhat arbitrary, but
they should reflect the range of buffer sizes for which we are
interested in predicting the buffer hit probability.  As mentioned in
[1], the access skew can be characterized by logically grouping the
pages into a smaller number of disjoint partitions such that the
access frequencies to pages within a partition can be treated as
equal.  Assume that a particular database needs to be divided into K
partitions for a desired accuracy in the prediction of buffer hit
probability.  Let alpha sub i be the frequency that an access will go
to partition i, and let D sub i be the number of pages in that
partition.  Then the access skew is characterized by [(alpha sub i,D
sub i), i=1 ellipsis K ].  Let h sup ana sub j , j=1 ellipsis N  be
the predicted buffer hit probabilities based on the skew
characterization for the buffer sizes B sub j , j=1 ellipsis N under
the LRU policy.  We say that the skew is well characterized if the
absolute difference ( % |h sup sim sub j -h sup ana sub j | ) between
the buffer hit probabilities for any buffer size under the LRU
replacement...