Browse Prior Art Database

Index Scan Input/Output Estimation Using Page Reference String Statistics

IP.com Disclosure Number: IPCOM000122775D
Original Publication Date: 1998-Jan-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 2 page(s) / 91K

Publishing Venue

IBM

Related People

Lohman, GM: AUTHOR [+2]

Abstract

This invention relates to a method for accurately estimating the number of pages that must be retrieved from disk into a finite buffer in memory when scanning all or a portion of an index. This is critical to estimating the performance of systems that rely heavily upon index scans, such as file management systems and data base management systems. For example, data base query optimizers use such estimates to choose automatically the cheapest access path for each file referenced by a given query because it is impractical to try to execute each query using all potential access paths. Although is would be possible to retain an observed number of I/Os to scan an index, this statistic would necessarily be peculiar to the available buffer space and the particular buffer management scheme used in the measurement.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Index Scan Input/Output Estimation Using Page Reference String Statistics

      This invention relates to a method for accurately estimating
the number of pages that must be retrieved from disk into a finite
buffer in memory when scanning all or a portion of an index.  This is
critical to estimating the performance of systems that rely heavily
upon index scans, such as file management systems and data base
management systems.  For example, data base query optimizers use such
estimates to choose automatically the cheapest access path for each
file referenced by a given query because it is impractical to try to
execute each query using all potential access paths.  Although is
would be possible to retain an observed number of I/Os to scan an
index, this statistic would necessarily be peculiar to the available
buffer space and the particular buffer management scheme used in the
measurement.  Statistics should be independent of the buffer size and
management scheme, but I/O estimates should be parametrizable by the
buffer size.

      The method estimates disk I/Os based upon periodically derived
from the actual sequence of data page references that would be
accessed in a complete traversal of the index, hereafter called the
reference string, rather that from an assumed underlying distribution
of key values.  A window moving over the reference string is used to
collect locality-of-reference counts for each page.  These counts are
used to derive only four statistics (besides the number of records N)
that characterize locality of reference for the entire index and are
stored in the system catalogs.  The statistics can be used to
estimate the disk I/Os for a complete or partial scan of the index
for any given  buffer size.

      Given a buffer of size B pages dedicated to a complete or
partial scan of an index, the number of disk I/Os required is
calculated from the window statistics by modeling the index scan as
three concurrent  processes, each referencing pages from distinct
sets of pages:
            the index retrieval process generating the index
            references,
            the local data retrieval process generating the
            local references, and
            The nonlocal data retrieval process generating
            the nonlocal references

      The I/O estimation can easily be adjusted for implementations
that dedicate separate buffers for the index process.

      If the buffer is big enough to hold all the pages that are
accessed wh...