Efficient Collection of Table Statistics

IP.com Disclosure Number: IPCOM000121263D
Original Publication Date: 1991-Aug-01
Included in the Prior Art Database: 2005-Apr-03
Document File: 1 page(s) / 41K

Publishing Venue

IBM

Related People

Comeau, A: AUTHOR [+2]

Abstract

Disclosed is a more efficient approach to collecting statistics describing data in relational database tables. This approach was implemented in Release 3 Version 1 of the Structured Query Language/Data System (SQL/DS*) relational database management system (RDBMS).

This is the abbreviated version, containing approximately 78% of the total text.

      Most commercial RDBMSs perform some form of access path
optimization.  The optimizer uses statistics about the data stored in
the database to determine the most efficient data access path.
Because the statistics are not updated with every change to the
data, the quality of the chosen access path is limited by how
accurate and current the statistics are.

      In existing RDBMSs, the statistics describing the data stored
in the tables are gathered in a separate step that requires a scan of
the entire table.  This is a very CPU- and I/O-intensive process,
especially for large tables.  If data is loaded into a table and the
statistics are collected afterward, each row is visited twice: once
when the row is stored in its database page, and a second time when
the table is scanned to gather statistics.
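
      The double visit described above can be made concrete with a
small sketch.  This is a toy model, not the SQL/DS implementation:
the Table class, the row_visits counter, and the two statistics
shown (row count and distinct values in the first column) are all
assumptions chosen for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]


class Table:
    """Hypothetical in-memory stand-in for a table stored in database pages."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.row_visits = 0  # incremented every time a row is touched

    def store(self, row: Row) -> None:
        """Store one row (the first visit to that row)."""
        self.rows.append(row)
        self.row_visits += 1

    def scan(self) -> Iterator[Row]:
        """Full table scan (a second visit to every row)."""
        for row in self.rows:
            self.row_visits += 1
            yield row


def load(table: Table, data: Iterable[Row]) -> None:
    """Step 1: populate the table."""
    for row in data:
        table.store(row)


def collect_statistics(table: Table) -> Dict[str, int]:
    """Step 2: separate statistics pass that rescans the whole table."""
    row_count = 0
    distinct_first_column = set()
    for row in table.scan():
        row_count += 1
        distinct_first_column.add(row[0])
    return {
        "row_count": row_count,
        "distinct_first_column": len(distinct_first_column),
    }


data = [(1, "a"), (2, "b"), (2, "c")]
table = Table()
load(table, data)
stats = collect_statistics(table)
print(stats, "row visits:", table.row_visits)
```

      In this model the number of row visits is twice the number of
rows loaded, which is the overhead the disclosure sets out to remove.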

      The core of this disclosure is as follows:  By merging the two
processes and gathering the statistics WHILE the table is being
populated, the subsequent scan of the table is avoided.  After the
data load finishes, the statistic values in the system catalog
tables are updated in the same way as if the statistics were...
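
      A minimal sketch of the merged approach follows, under stated
assumptions.  The disclosure does not list which statistics are
kept, so the row count, per-column distinct counts, and per-column
low and high values used here are illustrative guesses, and the
names StatsAccumulator, load_with_statistics, pages, and catalog
are hypothetical rather than SQL/DS interfaces.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Tuple[Any, ...]


class StatsAccumulator:
    """Hypothetical running statistics, updated once per loaded row."""

    def __init__(self, num_columns: int) -> None:
        self.row_count = 0
        self.distinct = [set() for _ in range(num_columns)]
        self.low: List[Any] = [None] * num_columns
        self.high: List[Any] = [None] * num_columns

    def observe(self, row: Row) -> None:
        """Fold one row into the running statistics."""
        self.row_count += 1
        for i, value in enumerate(row):
            self.distinct[i].add(value)
            if self.low[i] is None or value < self.low[i]:
                self.low[i] = value
            if self.high[i] is None or value > self.high[i]:
                self.high[i] = value

    def as_catalog_entry(self) -> Dict[str, Any]:
        """Final values, in the shape a catalog update might take."""
        return {
            "row_count": self.row_count,
            "distinct_counts": [len(s) for s in self.distinct],
            "low": list(self.low),
            "high": list(self.high),
        }


def load_with_statistics(
    rows: Iterable[Row],
    pages: List[Row],
    catalog: Dict[str, Dict[str, Any]],
    table_name: str,
    num_columns: int,
) -> None:
    """Store each row and gather statistics in the same pass."""
    acc = StatsAccumulator(num_columns)
    for row in rows:
        pages.append(row)  # stand-in for storing the row in its page
        acc.observe(row)   # statistics gathered while the row is in hand
    # After the load finishes, write the statistics to the catalog once.
    catalog[table_name] = acc.as_catalog_entry()


pages: List[Row] = []
catalog: Dict[str, Dict[str, Any]] = {}
load_with_statistics([(1, "a"), (2, "b"), (2, "c")], pages, catalog, "T", 2)
print(catalog["T"])
```

      Each row is touched once, and the catalog is written a single
time at the end of the load.  Holding every distinct value in a set
is a simplification that would not scale to large tables; a real
system would need a bounded-memory method for that statistic.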