
Collecting Bucket Index Statistical Data with Colocation Considered

IP.com Disclosure Number: IPCOM000104537D
Original Publication Date: 1993-May-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 2 page(s) / 97K

Publishing Venue

IBM

Related People

Hargis, SD: AUTHOR

Abstract

Disclosed is a method of collecting statistics on tuple access data in a distributed database system when a "Bucket Index" is used. A tuple is a record in a database; tuple access data refers to various statistics that may be kept about accesses to the database. A Bucket Index has been used, in distributed database systems, to archive and manage the physical storage location/node of database tuples; additionally the Bucket Index may be used to implement colocation. Since statistical information is kept at the bucket level and more than one table can use the same bucket map, bucket map statistics reflect access information for more than one table. As table management operations are performed, tuples may be moved from one physical storage node to another.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Disclosed is a method of collecting statistics on tuple access
data in a distributed database system when a "Bucket Index" is used.
A tuple is a record in a database; tuple access data refers to
various statistics that may be kept about accesses to the database.
A Bucket Index has been used, in distributed database systems, to
archive and manage the physical storage location/node of database
tuples; additionally the Bucket Index may be used to implement
colocation.  Since statistical information is kept at the bucket
level and more than one table can use the same bucket map, bucket map
statistics reflect access information for more than one table.  As
table management operations are performed, tuples may be moved from
one physical storage node to another.  There exists no way to
distinguish a particular table's statistics from another's.  Thus the
bucket level statistics must be transferred in whole or not at all
with each table maintenance operation.  The result is the creation
and propagation of inaccurate statistics throughout the system.
Inaccurate statistics will, in turn, cause suboptimal database
reorganization strategies.

      The result of hashing the key of a database tuple can be used
as an index to the storage location for that tuple.  The collection
of these hash results, and a node identifier for the storage
location, is called a "bucket map." Bucket maps map ranges of hashed
tuple keys into the available storage nodes.

      "Bucket" is the term used to refer to the collection of tuples
whose hashed key values, and thus storage nodes, are the same.  It is
possible, indeed planned, that several tuples will be stored in the
same bucket.  Of course, multiple buckets may be stored on the same
node.
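
The mapping from tuple key to bucket to storage node can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the hash function (SHA-256 reduced modulo the bucket count), the bucket count, the node assignments, and the names `bucket_of`, `node_of`, and `group_into_buckets` are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict

# Assumed sizes: 8 buckets spread over 3 storage nodes (0, 1, 2).
NUM_BUCKETS = 8

# The bucket map: index = bucket number, value = node identifier.
# Several buckets share a node, as the text allows.
BUCKET_MAP = [0, 1, 2, 0, 1, 2, 0, 1]


def bucket_of(key: str) -> int:
    """Hash a tuple key to a bucket number (hash choice is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def node_of(key: str) -> int:
    """Look up the storage node for a tuple key through the bucket map."""
    return BUCKET_MAP[bucket_of(key)]


def group_into_buckets(keys):
    """Collect keys by bucket; many tuples are expected to share a bucket."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[bucket_of(key)].append(key)
    return dict(buckets)
```

Because the node is found only through the bucket map, moving a bucket to another node amounts to changing one entry of `BUCKET_MAP`; the tuple keys and their hash values are unaffected.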

      In a distributed database, where data may be stored on more
than one node, there is a need to know which node has stored any
given tuple.  A "Bucket Index" has been used to provide this
functionality; additionally the Bucket Index may be used to implement
colocation.  "Buckets" within the Bucket Index are considered the
atomic unit of data when the data is reorganized; the reorganization
algorithms require statistical information on the access to the data
in order to optimize the reorganization.  The statistical information
is kept with the buckets.  Problems of inaccurate statistics (leading
to suboptimal data reorganization plans) occur when the Bucket Index
is also used to implement colocation.  Since statistical information
is kept at the bucket level and more than one table can use the same
bucket map, the statistics reflect access information for more than
one table.
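
The loss of per-table information described above can be illustrated with a small sketch. It assumes, purely for the example, that each bucket keeps a single access counter and that two hypothetical colocated tables, `ORDERS` and `CUSTOMERS`, share one bucket map; the class and method names are likewise hypothetical.

```python
class BucketStats:
    """Access counters kept per bucket, shared by every table on the map."""

    def __init__(self, num_buckets: int) -> None:
        self.access_counts = [0] * num_buckets

    def record_access(self, bucket: int, table: str) -> None:
        # The table name is accepted but not stored: statistics are kept
        # at the bucket level only, which is the situation the text describes.
        self.access_counts[bucket] += 1


# Two colocated tables (hypothetical names) use the same bucket map,
# so their accesses accumulate in the same counters.
shared = BucketStats(num_buckets=4)

for _ in range(90):
    shared.record_access(bucket=2, table="ORDERS")
for _ in range(10):
    shared.record_access(bucket=2, table="CUSTOMERS")

# Only the combined total for bucket 2 survives; the 90/10 split between
# the two tables cannot be recovered from the counter.
combined_bucket_2 = shared.access_counts[2]
```

If one of the two tables were later moved to a different bucket map, there would be no way to decide how much of `combined_bucket_2` should move with it, so the count could only be carried over whole or left behind whole.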

      This problem is further complicated when a table switches to
another bucket map, is newly created and uses an existing bucket map,
or is dropped from the database.  Each of these actions will ca...