Hidden Field for Storing Bucket Number in Parallel Database Tables

IP.com Disclosure Number: IPCOM000105526D
Original Publication Date: 1993-Aug-01
Included in the Prior Art Database: 2005-Mar-20
Document File: 2 page(s) / 58K

Publishing Venue

IBM

Related People

Edwards, WM: AUTHOR [+3]

Abstract

When data is redistributed in a Parallel Database System (PDB), the unit of redistribution is a bucket. Each data record is assigned to a bucket by hashing the record's partition key. Because buckets are not physically supported in most relational database systems, the data redistribution program has to hash each data record to determine whether it belongs to the bucket being moved. Although hashing itself can be done very efficiently, fetching every data record in a table in order to hash it can be very expensive. When a PDB table is used for multimedia data storage, each data record is likely to be very long. In this case, heavy disk I/O drives the operational cost up drastically. A design that eliminates unnecessary data record fetches is needed.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      When data is redistributed in a Parallel Database System (PDB),
the unit of redistribution is a bucket.  Each data record is assigned
to a bucket by hashing the record's partition key.  Because buckets
are not physically supported in most relational database systems, the
data redistribution program has to hash each data record to determine
whether it belongs to the bucket being moved.  Although hashing itself
can be done very efficiently, fetching every data record in a table in
order to hash it can be very expensive.  When a PDB table is used for
multimedia data storage, each data record is likely to be very long.
In this case, heavy disk I/O drives the operational cost up
drastically.  A design that eliminates unnecessary data record fetches
is needed.  Instead of hashing every record each time a PDB table is
included in data redistribution, this disclosure proposes adding a
hidden field to each PDB table that stores the record's bucket number,
in order to improve data redistribution performance.
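
      The difference between the two approaches can be illustrated
with a small sketch.  It is not taken from the disclosure: the hash
function (CRC-32 modulo a bucket count), the bucket count, and the
names Table, move_by_rehashing and move_by_hidden_field are all
hypothetical.  The sketch also assumes that the stored bucket number
can be examined without reading the long record body; the abbreviated
text does not say how the hidden field is physically accessed.

```python
import zlib

NUM_BUCKETS = 8  # assumed bucket count, for illustration only


def bucket_of(partition_key):
    # Hypothetical hash: CRC-32 of the key, reduced modulo the bucket count.
    return zlib.crc32(partition_key.encode("utf-8")) % NUM_BUCKETS


class Table:
    """Toy table that counts how many long record bodies are fetched."""

    def __init__(self):
        self.rows = []          # each row: (hidden_bucket_no, key, body)
        self.body_fetches = 0   # stands in for expensive disk I/O

    def insert(self, key, body):
        # The bucket number is computed once, at insert time, and stored.
        self.rows.append((bucket_of(key), key, body))

    def fetch(self, i):
        # Reading the full record is the costly step being counted.
        self.body_fetches += 1
        _, key, body = self.rows[i]
        return key, body


def move_by_rehashing(table, bucket):
    # Baseline: fetch every record, then hash its partition key.
    selected = []
    for i in range(len(table.rows)):
        key, body = table.fetch(i)
        if bucket_of(key) == bucket:
            selected.append((key, body))
    return selected


def move_by_hidden_field(table, bucket):
    # Proposed: compare the stored bucket number first, fetch only matches.
    # Assumption: the hidden field is readable without fetching the body.
    selected = []
    for i, (stored_bucket, _, _) in enumerate(table.rows):
        if stored_bucket == bucket:
            selected.append(table.fetch(i))
    return selected
```

      In this model both functions select the same records, but the
baseline fetches every record body in the table, while the hidden
field version fetches only the bodies that belong to the bucket being
moved.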

      Before a record is inserted into a PDB table, the bucket number
assigned to the record is determined by hashing the record's
partition key value.  The bucket number assigned to a data record
remains unchanged as long as the record's partition key value is not
updated.  When a record's partition key value changes, the update is
treated as a deletion of the original record followed by an insertion
of a new record with the updated values.  Thus, the bucket number of
a...
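
      The insert and update rules described above can be sketched as
follows.  This is a minimal illustration under stated assumptions,
not the disclosed implementation: the class PdbTable, its method
names, the "_bucket" field name, and the CRC-32 based hash are all
hypothetical.

```python
import zlib

NUM_BUCKETS = 8  # assumed bucket count, for illustration only


def bucket_of(partition_key):
    # Hypothetical hash: CRC-32 of the key, reduced modulo the bucket count.
    return zlib.crc32(partition_key.encode("utf-8")) % NUM_BUCKETS


class PdbTable:
    """Toy table whose records carry a hidden bucket-number field."""

    def __init__(self):
        self._rows = {}     # record id -> record dict
        self._next_id = 0

    def insert(self, key, data):
        # The bucket number is determined by hashing the partition key
        # before the record is stored.
        rid = self._next_id
        self._next_id += 1
        self._rows[rid] = {"_bucket": bucket_of(key), "key": key, "data": data}
        return rid

    def delete(self, rid):
        return self._rows.pop(rid)

    def update_data(self, rid, data):
        # The partition key is untouched, so the stored bucket number
        # remains valid and is not recomputed.
        self._rows[rid]["data"] = data

    def update_key(self, rid, new_key):
        # A partition key change is handled as delete + insert, so the
        # bucket number is recomputed by the normal insert path.
        old = self.delete(rid)
        return self.insert(new_key, old["data"])

    def bucket_number(self, rid):
        return self._rows[rid]["_bucket"]
```

      Routing key changes through the ordinary insert path means the
hidden field is only ever written in one place, which keeps the stored
bucket number consistent with the current partition key value.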