
Value-Oriented Approach to Selecting Buckets for Data Redistribution

IP.com Disclosure Number: IPCOM000104671D
Original Publication Date: 1993-May-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 2 page(s) / 109K

Publishing Venue

IBM

Related People

Li, SG: AUTHOR

Abstract

Disclosed is a value-oriented approach for selecting the buckets to move to new nodes during data redistribution in a parallel database system (PDB). The goal is to balance the load among all nodes with as little impact on system performance as possible. When new nodes are added to a PDB, data must be redistributed so that some of it is stored on the new nodes.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 50% of the total text.


      Disclosed is a value-oriented approach for selecting the buckets
to move to new nodes during data redistribution in a parallel
database system (PDB).  The goal is to balance the load among all
nodes with as little impact on system performance as possible.  When
new nodes are added to a PDB, data must be redistributed so that some
of it is stored on the new nodes.

      An intuitive, traditional approach to data redistribution after
adding new nodes is to unload all the data and then reload it as in
the initial load, taking load balancing into account.  However, once
the data grows very large, this "initial loading" approach becomes
hard to justify.  Since data are clustered into buckets in the
system, it is reasonable instead to select some of the buckets on the
existing nodes and move them to the new nodes, without unloading and
reloading the entire database.  Because data redistribution takes
time and degrades the PDB's performance while it runs, a method is
needed to select the "right" buckets to move so as to achieve both
load balancing and an early completion time.

      Moving a bucket from one node to another incurs several costs
that deserve special attention because they can affect the PDB's
performance.  In general, the shorter the data redistribution takes,
the smaller its impact on performance.  Thus, the earliest completion
time is one of the goals when selecting buckets for data
redistribution.  Another goal is to balance the load among all the
nodes, including the new ones, as closely as possible.  Each existing
node has an ideal amount of work load to give up for the PDB system
to reach load balance.  For an existing node, the work load of a
bucket can be considered the "selling price" of that bucket: within
the ideal amount to give up, the higher a bucket's load, the higher
the selling price it generates.  The cost of moving a bucket from the
existing node to a new node is the expense of selling that bucket.
This disclosure proposes a value-oriented approach that selects the
buckets to move based on the value generated by moving each bucket.
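The ideal amount each existing node should give up can be illustrated with a minimal Python sketch. It assumes, beyond what the disclosure states, that all nodes (existing and new) have equal capacity, so the balanced target per node is the total work load divided by the total node count. The function name ideal_give_up and the sample loads are hypothetical.

```python
def ideal_give_up(loads, num_new_nodes):
    """Return the work load each existing node should give up.

    Hypothetical helper.  Assumes equal-capacity nodes, so the balanced
    target is the total load divided by the total number of nodes.
    """
    total = sum(loads.values())
    target = total / (len(loads) + num_new_nodes)
    # A node already at or below the target gives up nothing.
    return {node: max(0.0, load - target) for node, load in loads.items()}


# Hypothetical work loads for three existing nodes; one new node is added.
loads = {"N1": 120.0, "N2": 90.0, "N3": 90.0}
print(ideal_give_up(loads, num_new_nodes=1))
# {'N1': 45.0, 'N2': 15.0, 'N3': 15.0}
```

Under this assumption the three existing nodes together give up 75 units, which is exactly the target load for the single new node.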

      Basically, the value of moving a bucket Bi in table T can be
roughly formulated as:

   (a1 * W(Bi)) - (a2 * Z(T) + a3 * Z(Bi) + a4 * R(Bi)),
where W(Bi) is the work load of bucket Bi collected from the status
monitor, Z(T) is the size of table T, Z(Bi) is the size of bucket Bi,
and R(Bi) is the number of data records belonging to bucket Bi.  In
addition, a1, a2, a3, and a4 are weighting factors that can be
predetermined based on:

1.  The emphasis of the value's definition: price or cost.
2.  A study of the operational behavior of data redistribution, which
    can help determine how strongly the table size, the bucket size,
    and the number of records in the bucket each influence the
    completion time.
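The value formula and the selling-price idea can be combined into a short Python sketch. The disclosure does not say how buckets are picked once their values are known, so the greedy rule below (highest value first, never exceeding the node's ideal give-up amount, never moving a bucket whose value is not positive) is an assumption, as are the names Bucket, move_value and select_buckets, the weights, and the sample bucket statistics.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    name: str
    work_load: float   # W(Bi), e.g. as reported by the status monitor
    size: float        # Z(Bi), size of the bucket
    records: int       # R(Bi), number of records in the bucket
    table_size: float  # Z(T), size of the table T that holds the bucket


def move_value(b, a1, a2, a3, a4):
    """Value of moving bucket b: (a1*W(Bi)) - (a2*Z(T) + a3*Z(Bi) + a4*R(Bi))."""
    return a1 * b.work_load - (a2 * b.table_size + a3 * b.size + a4 * b.records)


def select_buckets(buckets, give_up, weights):
    """Hypothetical greedy selection for one existing node.

    Visits buckets in decreasing order of value, stops at the first
    non-positive value, and skips any bucket whose load would push the
    total moved load past the node's ideal give-up amount.
    """
    ranked = sorted(((move_value(b, *weights), b) for b in buckets),
                    key=lambda vb: vb[0], reverse=True)
    chosen, moved = [], 0.0
    for value, b in ranked:
        if value <= 0:
            break  # remaining buckets cost more to move than they are worth
        if moved + b.work_load <= give_up:
            chosen.append(b.name)
            moved += b.work_load
    return chosen


# Hypothetical weights (a1, a2, a3, a4) and bucket statistics.
weights = (1.0, 0.001, 0.01, 0.0001)
buckets = [
    Bucket("B1", 30.0, 500.0, 10_000, 2000.0),   # value 22.0
    Bucket("B2", 20.0, 100.0, 2_000, 2000.0),    # value 16.8
    Bucket("B3", 10.0, 50.0, 1_000, 2000.0),     # value 7.4
    Bucket("B4", 25.0, 2000.0, 50_000, 2000.0),  # value -2.0
]
print(select_buckets(buckets, give_up=45.0, weights=weights))
# ['B1', 'B3']
```

Here B2 has the second-highest value but is skipped because moving it together with B1 would exceed the 45-unit give-up amount, and B4 is never moved because its cost outweighs its load. Greedy selection is only one option: maximizing total value under a load limit is a knapsack-style problem, so in general an exact search could find a better set.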

      When a full table scan method is...