On-Line Data Redistribution in a Shared-Nothing Parallel Database System

IP.com Disclosure Number: IPCOM000118391D
Original Publication Date: 1997-Jan-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 4 page(s) / 165K

Publishing Venue

IBM

Related People

Baru, CK: AUTHOR [+3]

Abstract

Disclosed is a method to perform an on-line redistribution of data contained in a set of related relational tables such that the resulting distribution of data across nodes can be uniform or as specified by the user.

This is the abbreviated version, containing approximately 34% of the total text.

      The redistribution function is provided as a database utility
in DB2 Parallel Edition, which is a Shared-Nothing (SN) parallel
database system.

      An SN parallel database system consists of a set of "nodes",
each with its own processing, storage, and communications resources,
across which databases are implemented.  Such systems employ a
partitioned storage model in which the data belonging to each
database table are partitioned across a specified subset of nodes
using a default or user-specified partitioning strategy.  A uniform
distribution of data across the nodes is desirable because it keeps
the system resources at each node equally utilized, which makes the
best use of the parallel database system.

      This disclosure describes an on-line function which can be used
to redistribute data contained in all tables defined in a given
nodegroup.  The resulting distribution of data across nodes can be
uniform or as specified by the user.

      The solution is provided in the context of an SN parallel
database system that supports "indirect" partitioning, in which a
database table is divided into partitions and the partitions are
mapped to a subset of nodes in the SN system.  The on-line data
redistribution function allows users to modify the distribution of
data belonging to the set of tables defined in a nodegroup.
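
      The indirection described above can be illustrated with a
minimal sketch.  It is not the DB2 Parallel Edition implementation:
the number of partitions, the CRC-32 hash, the node numbers, and the
names partition_of and node_of are all assumptions chosen for
illustration.  The point it shows is that a row's partition depends
only on its partitioning key, while the node is looked up in a
separate map, so data can be relocated by changing the map alone.

```python
import zlib
from typing import Sequence

NUM_PARTITIONS = 8  # hypothetical; kept small for illustration

# Hypothetical partitioning map: index = partition number, value = node id.
partitioning_map = [0, 1, 2, 0, 1, 2, 0, 1]


def partition_of(key: str) -> int:
    """Hash a partitioning key to a partition number (assumed hash: CRC-32)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def node_of(key: str, pmap: Sequence[int]) -> int:
    """Resolve a key to a node indirectly: key -> partition -> node."""
    return pmap[partition_of(key)]


# Remapping one partition relocates its rows without rehashing any key.
p = partition_of("customer-42")
remapped = list(partitioning_map)
remapped[p] = 3  # hypothetical newly added node
print(p, node_of("customer-42", partitioning_map), node_of("customer-42", remapped))
```

      Because the hash step never changes, only the partitions
whose map entries change need to be physically moved.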

      The data redistribution function is applied to a nodegroup,
i.e., it operates on all the tables defined in that nodegroup.  The
redistribution function can be used to (i) balance data volumes or
processing loads across nodes, (ii) distribute data according to a
user-specified distribution, (iii) increase the number of nodes
across which data are partitioned, (iv) decrease the number of nodes
across which data are partitioned, (v) roll back a previously stopped
redistribution operation, and (vi) continue a previously stopped
redistribution operation.

      Each nodegroup is associated with a "Partitioning Map" (PM),
which specifies the mapping of partitions to nodes in the nodegroup.
The map that describes the current mapping of partitions is referred
to as the "source partitioning map".  Based on the type of
redistribution selected, the data redistribution function either
generates a "target partitioning map" or uses the one provided by the
user.  Data redistribution is achieved by moving table partitions
among nodes.  The operation consists of moving partitions from their
current locations (nodes) as spe...
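
      The relationship between the source and target partitioning
maps can be sketched as follows.  This is a hedged illustration, not
the algorithm used by the utility: it assumes all partitions hold
roughly equal amounts of data, so that balancing partition counts
stands in for balancing data volumes, and the function names
uniform_target_map and partition_moves are hypothetical.  The sketch
generates a uniform target map that leaves a partition in place
whenever its current node still has room, then lists the partitions
whose node differs between the two maps.

```python
from typing import List, Sequence, Tuple


def uniform_target_map(source_map: Sequence[int], nodes: Sequence[int]) -> List[int]:
    """Spread partitions evenly over `nodes`, keeping partitions in place if possible."""
    base, extra = divmod(len(source_map), len(nodes))
    # Each node may hold `base` partitions; the first `extra` nodes hold one more.
    quota = {node: base + (1 if i < extra else 0) for i, node in enumerate(nodes)}
    target: List[int] = [-1] * len(source_map)
    pending = []  # partitions that must leave their current node
    for part, node in enumerate(source_map):
        if quota.get(node, 0) > 0:
            target[part] = node
            quota[node] -= 1
        else:
            pending.append(part)
    # Hand the displaced partitions to nodes that still have unused quota.
    for node in nodes:
        while quota[node] > 0:
            target[pending.pop()] = node
            quota[node] -= 1
    return target


def partition_moves(source_map: Sequence[int],
                    target_map: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (partition, from_node, to_node) for every partition that changes node."""
    return [(part, src, dst)
            for part, (src, dst) in enumerate(zip(source_map, target_map))
            if src != dst]


# Example: a nodegroup on nodes 0 and 1 is extended with node 2.
source = [0, 1, 0, 1, 0, 1, 0, 1]
target = uniform_target_map(source, nodes=[0, 1, 2])
print(target)                           # [0, 1, 0, 1, 0, 1, 2, 2]
print(partition_moves(source, target))  # [(6, 0, 2), (7, 1, 2)]
```

      The same two functions cover the case of removing a node:
any partition whose current node is absent from the node list has no
quota and is therefore reassigned.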