A Method for Controlling the Data Placement in a Parallel Database that Takes Into Consideration Data Skew and Different Processor Size

IP.com Disclosure Number: IPCOM000123317D
Original Publication Date: 1998-Sep-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 1 page(s) / 27K

Publishing Venue

IBM

Related People

Fecteau, G: AUTHOR [+2]

Abstract

1. Create a vector of range 0 to x (x is implementation defined) and assign a node of the parallel database to each entry.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 95% of the total text.

    1.  Create a vector of range 0 to x (x is implementation
        defined) and assign a node of the parallel database
        to each entry.

    2.  Using a well-known hashing technique (supported by prior
        art), convert the partition key of a row to a logical
        number in the range 0 to x and use this number as an
        offset into the above vector to determine the node on
        which the row resides.

          For each table (or group of tables), define an
        appropriate vector that limits that table to the desired
        subset of nodes.  The vector can account for data skew
        (more rows hashing to one entry than to another) or for
        different processor sizes (more entries on the more
        powerful nodes).

          Provide both a database API (callable C program) and
        an extension of the SQL language (in the form of an SQL
        function) to analyze data and create the mapping vector.

    3.  Provide a database command to change the mapping vector
        and redistribute data as required (when changing the
        number of nodes in the database).

    4.  Provide an SQL function returning the vector number
        of a row (for analysis purposes) or th...
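
The mapping-vector scheme in steps 1 to 3 can be illustrated with a
minimal Python sketch. It is not the disclosed implementation. The
function names, the use of CRC32 as the hash, the vector size of 4096,
and the proportional weighting rule are all assumptions chosen for
illustration; the disclosure only states that x is implementation
defined and that a well-known hashing technique is used. The sketch
treats the vector as having `size` entries indexed 0 to size - 1.

```python
import zlib
from typing import Dict, List


def build_mapping_vector(weights: Dict[int, int], size: int) -> List[int]:
    """Assign a node to each of `size` entries, in proportion to node weight.

    `weights` maps node number -> relative capacity (hypothetical input);
    a more powerful node gets more entries, as the disclosure describes.
    """
    total = sum(weights.values())
    vector: List[int] = []
    for node, weight in sorted(weights.items()):
        vector.extend([node] * (size * weight // total))
    # Integer division can leave a few entries unassigned;
    # this sketch gives them to the heaviest node.
    heaviest = max(weights, key=weights.get)
    vector.extend([heaviest] * (size - len(vector)))
    return vector


def vector_index(partition_key: str, size: int) -> int:
    """Hash a partition key to an offset into the vector.

    CRC32 is only a stand-in for the unspecified hashing technique.
    """
    return zlib.crc32(partition_key.encode("utf-8")) % size


def node_for_key(partition_key: str, vector: List[int]) -> int:
    """Return the node on which a row with this partition key resides."""
    return vector[vector_index(partition_key, len(vector))]


def entries_to_move(old: List[int], new: List[int]) -> List[int]:
    """Return the vector entries whose node assignment changed.

    Only rows hashing to these entries would need to be redistributed
    when the mapping vector is replaced.
    """
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


# Example: four equal nodes, then node 3 upgraded to twice the capacity.
VECTOR_SIZE = 4096
uniform = build_mapping_vector({0: 1, 1: 1, 2: 1, 3: 1}, VECTOR_SIZE)
weighted = build_mapping_vector({0: 1, 1: 1, 2: 1, 3: 2}, VECTOR_SIZE)
print(node_for_key("customer-1001", uniform))
print(len(entries_to_move(uniform, weighted)))
```

Because rows are located through the vector rather than directly by
hash value, changing the placement only requires editing vector
entries and moving the rows that hash to the changed entries. Data
skew could be handled in the same way, by reassigning individual
heavily loaded entries to less loaded nodes.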