Browse Prior Art Database

Optimal Processor Allocation for Pipelined Hash Joins

IP.com Disclosure Number: IPCOM000111180D
Original Publication Date: 1994-Feb-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 4 page(s) / 112K

Publishing Venue

IBM

Related People

Chen, MS: AUTHOR [+2]

Abstract

A pipeline segment of hash joins is composed of many stages, each of which is associated with one join operation. In a multiprocessor-based database system, the execution of each stage (hash join) can be implemented by many processors to exploit parallelism. The whole segment is executed in a manner of pipelining through all its stages. Moreover, the execution of a pipeline segment can be divided into the table-building phase and the probing phase. In table-building phase, the hash tables of all the stages are built. After setting up the hash tables in the pipeline segment in the table building phase, the pipeline segment starts the probing phase, in which tuples from the outer relation go through the pipeline stages to generate the resulting tuples.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 42% of the total text.

Optimal Processor Allocation for Pipelined Hash Joins

      A pipeline segment of hash joins is composed of many stages,
each of which is associated with one join operation.  In a
multiprocessor-based database system, the execution of each stage
(hash join) can be implemented by many processors to exploit
parallelism.  The whole segment is executed in a manner of pipelining
through all its stages.  Moreover, the execution of a pipeline
segment can be divided into the table-building phase and the probing
phase.  In table-building phase, the hash tables of all the stages
are built.  After setting up the hash tables in the pipeline segment
in the table building phase, the pipeline segment starts the probing
phase, in which tuples from the outer relation go through the
pipeline stages to generate the resulting tuples.  The total
processing time for a pipeline segment can be expressed as the sum of
its processing times in the table-building and probing phases:

  TS = C sub <h> + max sub <all i> (TB sub i) +
              max sub <all i> (TP sub i)
        = C sub <h> + max sub <all i> ( <WB sub i> over <n sub i> ) +
              max sub <all i> ( <WP sub i> over <n sub i> )
where TB sub i and WB sub i are, respectively, the execution time and
the work at stage i.  TP sub i and WP sub i are defined similarly.  C
sub <h> is the fraction of processing time that is irrelevant to
processor allocation.

      After the relations in a pipeline segment are determined, C sub
h, WB sub i and WP sub i, 0 le i le k-1 is obtained, where k is the
number of stages.  Then, derive the number of nodes to be allocated
to each stage so that the processing time of the segment can be
minimized.  The task of allocating processing units to the stages is
referred to as processor allocation.  Processor allocation is subject
to the following three constraints:

Constraint I: The sum of nodes assigned to all stages has to be less
than or equal to the total number of nodes in the system.

Constraint II: The amount of memory allocated to each stage should be
enough to accommodate the hash table.

Constraint III: The number of nodes allocated to each stage has to be
an integer.

      This invention derives the optimal processor allocation for the
pipeline stages of a pipelined hash join in a multiprocessor-based
database system.  Three optimal processor allocation schemes, subject
to (a) Constraint I, (b) Constraints I and II, and (c) all the three
constraints, are developed.

      An allocation of processors for a segment of k stages is
denoted by a vector of k components lambda = (n sub 0, n sub 1, ...
,n sub <k-1>), where the i-th component n sub i is the number of
nodes allocated to stage i.  Let TB= max sub <all i> ( TB sub i )  =
max sub <all i> ( <WB sub i> over <n sub i> ) and TP= max sub <all i>
( TP sub i ) = max sub <all i> ( <WP sub i> over <n sub i>).  For
every allocation of processors lambda, we ca...