Browse Prior Art Database

Processor Allocation and Hash Filtering for Parallel Execution of Hash Joins

IP.com Disclosure Number: IPCOM000114808D
Original Publication Date: 1995-Feb-01
Included in the Prior Art Database: 2005-Mar-29
Document File: 2 page(s) / 113K

Publishing Venue

IBM

Related People

Chen, M: AUTHOR [+3]

Abstract

Among various join methods, the hash join has been the focus of much research effort and reported to have performance superior to that of others, particularly because it presents an opportunity for pipelining. A pipeline of hash joins is composed of several stages, each of which is associated with one join operation that can be executed, in parallel, by several processors. Though pipelining has been shown to be very effective in reducing the query execution time, prior studies on pipelined hash joins have focused mainly on heuristic methods for query plan generation. Little effort was made to take processing power into consideration and optimize processor allocation. This disclosure deals with two important issues: processor allocation and the use of hash filters, to improve the parallel execution of hash joins.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 47% of the total text.

Processor Allocation and Hash Filtering for Parallel Execution of
Hash Joins

      Among various join methods, the hash join has been the focus of
much research effort and reported to have performance superior to
that of others, particularly because it presents an opportunity for
pipelining.  A pipeline of hash joins is composed of several stages,
each of which is associated with one join operation that can be
executed, in parallel, by several processors.  Though pipelining has
been shown to be very effective in reducing the query execution time,
prior studies on pipelined hash joins have focused mainly on
heuristic methods for query plan generation.  Little effort was made
to take processing power into consideration and optimize processor
allocation.  This disclosure deals with two important issues:
processor allocation and the use of hash filters, to improve the
parallel execution of hash joins.  To execute a given query,
processors should be allocated in such a way that the query execution
time is minimized.  Also, the technique of hash filtering can be
employed to further reduce the query execution time.  Note that to
exploit the opportunity of pipelining for hash join execution, one
would naturally like to identify and execute a group of hash joins in
the bushy tree in a way of pipelining.  However, it can be seen that
such regrouping of joins, while taking advantage of pipelining, makes
the execution dependency in the bushy tree intractable, which in turn
causes the problem of processor allocation much more complicated.  To
remedy this, we first devise a scheme to transform a bushy execution
tree to an allocation tree in which each node denotes a pipeline.
Then, using the concept of synchronous execution time, processors are
allocated to the nodes in the allocation tree in such a way that
inner relations in a pipeline can be made available approximately the
same time, thus solving the execution dependency for the parallel
execution of pipelined hash joins.  In addition, the approach of hash
filtering is investigated to improve the query execution time.

      To transform a bushy tree to an allocation tree, first identify
the groups of joins in the bushy tree that could be pipelined.  Then,
an allocation tree can be obtained from the original bushy tree by
merging each group of joins together.  Next, determine the number of
processors allocated to each node (pipeline) in the allocation tree
in the manner of top down.  Clearly, all processors are allocated to
the pipeline associated with the root in the allocation tree, since
it is the last pipeline to be performed.  Those processors allocated
to the pipeline on the root are then partitioned into several
clusters which are assigned to execute the pipelines associated with
the child nodes of the root in the allocation tree in such a way that
those pipelines can be completed approximately the same time.  The
above step for partitioning the processors for the root is...