Browse Prior Art Database

Integrated Approach to Build Bushy Trees for the Execution of Pipelined Hash Joins

IP.com Disclosure Number: IPCOM000114059D
Original Publication Date: 1994-Nov-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 151K

Publishing Venue

IBM

Related People

Chen, M-S: AUTHOR [+2]

Abstract

An integrated approach to build bushy trees for the execution of of pipelined hash joins in a multiprocessor-based database system is proposed in this disclosure. The proposed algorithm builds a bushy tree based on a minimal-work heuristic first, and then iteratively refines the bushy tree in light of the effect of processor allocation. Due to the increases in database size and query complexity, parallelism has been recognized as the only solution for the efficient execution of multi-join queries for future database management. Among various join methods, the hash join has been the focus of much research effort and reported to have performance superior to that of others, particularly because it presents an opportunity for pipelining.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 29% of the total text.

Integrated Approach to Build Bushy Trees for the Execution of Pipelined
Hash Joins

      An integrated approach to build bushy trees for the execution
of of pipelined hash joins in a multiprocessor-based database system
is proposed in this disclosure.  The proposed algorithm builds a
bushy tree based on a minimal-work heuristic first, and then
iteratively refines the bushy tree in light of the effect of
processor allocation.  Due to the increases in database size and
query complexity, parallelism has been recognized as the only
solution for the efficient execution of multi-join queries for future
database management.  Among various join methods, the hash join has
been the focus of much research effort and reported to have
performance superior to that of others, particularly because it
presents an opportunity for pipelining.  Using hash joins, multiple
joins can be pipelined so that the early resulting tuples from a
join, before the whole join is completed, can be sent to the next
join for processing.  A pipeline of hash joins is composed of several
stages, each of which is associated with one join operation that can
be executed, in parallel, by several processors.  As far as the
structure of an execution tree is concerned, the bushy tree, without
being restricted to a linear form, is of more flexibility on query
plan generation, and able to obtain a better query execution plan
than a right-deep tree can.  However, for hash joins, the scheduling
of a bushy tree execution plan is much more complicated than that of
a right-deep tree.  As a result, though pipelining has been shown to
be very effective in reducing the query execution time, prior studies
on pipelining hash joins for bushy execution trees have focused on
either (1) heuristic methods for query plan generation or (2)
processor allocation to optimize the execution of a single pipeline
segment.  Little effort was made to develop an integrated approach
that takes these two issues into account and optimizes the bushy tree
generation and processor allocation as a whole.  Note that both
issues have significant impact to performance, and the correlation
between them complicates the design of such a scheme.  In view of the
increasing demand for better performance of database operations, an
integrated approach to build bushy trees for the execution of
pipelined hash joins is proposed in this invention.  The system
considered is a multiprocessor-based database system with distributed
memory and shared disks.

      The proposed algorithm builds a bushy tree based on the
heuristic of minimal work first, and then iteratively refines the
bushy tree in light of the effect of processor allocation.  The bushy
tree derived is composed of a collection of right-deep subtrees, each
of which is associated with a pipeline segment.  The pipeline
segments will be executed one by one bottom up with all processors in
the system implementing one segment at a time.  To schedule pipelined
...