Browse Prior Art Database

Using Hash Filters to Improve the Execution of Multi-Join Queries Using Sort-Merge Joins

IP.com Disclosure Number: IPCOM000113158D
Original Publication Date: 1994-Jul-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 120K

Publishing Venue

IBM

Related People

Chen, M-S: AUTHOR [+3]

Abstract

In relational database systems, joins are the most expensive operations to execute, especially with the increasing database sizes. Proposed is an approach of using hash filters (HF's) for the execution of multi-join queries in a multiprocessor-based database system to minimize the join query execution time. It has been shown that the cost of executing a join operation can mainly be expressed in terms of the cardinalities of relations involved. In view of this, one would naturally like to remove unnecessary tuples and reduce the cardinalities of relations involved before a join to minimize the join cost.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 51% of the total text.

Using Hash Filters to Improve the Execution of Multi-Join Queries
Using Sort-Merge Joins

      In relational database systems, joins are the most expensive
operations to execute, especially with the increasing database sizes.
Proposed is an approach of using hash filters (HF's) for the
execution of multi-join queries in a multiprocessor-based database
system to minimize the join query execution time.  It has been shown
that the cost of executing a join operation can mainly be expressed
in terms of the cardinalities of relations involved.  In view of
this, one would naturally like to remove unnecessary tuples and
reduce the cardinalities of relations involved before a join to
minimize the join cost.  As semijoin has traditionally been relied
upon to reduce the amount of inter-site data transmission required
for distributed query processing, the technique of hash filtering can
be applied in a parallel database environment to reduce the relation
cardinalities.  A query plan is usually compiled into a tree of
operators, called join sequence execution tree, where a leaf node
represents an input relation and an internal node represents the
resulting relation from joining the two relations associated with its
two child nodes.  For exploring inter-operator parallelism, there are
three categories of query trees:  left-deep trees, right-deep trees,
and bushy trees.  The bushy tree, without being restricted to a
linear form, is of more flexibility on query plan generation, and
able to obtain a better query execution plan than a linear tree can
at the cost of searching in an ampler space.  However, due to its
exponential complexity, there is no efficient method for determining
hash filters for a bushy execution tree reported in literature thus
far.  To remedy this, we propose in this disclosure a scheme to
develop a sequence of hash filters to reduce the join execution cost
of a bushy tree.  The join method considered is the sort-merge join
which most existing database management softwares employ.  The system
model assumed is a multiprocessor system with distributed memory and
shared disks.

      A hash filter built by relation R[i]  on its attribute A,
denoted by HF[R[i]](A), is an array of bits which are initialized to
0's.  Let R[i](A) be the set of distinct values of attribute A in

R[i], and h be the corresponding hash function employed.  The k-th
bit of HF[R[i]](A) is set to be one if there exists an a &memberof.
R[i]  (A) such that h(a)=k.  Similar to the effect of semijoins, it
can be seen that before joining R[i] and R[j]  on their common
attribute A, probing the tuples of R[j]  against HF[R[i]](A) and
removing non-matching tuples will reduce the number of tuples of R[j]
to participate in the join.  The join cost is thus reduced.  We
propose a scheme to develop an effective sequence of hash filters to
reduce the join execution cost of a bushy executio...