Browse Prior Art Database

Algorithm to Find Beneficial Semijoins From a Join Sequence

IP.com Disclosure Number: IPCOM000119715D
Original Publication Date: 1991-Feb-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 3 page(s) / 122K

Publishing Venue

IBM

Related People

Chen, MS: AUTHOR [+2]

Abstract

This article describes an algorithm to find beneficial semijoins, which include both profitable and gainful semijoins, from a join sequence for distributed query processing. In light of the properties of beneficial semijoins, this algorithm exploits the concept of the reducible set of each semijoin and is of polynomial execution time.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 51% of the total text.

Algorithm to Find Beneficial Semijoins From a Join Sequence

      This article describes an algorithm to find beneficial
semijoins, which include both profitable and gainful semijoins, from
a join sequence for distributed query processing.  In light of the
properties of beneficial semijoins, this algorithm exploits the
concept of the reducible set of each semijoin and is of polynomial
execution time.

      In a distributed database environment, semijoin has
traditionally been relied upon to reduce the amount of inter-site
data transmission required for query processing. Specifically,
significant resort efforts  have focused on the application of
profitable semijoins to distributed query processing, where a
profitable semijoin is a semijoin whose cost is less than its
benefit.  It is shown that judiciously applying join operations as
reducers, in addition to semijoins, can lead to further reduction in
the data transmission cost *.  Due to the use of join reducers, the
usefulness of semijoins has to be re- evaluated.  Some semijoins,
though not profitable themselves, may benefit the execution of
subsequent join operations, and become profitable owing to the use of
join reducers.  Such a semijoin is termed a gainful semijoin [*].
Both profitable and gainful semijoins are called beneficial semijoins
in this article.  When joins are also used as reducers, as far as the
reduction of the amount of data transmission is concerned, we
naturally want to find beneficial semijoins, instead of profitable
semijoins only. The identification of beneficial semijoins closely
depends on the join sequence to be performed, which can be denoted by
the form of a join sequence tree.  In this article, we propose a fast
heuristic scheme to determine beneficial semijoins from a join
sequence tree.  Note that in the conventional distributed query
processing, all the remaining relations in the final phase are sent
to the site where the results are needed for a join operation.  This
corresponds to the case that the join sequence tree is a star, and is
thus a special case of our study.

      Let SMJ be a set of semijoins and JQ(SMJ) denote the
configuration of the query after the set of semijoins SMJ has been
performed.  Also, TRi is the subtree formed by Ri and its offspring
in the join sequence tree, and S(TRi) denotes the set of nodes in TRi
.  The weight of a relation Ri in the join sequence tree under the
configuration JQ(SMJ), denoted by W(Ri,JQ(SMJ), is defined as the
size of the resulting relation from joining all the relations in
S(TRi).  Note that W(Ri,JQ(SMJ) is the cost of sending the relation
resulting from joining those relations within TRi to the parent node
of Ri under the configuration JQ(SMJ).  In addition, given a join
sequence tree T, the set of reducible relations of a semijoin...