Browse Prior Art Database

Performing a Sort Merge Range Join in a Fixed Memory Buffer Without Rereads

IP.com Disclosure Number: IPCOM000105580D
Original Publication Date: 1993-Aug-01
Included in the Prior Art Database: 2005-Mar-20
Document File: 2 page(s) / 78K

Publishing Venue

IBM

Related People

Turek, JJ: AUTHOR [+3]

Abstract

Disclosed is a method to perform a sort merge range join in a fixed memory buffer without rereading any of the tuples, if this is possible. In a sort merge range join [1] of two relations on a single processor, both relations are first sorted and merged. Then one of the two relations is chosen as an inner relation, the other as an outer relation, and the relations are joined using an inner/outer loop join. In order for the join phase to proceed without rereading any of the tuples, it is essential at all times that the size of the relevant portion of the outer relation be less than that of the available memory buffer. In general range joins, particularly in the presence of data skew in both relations (so-called double skew [2]) this may not always be the case.

This text was extracted from an ASCII text file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

Performing a

Sort

Merge

Range

Join in a Fixed Memory Buffer Without Rereads

      Disclosed is a method to perform a sort merge range join in a
fixed memory buffer without rereading any of the tuples, if this is
possible.  In a sort merge range join [1] of two relations on a
single processor, both relations are first sorted and merged.  Then
one of the two relations is chosen as an inner relation, the other as
an outer relation, and the relations are joined using an inner/outer
loop join.  In order for the join phase to proceed without rereading
any of the tuples, it is essential at all times that the size of the
relevant portion of the outer relation be less than that of the
available memory buffer.  In general range joins, particularly in the
presence of data skew in both relations (so-called double skew [2])
this may not always be the case.  The invention disclosed here
creates a mechanism whereby the identities of the inner and outer
relations can be switched on the fly as required, in order to
eliminate whenever possible the need to reread any of the tuples.
The invention can also be used to determine the minimum memory buffer
size required to perform the join phase of a sort merge range join
without rereads.  Since equijoins are a special case of range joins,
a simplified application of the invention could improve the
performance of such joins in the presence of double data skew.

      Let R sub 1 denote relation 1 and R sub 2 denote relation 2.
The invention partitions the join phase work into work occurring in
four types of regions.  These regions can be characterized as
follows.

o   In I regions R sub 1 is the inner relation.  Define pivot%points
    corresponding to changes in the values of R sub 2.  Regions
    change with left-to-right passage through pivot points.  See Fig.
    1.

o   In O regions R sub 1 is the outer relation.  Define pivot%points
    corresponding to changes in the values of R sub 1.  Regions
    change with top-to-bottom passage through pivot points.  See Fig.
    2.

o   In IO regions a transition is made from an I region to an O
    region.  See Fig. 3.

o   In OI regions a transition is made from an O region to an I
   ...