Browse Prior Art Database

Using a Surrogate Median to Speed Up the Execution of a Parallel Sort Merge Join Algorithm

IP.com Disclosure Number: IPCOM000119629D
Original Publication Date: 1991-Feb-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 3 page(s) / 132K

Publishing Venue

IBM

Related People

Wolf, JL: AUTHOR [+3]

Abstract

Disclosed is an improvement to the parallel sort merge join algorithm (1). The original algorithm was designed to effectively handle parallel joins in the presence of data skew. The improvement employs a shortcut which significantly reduces the execution time of the original algorithm, without hampering the effectiveness of the algorithm in any material way.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 50% of the total text.

Using a Surrogate Median to Speed Up the Execution of a Parallel
Sort Merge Join Algorithm

      Disclosed is an improvement to the parallel sort merge
join algorithm (1).  The original algorithm was designed to
effectively handle parallel joins in the presence of data skew. The
improvement employs a shortcut which significantly reduces the
execution time of the original algorithm, without hampering the
effectiveness of the algorithm in any material way.

      Following is a brief description of the original algorithm in
(1).  The method assumes initially that the two relations to be
joined, R1 and R2, are partitioned among the P processors. In the
sort phase of the algorithm, each of the processors sorts its
partition of each of relations R1 and R2 (using a conventional
external sorting method, such as a tournament tree sort and merging
of sorted runs).  The next and crucial phase of the method is the
scheduling phase.  Essentially, in this phase each of the relations
R1 and R2 are partitioned according to ranges of values of the join
column value, and each such partition is assigned to a task or a set
of tasks that will perform the join of values in the corresponding
range during the join phase. Further, the scheduling phase allocates
the tasks to processors so as to balance the processing time of the
join phase among the P processors.  In the transfer phase (for the
data partitioning architecture only) ranges of the two relations
corresponding to the allocation of tasks are shipped to the
appropriate processors.  Finally, in the join phase the allocated
tasks carry out the join operation of the corresponding ranges of the
relations. See (1) for further details.

      The original scheduling algorithm employed a variant, labeled
GM, of the selection problem technique due to (2), in order to find
the median element in a set of 2P sorted columns.  P of these columns
correspond to ranges of elements in the sorted runs of the the first
relation, and the other P correspond to ranges in the second
relation. The selection problem algorithm GM was performed
iteratively together with a minimum makespan algorithm in an effort
to "divide and conquer" the overall scheduling problem. Roughly
speaking, the selection problem algorithm GM did the "dividing" of
one task into three tasks, and the median element was picked to make
the first and third tasks have approximately equal task times. (The
second task was of a different nature.)  This equal task time goal
was reasonable, but by no means required or, for that matter,
guaranteed.

      The selection problem algorithm is itself an iterative
algorithm.  Fig. 1 shows 2P sorted columns.  (In other words, the
columns along each vertical arrow are in, say, non-decreasing order.)
In the first step of the algorithm, one finds the median of each
sorted column.  In Fig. 1, these medians lie on the horizontal arrow
through the center of the figure.  The next step of the algorithm
co...