Browse Prior Art Database

Dynamic Efficient Merge

IP.com Disclosure Number: IPCOM000100287D
Original Publication Date: 1990-Mar-01
Included in the Prior Art Database: 2005-Mar-15
Document File: 4 page(s) / 111K

Publishing Venue

IBM

Related People

Iyer, BR: AUTHOR [+2]

Abstract

This invention performs merge phase processing using a smaller number of files ("merge order") available for storing runs than the number of files ("merge order") available during the sort phase using an "optimal merge pattern" (*). In an "optimal merge pattern", at each merge step the smallest runs are used as input to the merge, except that an m-way merge order is used as input to the merge.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Dynamic Efficient Merge

       This invention performs merge phase processing using a
smaller number of files ("merge order") available for storing runs
than the number of files ("merge order") available during the sort
phase using an "optimal merge pattern" (*).  In an "optimal merge
pattern", at each merge step the smallest runs are used as input to
the merge, except that an m-way merge order is used as input to the
merge.

      Other constraints which may limit the merge order include the
maximum size of the sort tree, and the buffer space for prefetched
input and deferred output to/from the file(s).  The most restrictive
constraint is used to establish the merge order.

      This method assumes that where the merge order is established
as M during the Sort phase, the Merge phase will have available to it
at most 2 times M (2M) files to hold intermediate merge results. Sort
Phase Implications

      During the Sort phase the merge order limitation is determined,
the limit will be referred to as M.  This number will also be used to
specify the maximum number of files that will be used for
distributing the initial sort runs. If we assume that the input data
is input to the sort in random order, then the initial runs created
will be approximately of the same size.  We can, therefore, evenly
distribute the runs created during the sort phase across the M files,
that is, given the following:
$    n = number runs created by the sort phase, numbered 0 through
n-1
$    M = merge order as determined during sort phase
$    M = maximum number of files used to hold the n runs. If
          n < M then M = n.  Files are numbered 0 through M-1.

      The ith run is placed on file number REM(i,M), where REM(i,M)
returns the remainder after dividing i by M.  That is the first run
(i=0) is placed on file number zero, the second (i=1) on file number
one, etc.
      For M = 5, the runs are distributed as follows:
                     File number
                  0   1    2   3   4
 Run number:    0   1    2   3   4
                  5   6    7   8   9
                  10 ...

      Simply stated, the runs are placed on the files starting with
file number 0 in round-robin fashion.  If at the end of the sort
phase n < M, then M = n.

      Merge Phase Processing: Because any one or more of the
limitations effecting the merge order could have changed between when
the initial merge order M was determined and the merge phase was
started, the merge order is once again determined and is given the
value of m.  If m > M, then m = M.
      A number of dum...