Browse Prior Art Database

Data Dependent External Merge Optimization

IP.com Disclosure Number: IPCOM000085104D
Original Publication Date: 1976-Feb-01
Included in the Prior Art Database: 2005-Mar-02
Document File: 4 page(s) / 127K

Publishing Venue

IBM

Related People

Thompson, KC: AUTHOR

Abstract

A method is provided for determining when external merge optimization can be used.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 54% of the total text.

Page 1 of 4

Data Dependent External Merge Optimization

A method is provided for determining when external merge optimization can be used.

The basic problem of merge optimization has been to select an order of merge that minimizes the total merge time. This is not automatically the largest merge order possible, because a large merge order decreases block sizes and increases the reads and writes necessary. Also, if the merge order is too small, the number of passes (i.e., number of times the data is read and written) increases, thus increasing merge time.

In many cases, a design with merge order of N (N > 2) and with one output buffer will be optimal for a given set of data characteristics. Yet for a different quantity of data (i.e., different number of initial strings), a merge order of N - 1 with two output buffers may produce an optimal merge. This creates the problem of which order of merge should be used for the initial merge design when only the data characteristics are known.

The following technique is applicable when the external storage is a direct access storage device.

In this instance, N is the order of merge for the external merge and M is the order of merge for the last or output pass. Both N and M are considered to be the "best" orders of merge for the existing hardware and data characteristics.

There are ranges where merge orders N and N - 1 overlap (i.e., both require the same number of merge passes). Figs. 1, 2, and 3 illustrate the overlap between N and N - 1 for various N and M. The calculations to be presented and the data presented in the figures assumes that the output pass is invoked when M or fewer strings remain.

The overlap range for any k (k = number of passes, k > 1) is from MN/k - 2/ +1 to M(N - 1)/k-1/.

An equivalent method of determining overlap is to determine k for N and for N - 1 based on the number of strings. If the k's are equal, then the string count is within an overlap range.

The number of initial strings, created by the internal sort, need not be in an overl...