Browse Prior Art Database

Sorting by Range Partitioning

IP.com Disclosure Number: IPCOM000075268D
Original Publication Date: 1971-Aug-01
Included in the Prior Art Database: 2005-Feb-24
Document File: 3 page(s) / 58K

Publishing Venue

IBM

Related People

Long, NE: AUTHOR

Abstract

The partition sorting technique described herein is the dual of the sorting by merging technique; both accomplish the same objective i.e., a sorted file.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 54% of the total text.

Page 1 of 3

Sorting by Range Partitioning

The partition sorting technique described herein is the dual of the sorting by merging technique; both accomplish the same objective i.e., a sorted file.

The partition sort may be programmed as follows:

Phase 1 - Partition the input file into a given number of sets of unordered records (or smaller files), each set having its

records within a unique key range.

Phase 2 - Partition the sets successively into subsets within key subranges until the number of records in each subset is equal

to or less than a given maximum length.

Phase 3 - Sort the records in each subset onto the output file, beginning with the subset having records of the lowest (or

highest) key range and ending with the subset having records

of the highest (or lowest) key range.

Phases 1, 2 and 3, respectively, can be called the "initial split", "intermediate split", and "internal sort" (or "consolidation") phases.

The number of subsets in a partition of a set is called the "split order" (S), and is the equivalent of the "merge order" (M) in sorting by merging.

The split order (S) need not be constant throughout the split phases. For example, if the maximum split order is 8 (5=8), and the maximum subset size desired is 120 records, a 160 record subset need not be split into 8 subsets of approximately 20 records each using 5=8. Instead, the split order can be changed to S=2 to produce 2 subsets of approximately 80 records each, thus reducing the number of comparisons during the split. The partition of each set into subsets is accomplished by a range partition of the keys in each set, i.e. each subset contains only keys within a given range. See the example shown in Fig. 1.

Often it is unnecessary to store a list defining each range of keys for a partition. The key values can be used to compute the range containing the key via an address calculation.

This method of partitioning subsets into smaller ones is highly effective in cases where S is large.

The partition sorting method's efficiency is dependent on the distribution of key values.

The buffering in the partition sort is the dual of the buffering in sorting by merging. For example, 8 input and 2 output buffers in a sort by me...