Sorting Large Multi-Volume Datasets using Data Pipes

IP.com Disclosure Number: IPCOM000106770D
Original Publication Date: 1993-Dec-01
Included in the Prior Art Database: 2005-Mar-21
Document File: 2 page(s) / 67K

Publishing Venue

IBM

Related People

Bennett, BT: AUTHOR [+2]

Abstract

Disclosed is a method for sorting multiple-volume datasets by reading/writing from/to multiple input/output volumes simultaneously, allowing multiple sorts and/or merges to take place simultaneously by means of partitioning the work via key ranges obtained by sampling or other means, and maximizing overlap of sequential job step paths by using data pipes.

      Consider the following example: it is desired to sort a very
large dataset residing on four input tapes, and have the sorted
output written to four output tapes.  Using successive non-overlapped
sorting and merging steps, this could take an excessive amount of
time, since the input and output tapes have to be read and written
sequentially (one after the other), and the sort steps may use
additional tapes for working datasets.

      A means to reduce the elapsed time for sorting very large
multiple-volume datasets as in the above example is as follows.
Multiple concurrently executing instances of a simple SPLIT program
are used, each of which reads its input and splits it into multiple
output streams using specified key ranges, followed by multiple
concurrently executing sort and merge job steps.  The key ranges
should be chosen so as to approximately evenly split up the work.
This can be done by sampling or other known means.
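
      The partitioning step can be illustrated with a minimal Python
sketch.  It is not the disclosed implementation: the function names
(choose_boundaries, split_by_key_range, partitioned_sort), the use of
in-memory lists in place of tape volumes, and the quantile-based
choice of boundaries are all assumptions made for illustration.  The
sketch also runs the split and sort steps one after the other, whereas
the disclosure has them execute concurrently.

```python
import bisect
import random


def choose_boundaries(sample_keys, num_ranges):
    """Pick num_ranges - 1 boundary keys from a sample so that each key
    range receives roughly the same share of the sampled keys."""
    ordered = sorted(sample_keys)
    if not ordered:
        raise ValueError("cannot choose key ranges from an empty sample")
    # Evenly spaced quantiles of the sorted sample become the boundaries.
    return [ordered[(i * len(ordered)) // num_ranges]
            for i in range(1, num_ranges)]


def split_by_key_range(records, boundaries, key=lambda r: r):
    """Hypothetical stand-in for one SPLIT instance: route each record to
    the output stream whose key range contains its key."""
    streams = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:
        # bisect_right counts the boundaries <= key, which is the index
        # of the key range the record falls into.
        streams[bisect.bisect_right(boundaries, key(rec))].append(rec)
    return streams


def partitioned_sort(volumes, num_ranges, sample_size=100, seed=0):
    """Sort records spread over several input volumes into num_ranges
    sorted outputs, one per key range."""
    rng = random.Random(seed)
    all_records = [r for volume in volumes for r in volume]
    sample = rng.sample(all_records, min(sample_size, len(all_records)))
    boundaries = choose_boundaries(sample, num_ranges)

    # One split per input volume (sequential here, concurrent in the text).
    per_volume = [split_by_key_range(v, boundaries) for v in volumes]

    # One sort per key range: gather stream i from every splitter.
    outputs = []
    for i in range(num_ranges):
        gathered = [r for streams in per_volume for r in streams[i]]
        outputs.append(sorted(gathered))
    return outputs
```

      Because the key ranges do not overlap, concatenating the sorted
outputs in range order yields the fully sorted dataset, so each output
can be written to its own volume with no final merge across ranges.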

      The input and output of the various concurrently executing job
steps are connected using data pipes.  A dat...