Exhibiting And Exploiting Array Processes And Parallelism in Data Base Operations

IP.com Disclosure Number: IPCOM000100248D
Original Publication Date: 1990-Mar-01
Included in the Prior Art Database: 2005-Mar-15
Document File: 2 page(s) / 96K

Publishing Venue

IBM

Related People

Mokski, DJ: AUTHOR

Abstract

This article describes a strategy for organizing data base operations as sets of array processing tasks which can be performed in parallel. The applicable operations are those which pertain to data extractions, structure combinations such as Joins, and data restructuring transformations in general. The first two are typical of the operations which underlie the Structured Query Language (SQL) of the IBM data base products SQL/DS and Database 2. The restructuring transformations are typical of those which occur in the IBM data base transformation product (*).

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

       This article describes a strategy for organizing data
base operations as sets of array processing tasks which can be
performed in parallel.  The applicable operations are those which
pertain to data extractions, structure combinations, such as Joins,
and data restructuring transformations in general.  The first two are
typical of the operations which underlie the Structured Query
Language (SQL) of IBM data base products SQL/DS and Database 2.  The
restructuring transformations are typical of those which occur in the
IBM data base transformation product (*).

      The input structures to data base operations are essentially
mixed arrays of text and numbers or, in some cases, arrays of such
arrays.  The strategy is based on organizing the processing of such
arrays into a sequence of highly parallel subprocesses which pipeline
into each other to achieve additional parallelism.  Essentially, each
such subprocess consists of a repetition of the same (or a similar)
function on a sequence of pairs of substructures.  A primary
parallelism exists because the repetitions of such a function
constitute a set of subtasks which can be performed independently
and in parallel.  A secondary parallelism can occur when such
repeating functions cascade into one another: the output stream
of one repeating function can be pipelined into the next repeating
function while the former is still processing.  This allows the
second function to begin its set of parallel subtasks before the
first has completed.
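
      The two kinds of parallelism described above can be illustrated
with a minimal sketch.  It is not taken from the article: the names
run_pipeline, stage, concat_pair and total_quantity, the sample data,
and the use of Python threads and queues are all assumptions chosen
for illustration.  Each stage applies one repeating function to its
input items with several workers (primary parallelism), and each
stage passes results to the next stage as soon as they are produced
(secondary, pipelined parallelism).  In CPython the threads show the
structure of the strategy rather than a guaranteed speedup.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream (illustrative)


def stage(func, inbox, outbox, workers=4):
    """Start `workers` threads that each apply `func` to items from `inbox`.

    Items are (index, value) pairs so the original order can be restored
    later. Results are pushed to `outbox` as soon as they are ready, so a
    downstream stage can start before this one has finished.
    """
    def worker():
        while True:
            item = inbox.get()
            if item is _DONE:
                inbox.put(_DONE)  # let sibling workers see the sentinel too
                break
            index, value = item
            outbox.put((index, func(value)))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    def closer():
        # Signal end-of-stream downstream only after every worker is done.
        for t in threads:
            t.join()
        outbox.put(_DONE)

    threading.Thread(target=closer).start()


def run_pipeline(items, funcs, workers=4):
    """Feed `items` through the cascaded repeating functions in `funcs`."""
    first = queue.Queue()
    inbox = first
    for func in funcs:
        outbox = queue.Queue()
        stage(func, inbox, outbox, workers)
        inbox = outbox

    for index, item in enumerate(items):
        first.put((index, item))
    first.put(_DONE)

    results = {}
    while True:
        item = inbox.get()
        if item is _DONE:
            break
        results[item[0]] = item[1]
    return [results[i] for i in sorted(results)]


def concat_pair(pair):
    """Hypothetical first function: concatenate a pair of row tuples."""
    left, right = pair
    return left + right


def total_quantity(row):
    """Hypothetical second function: sum the numeric fields of a row."""
    return sum(v for v in row if isinstance(v, int))


pairs = [(("S1", 300), ("P1", 20)), (("S2", 100), ("P2", 5))]
totals = run_pipeline(pairs, [concat_pair, total_quantity])
```

      In this sketch the second function may begin work on the first
concatenated row while other pairs are still being concatenated.  A
function that raises an exception is not handled here; a fuller
version would need to propagate such errors through the queues.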

      In typical queries, such as those expressed in SQL, selection
qualification expressions, such as "Supplier Number='S1'" AND
"Quantity > 200", are first evaluated against the applicable arrays.
This step is followed by array row-reduction operations, and often by
Join operations to concatenate arrays.  Data structure transformations
also employ select expressions, row reductions, and Join operations,
but usually operate on a larger scale (sometimes involving entire data
bases).  The evaluation of the select expressions is probably the most
visible example of a primary parallelism opportunity because
each select expression, e.g., "Quantity > 200", can be evaluated
independently of the others.  A less obvious example of both a
primary and secondary parallelism opportunity, and the one m...