Processing Local and Remote Data in a Parallel Environment

IP.com Disclosure Number: IPCOM000118288D
Original Publication Date: 1996-Dec-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 2 page(s) / 82K

Publishing Venue

IBM

Related People

Bird, CL: AUTHOR [+2]

Abstract

Disclosed is a method for ensuring that data items (for example, records being analyzed by a data mining technique) are processed by a parallel processing application in a consistent manner irrespective of whether they are local to the processing node or have been transmitted as messages from remote nodes in the parallel environment.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      There are two models for parallel processing systems which
have data shared across a number of nodes, also known as tasks:
  1.  Processing is under the control of a master node (task),
       which may have its own share of the data.  The other nodes
       (tasks) are described as slaves.  With this model, there
       is almost always only one master node.
  2.  No one node is in control; all nodes communicate
       with each other.  This is referred to as the "any-to-any"
       model.

      In both models, an individual node may collate information
from other nodes and incorporate the corresponding information
derived from its own local data.  The remote information is
delivered via a message passing interface, whereas the local data is
not, so the local data requires special-case treatment.  This carries
the risk of the two code branches falling out of step and becoming
inconsistent, which harms reliability and maintainability.
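
      The risk of divergent branches can be illustrated with a minimal
Python sketch.  It is not taken from the disclosure: the message layout
(RESULT_FORMAT) and the function names collate_remote and collate_local
are assumptions made for illustration only.

```python
import struct

# Hypothetical wire format: record count (unsigned 32-bit int) and
# running total (64-bit float), little-endian with no padding.
RESULT_FORMAT = "<Id"

def collate_remote(totals, message):
    """Branch 1: information that arrived as a packed message."""
    count, total = struct.unpack(RESULT_FORMAT, message)
    totals["count"] += count
    totals["sum"] += total

def collate_local(totals, records):
    """Branch 2: the special case for the node's own local data.

    This branch has to mirror collate_remote by hand, so any later
    change to one branch must be repeated in the other.
    """
    totals["count"] += len(records)
    totals["sum"] += float(sum(records))

totals = {"count": 0, "sum": 0.0}
collate_remote(totals, struct.pack(RESULT_FORMAT, 2, 30.0))
collate_local(totals, [1, 2, 3])
```

Here the two functions agree, but nothing in the code forces them to
stay in agreement as the application evolves.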

      The solution described operates in the context of the
"master-slave" model, but is conceptually identical for the
"any-to-any" model.

      In the "master-slave" model, the remote nodes (slaves),
operating in parallel, process one or more items of data and each
assembles the result into a block of information suitable for
transmission to the controlling (master) node.  This assembly process
typically involves packing, to minimize the message length for
efficient transmission by the message passing interface.  The master
node will then gather and collate the information from its slaves,
carrying out such further processing as is required by the
application.  Before it can process the remote information, it must
first unpack it from the message.
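
      The pack, transmit, unpack and collate sequence described above
can be sketched as follows.  This is a minimal illustration under
stated assumptions, not the disclosure's implementation: a plain Python
list stands in for the message passing interface, and the names
slave_process, master_collate and RESULT_FORMAT are hypothetical.

```python
import struct

# Hypothetical packed layout: record count (unsigned 32-bit int) and
# sum of the records (64-bit float), little-endian with no padding.
RESULT_FORMAT = "<Id"

def slave_process(records):
    """Process one slave's share of the data and pack the result."""
    count = len(records)
    total = float(sum(records))
    return struct.pack(RESULT_FORMAT, count, total)

def master_collate(messages):
    """Unpack each slave's message, then combine the results."""
    grand_count = 0
    grand_total = 0.0
    for message in messages:
        count, total = struct.unpack(RESULT_FORMAT, message)
        grand_count += count
        grand_total += total
    return grand_count, grand_total

# Stand-in for the message passing interface: a list of packed blocks,
# one per slave.
messages = [slave_process([1, 2, 3]), slave_process([10, 20])]
result = master_collate(messages)
```

Packing with a fixed binary layout keeps each message at 12 bytes in
this sketch, regardless of how many records the slave processed.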

      Where the master node has its own local data, it processes this
data while the slaves process theirs.  It is highly desirable that the
resulting informatio...