Processing Data Distributed Across a Parallel System

IP.com Disclosure Number: IPCOM000118322D
Original Publication Date: 1996-Dec-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 2 page(s) / 66K

Publishing Venue

IBM

Related People

Bird, CL: AUTHOR [+3]

Abstract

In managing data gathered from the nodes of a parallel processing system, it is important that the correct order of processing be maintained. A deterministic order of processing is particularly relevant when data records are being analyzed by a data mining technique.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      One advantage of a parallel processing environment is the
ability to share data across a number of nodes, which may also be
described as tasks.  Conceptually, there are two models for
processing that data:
  1.  Processing is under the control of a master node, with the
       other nodes being described as slaves.  With this model,
       there is almost always exactly one master node, which
       may hold its own share of the data.
  2.  No one node is in control; all nodes can communicate with
       each other.  This can be described as the "any-to-any" model.

      The improved method described for managing data applies
principally to the first model: the master node gathers information
from the slave nodes and, if it has its own local data, from itself
too.  The information is to be processed in a deterministic order,
which is not necessarily the order in which the items of information
become available.  The items must be managed in such a way that the
correct order of processing can be maintained.
This is particularly relevant for a data mining application, since
the manner in which the model of understanding evolves will be
determined by the order in which data items (records) are analyzed.
When mining data in a ...
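
      The ordering requirement described above can be illustrated
with a minimal sketch.  The abbreviated text does not give the
mechanism used, so everything here is an assumption made for
illustration: the message shape (node_id, seq_no, record), the
function name deterministic_order, and the choice of a round-robin
order (record 0 from every node, then record 1 from every node, and
so on).  The sketch also assumes that every node contributes the same
number of records.  The master buffers items that arrive early and
releases them only when their turn comes:

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

# Hypothetical message shape: (node_id, seq_no, record).
Arrival = Tuple[int, int, Any]


def deterministic_order(arrivals: Iterable[Arrival],
                        num_nodes: int) -> Iterator[Any]:
    """Yield records in a fixed round-robin order, whatever the arrival order.

    Assumed order (not taken from the disclosure): seq_no first, then node_id.
    Assumes every node sends the same number of records.
    """
    pending: Dict[Tuple[int, int], Any] = {}  # early arrivals, keyed by (seq_no, node_id)
    next_seq, next_node = 0, 0                # the item whose turn it is

    for node_id, seq_no, record in arrivals:
        pending[(seq_no, node_id)] = record
        # Release every buffered item that is now next in line.
        while (next_seq, next_node) in pending:
            yield pending.pop((next_seq, next_node))
            next_node += 1
            if next_node == num_nodes:
                next_node = 0
                next_seq += 1


# Two different arrival orders for the same four records.
run_a = [(1, 0, "b0"), (0, 0, "a0"), (0, 1, "a1"), (1, 1, "b1")]
run_b = [(0, 0, "a0"), (0, 1, "a1"), (1, 1, "b1"), (1, 0, "b0")]

order_a = list(deterministic_order(run_a, num_nodes=2))
order_b = list(deterministic_order(run_b, num_nodes=2))
```

      Both runs hand the records to the analysis step in the same
sequence, so a model built from them would evolve identically even
though the slaves reported at different times.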