Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Efficient Reduction Execution for Convergence Checking on Distributed Multi-Processors

IP.com Disclosure Number: IPCOM000116509D
Original Publication Date: 1995-Sep-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 4 page(s) / 110K

Publishing Venue

IBM

Related People

Komatsu, H: AUTHOR [+2]

Abstract

Disclosed is a fast and efficient method of executing reduction operation for convergence checking on distributed multi-processor computing systems. In the programs with iterative algorithms for solving systems of equations, the reduction operation is usually performed for convergence calculation, and processors need to synchronize for the reduction for each iteration in the conventional way. Rather by transforming the program as described below, processors can proceed to execute the computation for the next iteration speculatively without waiting for the convergence result from the other processors. This contributes for the significant performance gain by overlapping the computation and the reduction communication.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Efficient Reduction Execution for Convergence Checking on Distributed
Multi-Processors

      Disclosed is a fast and efficient method of executing reduction
operation for convergence checking on distributed multi-processor
computing systems.  In the programs with iterative algorithms for
solving systems of equations, the reduction operation is usually
performed for convergence calculation, and processors need to
synchronize for the reduction for each iteration in the conventional
way.  Rather by transforming the program as described below,
processors can proceed to execute the computation for the next
iteration speculatively without waiting for the convergence result
from the other processors.  This contributes for the significant
performance gain by overlapping the computation and the reduction
communication.

      The program transformation consists of the communication
scheduling and the buffer mechanism as follows.  First of all,
communication scheduling for the reduction operation should be done
in such a way that, on the gather operation, target data are sent in
pipelined fashion along each dimension of the processors arrangement
on which the data is mapped, and on the scatter operation, the
sending processor sends the data in blocking mode while other
processors receive the data in non-blocking mode.  This communication
manner preserves program's parallel execution pattern and enables
processors to speculatively execute the next iteration.  The
non-blocking receiving call on the scatter operation need to be
controlled so that each processor does not issue multiple calls at
any point of time to avoid message mismatches and deadlock between
processors.

      Secondly, buffering mechanism is required to preserve the
computation result for speculative iterations.  This is required for
all the variables which is live upon the exit of the outermost loop
to guarantee the correctness of the program transformation.  The
buffer size is determined by the number of allowable speculative
iterations, however, it should be practically around 10 to 20 at
maximum. ...