Browse Prior Art Database

# Method for Hiding Communication Latency and Increasing Pipe-Line Parallelism

IP.com Disclosure Number: IPCOM000116475D
Original Publication Date: 1995-Sep-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 4 page(s) / 104K

IBM

## Related People

Ishizaki, K: AUTHOR [+3]

## Abstract

Disclosed is a loop optimization technique, by which communication latency can be hidden and pipe-line parallelism can be increased on massively parallel computers. Basically, this optimization is based on loop optimization techniques and optimization target is nested loops which are required to communicate before execution.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Method for Hiding Communication Latency and Increasing Pipe-Line
Parallelism

Disclosed is a loop optimization technique, by which
communication latency can be hidden and  pipe-line parallelism can be
increased on massively parallel computers.  Basically, this
optimization is based on loop optimization techniques and
optimization target is nested loops which are required to communicate
before execution.

The optimization is performed in the following steps:
o  loop analysis to detect blocking indices
-  Reuse indices analysis (non-reuse vector generation)
-  Communication analysis (no-communication vector generation)
-  Re-communication analysis for each operand
(non-re-communication vector generation)
-  Re-communication analysis for the whole loop nest
o  Loop transformation
-  Loop interchange - outer-most interchange for
re-communication indices inner-most interchange
for pipe-line execution indices
-  Brocking factor decision
o  Loop analysis to detect blocking indices

At first re-usable indices of target loop nest is detected:
do I=1, 1000
do J=1, 1000
do K=1, 1000
A(I,J) = A(I,J) + B(I,K) * C(K,J)
end do
end do
end do
Example 1: Matrix multiplication

In this example 1, the following non-reuse vector can be
obtained for each operand on index space (I,J,K).
A(I,J) ---> non-reuse vector   (1,1,0)
B(I,K) ---> non-reuse vector   (1,0,1)
C(K,J) ---> non-reuse vector   (0,1,1)

If indices which have value 0 are changed, the same operand is
re-accessed.  In other word, if an index has the value 1 for all
operands, it is not occurred to reuse.  Therefore, these indices are
omitted with analysis targets.

Basically non-reused vector is decides by following 4 factors.
dimensions of index space
dimensions of array
index expression
range of each index
do I=1,100
do J=1,100
A(I,J) = B(I,J+1,50) + C(I)
Example 2:

In this example 2, array B is accessed by all index.
Therefore, no index has reusability.  Array C is accessed by index I
and reused by Index J.  In the next example, there 2 index variables
are used in one index expression.
do I=1,100
do J=1,100
do K=1,100
A(I,J,K) = B(I+J,K)
Example 3

In example 3,  re-use vector of B(I+J,K) is (0,0,1).
do I=1,100,10
do J=1,10
do K=1,100
A(I,J,K) = B(I+J,K)
Example 4:

In this example 4, if either Index I or Index J is changed,
array B is not reused.  Therefore, non-reuse vector of B(I+J,K) is
(1, 1, 1).

As the next step communication analysis for each array is
applied.  The communication is decided by data d...