
Method for Hiding Communication Latency and Increasing Pipe-Line Parallelism

IP.com Disclosure Number: IPCOM000116475D
Original Publication Date: 1995-Sep-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 4 page(s) / 104K

Publishing Venue

IBM

Related People

Ishizaki, K: AUTHOR [+3]

Abstract

Disclosed is a loop optimization technique by which communication latency can be hidden and pipe-line parallelism can be increased on massively parallel computers. The optimization is based on loop transformation techniques, and its targets are nested loops that must communicate before execution.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.


      The optimization is performed in the following steps:
  o  Loop analysis to detect blocking indices
     -  Reuse indices analysis (non-reuse vector generation)
     -  Communication analysis (no-communication vector generation)
     -  Re-communication analysis for each operand
         (non-re-communication vector generation)
     -  Re-communication analysis for the whole loop nest
  o  Loop transformation
     -  Loop interchange: outer-most interchange for
         re-communication indices, inner-most interchange
         for pipe-line execution indices
     -  Blocking factor decision

Loop analysis to detect blocking indices

      First, the reusable indices of the target loop nest are detected:
  do I=1, 1000
    do J=1, 1000
      do K=1, 1000
        A(I,J) = A(I,J) + B(I,K) * C(K,J)
      end do
    end do
  end do
      Example 1: Matrix multiplication
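The loop-transformation step in the outline above (loop interchange plus a blocking factor) can be illustrated on this loop nest. The sketch below is a generic tiled version in Python; the blocking factor bf is an assumed illustrative parameter, not the disclosure's blocking-factor decision procedure:

```python
def blocked_matmul(A, B, C, n, bf):
    """A(I,J) += B(I,K) * C(K,J) on n x n lists of lists, with the
    iteration space tiled by an assumed blocking factor bf so that a
    bf x bf working set stays resident between reuses."""
    for ii in range(0, n, bf):
        for kk in range(0, n, bf):
            for jj in range(0, n, bf):
                # Intra-tile loops cover at most bf iterations each.
                for i in range(ii, min(ii + bf, n)):
                    for k in range(kk, min(kk + bf, n)):
                        b = B[i][k]          # reused across the j loop
                        for j in range(jj, min(jj + bf, n)):
                            A[i][j] += b * C[k][j]
```

The tiled nest computes the same result as the original I/J/K nest; only the traversal order of the index space changes.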

      In Example 1, the following non-reuse vectors are obtained for
each operand on the index space (I,J,K).
      A(I,J) ---> non-reuse vector   (1,1,0)
      B(I,K) ---> non-reuse vector   (1,0,1)
      C(K,J) ---> non-reuse vector   (0,1,1)

      If an index whose vector entry is 0 is changed, the same operand
is re-accessed.  In other words, if an index has the value 1 for all
operands, no reuse occurs along it.  Therefore, such indices are
omitted from the analysis targets.
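The basic case of this analysis, where each subscript is a single index variable, can be sketched as follows. The function name and representation (subscripts as strings of index variables) are illustrative assumptions; multi-variable subscripts and index ranges, handled in Examples 3 and 4 below, are ignored here:

```python
def non_reuse_vector(loop_indices, subscripts):
    """Entry is 1 if the loop index appears in some subscript of the
    reference (changing it accesses a different element), else 0
    (the element can be reused along that index)."""
    used = set()
    for expr in subscripts:
        used |= set(expr)
    return tuple(1 if idx in used else 0 for idx in loop_indices)

# The three references of Example 1 on index space (I, J, K):
print(non_reuse_vector("IJK", ["I", "J"]))   # A(I,J) -> (1, 1, 0)
print(non_reuse_vector("IJK", ["I", "K"]))   # B(I,K) -> (1, 0, 1)
print(non_reuse_vector("IJK", ["K", "J"]))   # C(K,J) -> (0, 1, 1)
```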

Basically, the non-reuse vector is decided by the following four
factors:
     -  dimensions of the index space
     -  dimensions of the array
     -  index expressions
     -  range of each index
  do I=1,100
    do J=1,100
      A(I,J) = B(I,J+1,50) + C(I)
    end do
  end do
        Example 2:

      In Example 2, array B is accessed by all indices; therefore, no
index has reusability for B.  Array C is accessed by index I and
reused along index J.  In the next example, two index variables are
used in one index expression.
  do I=1,100
    do J=1,100
      do K=1,100
        A(I,J,K) = B(I+J,K)
      end do
    end do
  end do
          Example 3:

      In Example 3, the non-reuse vector of B(I+J,K) is (0,0,1):
different (I,J) pairs with the same sum I+J access the same element
of B, so reuse can occur along I and J.
  do I=1,100,10
    do J=1,10
      do K=1,100
        A(I,J,K) = B(I+J,K)
      end do
    end do
  end do
          Example 4:

      In Example 4, if either index I or index J is changed, array B
is not reused, because every sum I+J is distinct over these ranges.
Therefore, the non-reuse vector of B(I+J,K) is (1, 1, 1).
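The distinction between Examples 3 and 4 comes down to whether any value of the subscript I+J is produced by more than one (I,J) pair over the given loop ranges. A brute-force sketch of that check (the function name is an illustrative assumption, not part of the disclosure):

```python
from collections import Counter
from itertools import product

def sum_subscript_reuses(i_vals, j_vals):
    """True if some value of I+J arises from more than one (I, J)
    pair, i.e. B(I+J,K) can be reused across I/J iterations."""
    counts = Counter(i + j for i, j in product(i_vals, j_vals))
    return any(c > 1 for c in counts.values())

# Example 3: I, J = 1..100 -> many pairs share a sum (e.g. 1+2 = 2+1),
# so the I and J entries of the non-reuse vector are 0.
print(sum_subscript_reuses(range(1, 101), range(1, 101)))     # True
# Example 4: I = 1, 11, ..., 91 and J = 1..10 -> all sums distinct,
# so no reuse and the non-reuse vector is (1, 1, 1).
print(sum_subscript_reuses(range(1, 101, 10), range(1, 11)))  # False
```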

      As the next step, communication analysis is applied to each
array.  The communication is decided by data d...