Browse Prior Art Database

Low-Cost DEVICE for Contention-Free BARRIER Synchronization

IP.com Disclosure Number: IPCOM000034819D
Original Publication Date: 1989-Apr-01
Included in the Prior Art Database: 2005-Jan-27
Document File: 7 page(s) / 111K

Publishing Venue

IBM

Related People

Heidelberger, P: AUTHOR [+3]

Abstract

A device is described that provides a low-cost means for synchronizing multiple processors at a barrier in a program. All processors must reach such a barrier before any process can pass the barrier. For barrier synchronization, each process has code that is to be executed before and after the barrier. Processes can run in parallel provided that no process executes post-barrier code until all processes complete the prebarrier code. In general, the processes execute in parallel on prebarrier code. As each process reaches the barrier, it halts. Only after all processes reach the barrier can computation continue, at which point the processes run post-barrier code in parallel. For a more detailed description of barriers, see [*]. A typical example in which a barrier is useful is illustrated below.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 29% of the total text.

Page 1 of 7

Low-Cost DEVICE for Contention-Free BARRIER Synchronization

A device is described that provides a low-cost means for synchronizing multiple processors at a barrier in a program. All processors must reach such a barrier before any process can pass the barrier. For barrier synchronization, each process has code that is to be executed before and after the barrier. Processes can run in parallel provided that no process executes post-barrier code until all processes complete the prebarrier code. In general, the processes execute in parallel on prebarrier code. As each process reaches the barrier, it halts. Only after all processes reach the barrier can computation continue, at which point the processes run post-barrier code in parallel. For a more detailed description of barriers, see [*]. A typical example in which a barrier is useful is illustrated below.

(Image Omitted)

for i: = 1 to

N do par

begin

< Loop A: All iterations of this loop can be

done

concurrently >

end

barrier

for i: = 1 to N do par

begin

< Loop B: All iterations of this loop can be

done

concurrently >

end The prebarrier code is Loop A and the post-barrier is Loop B. Iterations of Loop A can be executed in parallel in an arbitrary order, and similarly for the iterations of Loop B. But no iteration of Loop B can be computed until all iterations of Loop A are completed. THE BARRIER SYNCHRONIZER The device is depicted in Fig. 1. The device is a module that is connected to each of M physical processors. The module contains within it

(Image Omitted)

1. a state register of M bits, each of which shows

the state of one processor indicating whether or not

that processor has reached a barrier, 2. a mask register of M bits, each of which indicates whether or not the corresponding state bit is

participating in a barrier, 3. an output that is the logical AND of all state bits whose corresponding mask bits are set to logical 1, 4. an output that is the logical OR of all state bits whose corresponding mask bits are set to logical 1, 5. M pins,

1

Page 2 of 7

each of which is directly connected to one

processor of a set of M processors, 6. a set of control inputs that permit the device to be initialized and then function as a barrier

synchronizer.

The barrier synchronization is done as follows:

(Image Omitted)

1. A controlling process

places the device in initialization mode by means of

signal on the control pins. In this mode the device

solicits information from all processors to determine

which processors are participating in a barrier. In

this mode all mask bits are set to 0, and the state

bits are set to 0. The presence of the initialization

mode is indicated at the output by generating a logic 1

on the AND output and a logic 0 on the OR output, which

is otherwise an impossible output state to achieve. 2. Each of the M processors reads the OR and AND outputs. When the initialization state is sensed, each processor

responds by transmitting a sequence of bits to its

respective input...