Browse Prior Art Database

Device That Provides for Contention-Free Barrier Synchronization in a Multiprocessor

IP.com Disclosure Number: IPCOM000035180D
Original Publication Date: 1989-Jun-01
Included in the Prior Art Database: 2005-Jan-28
Document File: 3 page(s) / 35K

Publishing Venue

IBM

Related People

Franaszek, PA: AUTHOR [+3]

Abstract

An important type of synchronization in a multiprocessor is a barrier synchronization. The idea of the barrier synchronization is that prior to the barrier, all processes run in parallel and asynchronously. However, no process can pass the barrier until all have reached it. Barriers of this form are well known in the literature [1].

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

Page 1 of 3

Device That Provides for Contention-Free Barrier Synchronization in a Multiprocessor

An important type of synchronization in a multiprocessor is a barrier synchronization. The idea of the barrier synchronization is that prior to the barrier, all processes run in parallel and asynchronously. However, no process can pass the barrier until all have reached it. Barriers of this form are well known in the literature [1].

A typical example in which a barrier is useful is illustrated below. for i: = 1 to N do par begin < Loop A: All iterations of this loop can be done concurrently > end barrier for i: = 1 to N do par begin < Loop B: All iterations of this loop can be done concurrently > end

In the example all iterations of Loop A are done in parallel, but all are completed before any iteration of Loop B is started. All iterations of Loop B can also be done concurrently.

The reason for the barrier is that some iteration of Loop A may write a value that is used by a different iteration, possibly an earlier iteration of Loop B. This is the case, for example, in Fast Fourier Transform algorithms in which each of log N phases of the algorithm are separated by barriers because the algorithm cannot run correctly if different phases are run concurrently.

Because each iteration is potentially very short, it is necessary to implement the barrier synchronization with high efficiency. If the barrier synchronization involves the use of shared memory, then the shared memory at the synchronizer may require a number of accesses proportional to the number of iterations, whereas the time per iteration can be a small constant, independent of N. In this case, the barrier itself becomes a bottleneck, and it determines the maximum rate of computation instead of the computation rate being limited by the iteration in which the useful work is done.

A hardware device is described in [2] that provides a restricted type of barrier synchronization in which all processors must halt at a barrier, and all processors are released when the last one reaches the barrier. This article describes a generalization of that device that provides greater flexibility than the device described in 2 Specifically, 1. a subset of processors rather than the full collection of processors can participate at a barrier, and 2. a processor can participate at a barrier multiple times rather than a single time.

The invention is depicted in the figure. The device is a module that is connected to each of M physical processors. The module contains within it 1. a limit register that gives the number of processes that must reach a barrier, 2. a sum register that holds...