Browse Prior Art Database

Parallel Synchronization With Hardware Collision Detection and Software Combining

IP.com Disclosure Number: IPCOM000036177D
Original Publication Date: 1989-Sep-01
Included in the Prior Art Database: 2005-Jan-28
Document File: 3 page(s) / 25K

Publishing Venue

IBM

Related People

Franaszek, PA: AUTHOR [+3]

Abstract

A technique for synchronizing parallel processors is described. Standard interconnection network hardware is used to detect collisions and, in the event of collisions, software is used to combine multiple requests, thereby avoiding problems associated with memory "hot-spots" [1]. The method therefore has a cost advantage over systems with specialized hardware combining networks 2,3 and a performance advantage over purely software combining techniques 4.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

Page 1 of 3

Parallel Synchronization With Hardware Collision Detection and Software Combining

A technique for synchronizing parallel processors is described. Standard interconnection network hardware is used to detect collisions and, in the event of collisions, software is used to combine multiple requests, thereby avoiding problems associated with memory "hot-spots" [1]. The method therefore has a cost advantage over systems with specialized hardware combining networks 2,3 and a performance advantage over purely software combining techniques 4.

The invention to be described is a technique for synchronizing parallel processors on a shared memory machine. We will describe the invention in terms of the Fetch-and-Add synchronization instruction 2; the generalization to more general combinable Fetch-and-d instructions is immediate. We assume that multiple processors are connected to multiple memory modules via an interconnection network, as shown in the figure. A subset (possibly all) of the processors is designated as combiners whose job it will be to collect, combine and decombine Fetch- and-Adds (combiners may optionally perform other tasks as well). We assume that the hardware interconnection network has a collision- detection mechanism as would be the case in either an unbuffered delta or crossbar network. The network may furthermore consist of a hierarchy of paths as described in 5. In the case of a path hierarchy, only one level of the hierarchy (say, level 1) needs to provide collision detection.

When a Fetch-and-Add packet is sent over the network either:

1. The packet gets through successfully to the memory

module and

is positively acknowledged (by performing the

requested memory

operation), or

2. The packet does not get through the network and a

negative

acknowledgement (nak) is received.

Notice that the nak may be caused by a number of factors including the path being busy or the memory module being busy (or its buffer being full). If a packet collides with another packet, the two packets may either be combinable or not. When a nak is received, the processor may optionally retry the packet on the interconnection network or it may initiate combining as described below:

We assume that associated with each shared variable is a subset of the combiners. The rule for assigning the subsets is arbitrary; for example, it could be determined by hashing. When combining is initiated, the processor selects (according to an arbitrary rule) one of the combiners from the set of combiners associated with the variable. The processor then sends the Fetch-and-Add to the selected combiner. When a combiner receives the Fetch-and-Add, it places it in its internal memory and determines if it may be combined with any other Fetch-and- Add currently in its memory. If a match is found, the appropriate data structures are set up...