Omega-Crossbar Network

IP.com Disclosure Number: IPCOM000043892D
Original Publication Date: 1984-Oct-01
Included in the Prior Art Database: 2005-Feb-05
Document File: 4 page(s) / 59K

Publishing Venue

IBM

Related People

Pfister, GF: AUTHOR

Abstract

This article describes an Omega/Crossbar network which greatly improves the efficiency of highly parallel multiprocessor systems of the type described in [*] where many (for example, 4096) processors are connected to a common main memory using an Omega network. The efficiency improvement results from substantially decreasing the average "memory latency" -- the time it takes for a processor's memory request to be satisfied -- without substantially increasing the serialization of synchronization primitives. The decrease in average latency is obtained at the expense of an increase in worst-case latency; it is argued below that the worst-case situation will occur with negligible frequency.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 29% of the total text.

The memory latency decrease is obtained by replacing a number of stages of the Omega network at the "processor side" by crossbar switches with fast control logic. The standard Omega network has a memory latency of 2*log(P) for every memory request, P being the number of processors and logarithms being base 2. An Omega/Crossbar network will produce

    average latency:    2*(log(P) - K + 1)
    worst-case latency: 2*(log(P) - K + 1 + 2**K)

where K can be chosen in the range 2 <= K < log(P). For example, for 4096 processors the standard Omega network has a latency of 24. An Omega/Crossbar for 4096 processors with K=6 will have an average latency of 10 and a worst-case latency of 142.

The name derives from the combination of a partial Omega network and crossbar switches. Mixed-topology networks, of which this is an example, are common in telephony. However, they have never been used in the ultra-high-bandwidth application of computer processor-to-memory communication. Furthermore, this particular combination -- crossbar and Omega network -- is new.

Background: The "Ultra Computer" Architecture

The "Ultra Computer" architecture to which the present system most directly applies will now be briefly described. For a more detailed description, see [*]. Fig. 1 shows a small (16-processor) multiprocessor system of the "Ultra" type, connecting processors (left) to main memory modules (right) through an Omega network. A crucial feature of the Ultra application of Omega networks is that memory requests and data are latched at each Omega switch node. This substantially increases memory latency, but also allows combining memory requests, particularly the "fetch-and-add" (FA) processor synchronization primitive. For example, if P0 requests FA(loc1,A) and P1 simultaneously requests FA(loc1,B), node 1 can combine these into a single request: FA(loc1,A+B). If at the same time P2 and P3 request FA(loc1,C) and FA(loc1,D) respectively, a combination of the combined requests into FA(loc1,A+B+C+D) occurs at either node 9 or node 10, depending on the value of loc1; etc. (Each node must record the combinations it has made in its own local memory, so that when a reply to a combined request is received, the reply can be split into two separate replies, one for each requestor. See [*] for details.)...
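The latency figures quoted above can be checked numerically. The following sketch evaluates the standard Omega latency and the Omega/Crossbar worst-case formula, assuming base-2 logarithms as the P = 4096 example implies; the function names are illustrative, not from the disclosure:

```python
import math

def omega_latency(p):
    """Standard Omega network: 2 * log2(P), one latched stage each way."""
    return 2 * math.log2(p)

def omega_crossbar_worst(p, k):
    """Omega/Crossbar worst-case latency: 2 * (log2(P) - K + 1 + 2**K)."""
    return 2 * (math.log2(p) - k + 1 + 2 ** k)

print(omega_latency(4096))            # 24.0, as quoted for the standard network
print(omega_crossbar_worst(4096, 6))  # 142.0, as quoted for K = 6
```

Note how the worst-case term 2**K grows quickly with K, which is why K is kept small relative to log(P): the crossbar stages trade a rare, large worst case for a much smaller common-case latency.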
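The pairwise combining and reply-splitting of fetch-and-add requests described above can be sketched in software. This is a simplified model of a single switch node, assuming FA returns the old value of the location and that a node combines at most two requests; the class and method names are illustrative, not from the disclosure:

```python
class SwitchNode:
    """Toy model of an Omega switch node that combines two FA requests to
    the same location into one, and later splits the single reply."""

    def __init__(self):
        self.pending = {}  # location -> first requestor's increment

    def combine(self, loc, a, b):
        # Record the first increment locally so the reply can be split later.
        self.pending[loc] = a
        return (loc, a + b)  # forward one combined request: FA(loc, A+B)

    def split_reply(self, loc, value):
        # Memory returned old value V for the combined FA(loc, A+B).
        # Requestor 0 sees V; requestor 1 sees V + A, as if serialized.
        a = self.pending.pop(loc)
        return value, value + a

node = SwitchNode()
loc, inc = node.combine("loc1", 3, 4)  # single combined request FA(loc1, 7)
old = 10                               # suppose memory held 10 at loc1
r0, r1 = node.split_reply(loc, old)
print(r0, r1)  # 10 13 -- identical to serial FA(loc1,3) then FA(loc1,4)
```

The key property is that the two requestors observe the same results as some serial ordering of their fetch-and-adds, so combining preserves the synchronization semantics while presenting only one request to the next stage.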