Browse Prior Art Database

Reduced Cost Multistage Combining Network

IP.com Disclosure Number: IPCOM000037331D
Original Publication Date: 1989-Dec-01
Included in the Prior Art Database: 2005-Jan-29
Document File: 4 page(s) / 28K

Publishing Venue

IBM

Related People

Heidelberger, P: AUTHOR

Abstract

A multistage interconnection network that provides combining of synchronization operations such as Fetch-and-Add instructions [1] is described. Since each switching element in the network need not provide the combining capability, the hardware complexity to implement the network is less than that required by a full combining network.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 34% of the total text.

Page 1 of 4

Reduced Cost Multistage Combining Network

A multistage interconnection network that provides combining of synchronization operations such as Fetch-and-Add instructions [1] is described. Since each switching element in the network need not provide the combining capability, the hardware complexity to implement the network is less than that required by a full combining network.

Gottlieb, et al. (1) proposes a multistage interconnection network, called a combining network, for the NYU Ultracomputer that contains logic within the network switches to combine multiple simultaneous synchronization requests into a single synchronization request. The advantage of combining is that it reduces serialization delays at the eventual destination (memory module) and potentially reduces "hot spot" traffic (2). A combining network was also proposed for the RP3 machine (3). The disadvantage of a combining network is that it requires a great deal of hardware complexity to implement, e.g., each switch in the network must be capable of doing integer arithmetic. In addition, because of the increased hardware complexity, the delay through the combining network may be relatively large due to a large number of chip crossings. The use of software combining trees to avoid the potential hot spot problem is described in (4); however, this technique also has limited performance due to potentially high software overheads.

The invention is a buffered Omega-style multistage network that interconnects processors to (shared) memories as shown in the figure. For simplicity we assume that each switch is a 2 by 2 switch, i.e., has two input ports and two output ports; the invention generalizes to k by k switches, but at increased complexity. We further describe the in vention in terms of how it handles the Fetch-and-Add synchronization primitive; the network may or may not support other memory traffic, such as loads and stores, and the generalization to more general combinable Fetch-and-d operations is immediate. We assume that the packets P that flow through the network contain the following information: P = (O,S,D,I,F) where O is the operation code (e.g., Fetch- and-Add), S is the source address of the packet, D is the destination address of the packet, I is the operand (or result) of the operation (e.g., the increment in a Fetch-and-Add) and F = (F(0),F(1),F(2) is a three-bit collision detection flag (initially F = (0,0,0)). There are two types of switching elements in the network: Non-combining switches (N-switch) and Combining switches (C-switch). Each switch in the network can be of either type. The N-switches provide the following functions:

1. Packet routing and queueing, i.e., the normal functions of a switch in a buffered multistage network.

2. Detection and flagging of combinable requests. This is implemented by a comparator within the switch and is only applicable for packets flowing from the processors to the memories. If two packets, P1 and P2, simultaneou...