Browse Prior Art Database

An Implementation of a Combining Network for the NYU Ultracomputer

IP.com Disclosure Number: IPCOM000128240D
Original Publication Date: 1986-Dec-31
Included in the Prior Art Database: 2005-Sep-15
Document File: 10 page(s) / 37K

Publishing Venue

Software Patent Institute

Related People

Susan Dickey: AUTHOR [+5]

Abstract

The NYU Ultracomputer architecture, a shared memory MOND design composed of thousands of processing elements, requires a high-performance interconnection network between its processors and memory modules. This network must be capable of merging concurrent requests to the same memory location from multiple processors. We present an implementation of such a network for a 32-processor configuration with emphasis on the mechanisms used to merge these concurrent requests.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 10% of the total text.

Page 1 of 10

THIS DOCUMENT IS AN APPROXIMATE REPRESENTATION OF THE ORIGINAL.

An Implementation of a Combining Network for the NYU Ultracomputer

Susan Dickey, Richard Kenner, Marc Snirl

Ultracomputer Research Laboratory Courant Institute of Mathematical Sciences New York University 251 Mercer Street New York, NY 10012

Ultracomputer Note #93

January, 1986

ABSTRACT

The NYU Ultracomputer architecture, a shared memory MOND design composed of thousands of processing elements, requires a high-performance interconnection network between its processors and memory modules. This network must be capable of merging concurrent requests to the same memory location from multiple processors. We present an implementation of such a network for a 32-processor configuration with emphasis on the mechanisms used to merge these concurrent requests.

I. Introduction

The NYU Ultraoomputer architecture (described in detail in Gottlieb et al. [83], Edler et al. [85J, and the references therein) is a shared memory NInV1D design capable of supporting thousands of processing elements. Each processing element is augmented by a fetch-and-add (F&A) operation. The function F&A(V, e) indivisibly increments the (shared) variable V by the value of the expression a and returns the old value of the memory location specified by v . Concurrent F&A operations are required to have the same effect as if executed in some (unspecified) serial order. This F&A operation is a powerful interprocessor communication mechanism which can be used to eliminate most critical sections and thus to permit highly concurrent execution of algorithms, such as queue insertion and deletion, that are important in operating systems and parallel applications (Gottlieb and Kruskal [81]). To approximate the performance of an idealized parallel processor (dubbed a "paracomputer" by Schwartz [80] and a WRAM by Borodin and Hopcroft[81]) where all processors can simulataneously access memory in a single cycle, a high-

iThis work was supported in part by the Army Research Office under grant number 20391-MA- F, and in part by the National Science Foundation under grant number DCR-8413359.

N~1 u ~ performance network is required to connect the processing elements (PE's) with the memory modules (MM's) comprising the shared memory. This message switching network, which has the topology of Lawrie's [75] fl-network, is described in Dickey et al. [85] and has the following properties: The network contains N log N identical nodes where N is the number of PE's. Network bandwidth is linear in N. Latency, i.e., memory access time, is logarithmic in N. Messages are divided into packets which are transmitted in successive cycles. If no routing

New York University Page 1 Dec 31, 1986

Page 2 of 10

An Implementation of a Combining Network for the NYU Ultracomputer

conflicts exist, at every cycle each switching node can receive an input packet at each input port and transmit a packet at each output port. Switch settings are maintained...