
The Torus Routing Chip

IP.com Disclosure Number: IPCOM000127947D
Original Publication Date: 1986-Dec-31
Included in the Prior Art Database: 2005-Sep-14
Document File: 11 page(s) / 37K

Publishing Venue

Software Patent Institute

Related People

William J. Dally: AUTHOR

Abstract

The torus routing chip (TRC) is a self-timed chip that performs deadlock-free cut-through routing in k-ary n-cube multiprocessor interconnection networks using a new method of deadlock avoidance called virtual channels. A prototype TRC with byte-wide self-timed communication channels achieved on first silicon a throughput of 64 Mbits/s in each dimension, about an order of magnitude better performance than the communication networks used by machines such as the Caltech Cosmic Cube or Intel iPSC. The cut-through routing latency of only 150 ns per routing step largely eliminates message locality considerations in the concurrent programs for such machines. The design and testing of the TRC as a self-timed chip was no more difficult than it would have been for a synchronous chip.
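The latency claim can be made concrete with a simple first-order model. This is a sketch under stated assumptions, not the paper's analysis: it assumes a single byte-wide channel sustains the 64 Mbits/s quoted above (so one byte takes 125 ns), treats the 150 ns routing step as a fixed per-hop header delay, and ignores contention. The function names and the 20-byte, 10-hop example are hypothetical.

```python
def ns_per_byte(bits_per_second, channel_width_bits=8):
    """Time to move one channel-width unit (here one byte) across a link, in ns."""
    return channel_width_bits * 1e9 / bits_per_second


def store_and_forward_ns(hops, packet_bytes, byte_ns, routing_ns):
    # Every node buffers the whole packet before forwarding it, so the
    # serialization time of the full packet is paid once per hop.
    return hops * (routing_ns + packet_bytes * byte_ns)


def cut_through_ns(hops, packet_bytes, byte_ns, routing_ns):
    # The header is forwarded as soon as it has been routed and the body
    # follows in a pipeline, so serialization is paid only once end to end.
    return hops * routing_ns + packet_bytes * byte_ns


byte_ns = ns_per_byte(64e6)                         # 125.0 ns per byte
saf = store_and_forward_ns(10, 20, byte_ns, 150)    # 26500.0 ns
ct = cut_through_ns(10, 20, byte_ns, 150)           # 4000.0 ns
print(f"store-and-forward: {saf} ns, cut-through: {ct} ns")
```

Under this model each extra hop adds only the 150 ns routing delay to a cut-through message, whereas store-and-forward adds the whole packet's transmission time, which is why distance between communicating nodes matters so much less.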


The Torus Routing Chip

William J. Dally, Charles L. Seitz

Computer Science Department, California Institute of Technology

5208:TR:86

January 24, 1986

To be published in Journal of Distributed Computing, vol. 1, no. 3, 1986.


The research described in this paper was sponsored in part by the Defense Advanced Research Projects Agency, ARPA Order number 3771, and monitored by the Office of Naval Research under contract number N00014-79-C-0597, in part by Intel Corporation, and in part by an AT&T Ph.D. fellowship.

California Institute of Technology, 1986.

1 Introduction

Message-passing concurrent computers such as the Caltech Cosmic Cube [13] and Intel iPSC [6] consist of many processing nodes that interact by sending messages over communication channels between the nodes. We designed the torus routing chip (TRC) as a building block for constructing high-throughput, low-latency k-ary n-cube interconnection networks for message-passing concurrent computers.
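A k-ary n-cube has k^n nodes, each labeled by an n-digit radix-k address; in the torus form each node is connected to the nodes whose address differs by +1 or -1 (mod k) in exactly one digit, the mod-k step supplying the wraparound links. The following is an illustrative model of that addressing, not code from the paper; `node_count`, `all_nodes`, and `neighbors` are hypothetical helper names.

```python
from itertools import product


def node_count(k, n):
    """Number of nodes in a k-ary n-cube: k nodes along each of n dimensions."""
    return k ** n


def all_nodes(k, n):
    """Every node address, as a tuple of n radix-k digits."""
    return list(product(range(k), repeat=n))


def neighbors(addr, k):
    """Nodes one channel away: +1 or -1 (mod k) in exactly one digit.

    For k == 2 the +1 and -1 neighbors coincide (the binary n-cube),
    so duplicates are removed while preserving order.
    """
    result = []
    for dim, digit in enumerate(addr):
        for step in (1, -1):
            nb = list(addr)
            nb[dim] = (digit + step) % k   # wraparound closes each ring
            result.append(tuple(nb))
    return list(dict.fromkeys(result))


print(node_count(256, 2))      # 65536, i.e. 2**16
print(neighbors((0, 0), 4))    # [(1, 0), (3, 0), (0, 1), (0, 3)]
```

With k = 256 and n = 2 this gives 65,536 = 2^16 nodes, each with four neighbors.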

The TRC is a self-timed VLSI circuit that provides deadlock-free packet communication in k-ary n-cube (torus) networks [12] with up to k = 256 processors in each dimension. While intended primarily for two-dimensional (n = 2) networks, the chips can be cascaded to handle n-dimensional networks using ⌈n/2⌉ TRC chips at each processing node. A prototype TRC has been laid out, fabricated, and tested.
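Why a torus needs extra machinery for deadlock freedom can be checked mechanically on a single ring. With one channel per link, the wraparound connection closes the channel dependency graph into a cycle, and a cyclic dependency is what makes deadlock possible when packets hold one channel while waiting for the next. The sketch below uses a generic "dateline" variant of the virtual-channel idea, not the TRC's exact channel-assignment rule (which this excerpt does not give): each physical channel is split into two virtual channels, and a packet moves from VC 0 to VC 1 once it has crossed the wraparound link. All function names and the VC convention are hypothetical.

```python
from collections import defaultdict


def ring_route(src, dst, k, use_dateline):
    """(channel, vc) pairs a packet occupies going src -> dst on a unidirectional
    k-node ring. Channel i links node i to node (i + 1) % k."""
    hops, node, vc = [], src, 0
    while node != dst:
        hops.append((node, vc))
        if use_dateline and node == k - 1:
            vc = 1  # just used the wraparound link: later hops use VC 1
        node = (node + 1) % k
    return hops


def dependency_graph(k, use_dateline):
    """Edge a -> b if some packet may hold channel a while requesting channel b."""
    graph = defaultdict(set)
    for src in range(k):
        for dst in range(k):
            if src != dst:
                path = ring_route(src, dst, k, use_dateline)
                for held, wanted in zip(path, path[1:]):
                    graph[held].add(wanted)
    return graph


def has_cycle(graph):
    """Depth-first search with three colors; a grey-to-grey edge is a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(u):
        color[u] = GREY
        for v in graph.get(u, ()):
            if color[v] == GREY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(graph))


print(has_cycle(dependency_graph(8, use_dateline=False)))  # True
print(has_cycle(dependency_graph(8, use_dateline=True)))   # False
```

Because a minimal route on a ring never crosses the wraparound link twice, VC 1 dependencies can never lead back into VC 0, so the split graph stays acyclic.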


Even if only two dimensions are used, the TRC can be used to construct concurrent computers with up to 2^16 nodes. It would be very difficult to distribute a global clock over an array of this size [4]. To avoid this problem, the TRC is entirely self-timed [11], which permits each processing node to operate at its own rate with no need for global synchronization. Synchronization, when required, is performed by arbiters in the TRC.
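Self-timed links typically coordinate sender and receiver through a request/acknowledge handshake instead of a shared clock. The excerpt does not say which handshake the TRC channels use, so the following is a generic four-phase (return-to-zero) protocol modeled with Python threads; `FourPhaseChannel` and `transfer` are hypothetical names. Each side sleeps at its own pace, yet every value arrives intact and in order because neither side advances until the other has responded.

```python
import threading
import time


class FourPhaseChannel:
    """Hypothetical model of one self-timed link: data wires plus req/ack wires.
    The two ends share no clock; they synchronize only through req and ack."""

    def __init__(self):
        self._cond = threading.Condition()
        self.req = False
        self.ack = False
        self.data = None

    def send(self, value):
        with self._cond:
            self.data = value                          # drive the data wires
            self.req = True                            # phase 1: raise request
            self._cond.notify_all()
            self._cond.wait_for(lambda: self.ack)      # phase 2 seen: receiver latched data
            self.req = False                           # phase 3: return request to zero
            self._cond.notify_all()
            self._cond.wait_for(lambda: not self.ack)  # phase 4 seen: link idle again

    def receive(self):
        with self._cond:
            self._cond.wait_for(lambda: self.req)      # wait for phase 1
            value = self.data                          # latch the data
            self.ack = True                            # phase 2: acknowledge
            self._cond.notify_all()
            self._cond.wait_for(lambda: not self.req)  # wait for phase 3
            self.ack = False                           # phase 4: release acknowledge
            self._cond.notify_all()
            return value


def transfer(values, sender_delay_s, receiver_delay_s):
    """Run a sender and a receiver at independent rates over one channel."""
    channel = FourPhaseChannel()
    received = []

    def sender():
        for v in values:
            time.sleep(sender_delay_s)    # sender's own local pace
            channel.send(v)

    def receiver():
        for _ in values:
            time.sleep(receiver_delay_s)  # receiver's own local pace
            received.append(channel.receive())

    threads = [threading.Thread(target=sender), threading.Thread(target=receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(transfer(list(range(5)), 0.0, 0.002))  # [0, 1, 2, 3, 4]
```

Swapping the two delays changes which side waits, not the result; that decoupling is what lets each node run at its own rate.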

To reduce the latency of communications that traverse more than one channel, the TRC uses cut-through routing rather than store-and-forward routing. Instead of reading an entire packet into a processing node before starting tr...