Browse Prior Art Database

Design of Systolic Algorithms for Large Scale Multiprocessors

IP.com Disclosure Number: IPCOM000128723D
Original Publication Date: 1987-Dec-31
Included in the Prior Art Database: 2005-Sep-16
Document File: 16 page(s) / 46K

Publishing Venue

Software Patent Institute

Related People

Jingke Li: AUTHOR [+5]

Abstract

In this paper, a systematic method for generating systolic algo-rithms for large scale multiprocessors is described. The method applies to a subclass of programs called quasi-uniform recurrent equations, which is more general than uniform recurrent equations appearing in the literature. With a given optimality principle, our method will generate optimal algorithms for any user specified network topologies. A resulting algorithm not only tells when and where a computation should be performed, but also how each message should be routed in the network. Such algorithms can then be implemented on a target machine. The mapping problem is described by a matrix equation, and necessary and sufficient conditions for the existence of a linear mapping are given.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 9% of the total text.

Page 1 of 16

THIS DOCUMENT IS AN APPROXIMATE REPRESENTATION OF THE ORIGINAL.

Design of Systolic Algorithms for Large Scale Multiprocessors

Jingke Li Marina C. Chen Mark F. Young

April 1987

Abstract

In this paper, a systematic method for generating systolic algo-rithms for large scale multiprocessors is described. The method applies to a subclass of programs called quasi- uniform recurrent equations, which is more general than uniform recurrent equations appearing in the literature. With a given optimality principle, our method will generate optimal algorithms for any user specified network topologies. A resulting algorithm not only tells when and where a computation should be performed, but also how each message should be routed in the network. Such algorithms can then be implemented on a target machine. The mapping problem is described by a matrix equation, and necessary and sufficient conditions for the existence of a linear mapping are given.

1 Introduction

Parallel processing can substantially speed up computations for many ap- plications. The appearance of the first generation of message-passing mul- tiprocessors in recent years, the experimental Cosmic Cube [13], and com- mercially, the FPS T series, the Intel iPSC, the Connection Machine, shows that it is not only desirable but also practical to build parallel machines. Considering th ' e speedy development of high technology, it is safe to predict that a new generation of multiprocessors will come soon. One promising new architecture has already been proposed by Dally and Seitz [4,5). The proposed machine consists of a massive number of

t Department of Computer Science, Yale University, New Haven, CT06511. Floating Point Systems, Inc., P.O.Box 23489, Portland, OR97223. processing nodes interconnected into a mesh-like torus network. Contrasted with the present machines, these machines are fine- grained and have large diameter and low connectivity. Furthermore, the communication times of these machines are improving. As claimed in [4], the local node-to-node communication times can "approach main memory access times of sequential computers." The processing nodes of these new fine-grained systems are smaller and cannot support all the operating system features of conventional microprocessor systems. Low-level programming of such systems can be more difficult and is ideally left to a high-level language compiler. One of the significant challenges for a compiler is the problem of mapping an algorithm across a greater number of nodes while minimizing the performance impact of the necessarily increased communication. There is a large class of important application algorithms, like numeri-cal solutions to partial differential equations, matrix operations, signal pro-cessing, dynamic programming and some graph algorithms, which have the property the data dependencies embedded in them are very regular and are mainly local. These algorithms have been shown to be suitable for systolic processing....