Browse Prior Art Database

Hardware and an Algorithm for Performing Parallel Prefix Calculations

IP.com Disclosure Number: IPCOM000062037D
Original Publication Date: 1986-Oct-01
Included in the Prior Art Database: 2005-Mar-09
Document File: 3 page(s) / 77K

Publishing Venue

IBM

Related People

Ching, WM: AUTHOR [+2]

Abstract

Introduction The advances of VLSI (very large-scale integration) have intensified the search for applications that effectively use the computational power of regular hardware structures. The following is a description of a hardware structure and an algorithm for performing prefix calculations which consist of partial results on a vector with respect to an associative operator. The operators will be performed in an ALU (arithmetic and logic unit) so the range of possible operators is large. An example includes partial sums and partial products. The structure should make effective use of the hardware, and it should have simple interconnections which provide simple, but rather general, data manipulations.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

Page 1 of 3

Hardware and an Algorithm for Performing Parallel Prefix Calculations

Introduction The advances of VLSI (very large-scale integration) have intensified the search for applications that effectively use the computational power of regular hardware structures. The following is a description of a hardware structure and an algorithm for performing prefix calculations which consist of partial results on a vector with respect to an associative operator. The operators will be performed in an ALU (arithmetic and logic unit) so the range of possible operators is large. An example includes partial sums and partial products. The structure should make effective use of the hardware, and it should have simple interconnections which provide simple, but rather general, data manipulations. The Hardware Structure Hardware to perform prefix calculations should be useful on a variety of multiprocessor structures including shared memory, homogeneous, and multiple functional unit structures. The engineering tradeoffs may be different for different classes of machine. In the following, a particular hardware embodiment will be presented and its essential features will be described. Fig. 1 shows the hardware configuration. The essential properties of the ALU are the ability to execute the operator, two registers called accumulator and B-register, and one input and one output port from each ALU to the interconnection network. The essential property of the interconnection network is that it be able to perform the simultaneous uniform shift of inputs to outputs. The exact nature of the network communication and the I/O mechanism will not be discussed. The processor in the assumed configuration has local memory. Alternatives to this local storage using I/O are possible. The Algorithm for Prefix Calculations One of the difficulties inherent in parallel computation is matching the variable-length data with fixed-length hardware. Prefix calculations can be performed by passing successive segments of the vector through the hardware. In the following, an algorithm which more effectively utilizes the hardware is described. However, the hardware could also be used to process successive segments. The algorithm consists of three phases. The first and last are essentially serial computations on segments of the vector which reduce it to and expand it from a length vector which can be processed by the hardware. The middle portion is a modification of the algorithm described by Kogge and Stone [*]. Fig. 2 shows the algorithm for a vector of length 16 on a structure with 4 processors. There are several things that should be observed about the algorithm. First, the first and third phases perform "ripple-type" calculations. Each processor must produce a result which it passes to one of its inputs for the next cycle. Phase two requires a log of...