Regular and Fast Hardwired Interconnection for a Group of Execution Units to Calculate All Partial Results of an Associative Operation

IP.com Disclosure Number: IPCOM000062038D
Original Publication Date: 1986-Oct-01
Included in the Prior Art Database: 2005-Mar-09
Document File: 2 page(s) / 30K

Publishing Venue

IBM

Related People

Ching, WM: AUTHOR [+2]

Abstract

A computer designed for general-purpose parallel processing can benefit greatly from hardware support, in the form of a single machine instruction, for the compound operation that calculates all partial results of an associative function such as addition or multiplication. Many frequently occurring data-dependent loops, which are very difficult for multiprocessors to execute efficiently in parallel, can be expressed in terms of this instruction, thereby achieving optimal parallel efficiency and eliminating the programmed branches that would otherwise be needed.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

It is well known that, given an n-element vector and n/2 execution units, all partial results of the vector under an associative operation such as addition or multiplication (for example, its partial sums or partial products) can be calculated in log n steps. The preceding article describes a hardware configuration and an algorithm for computing all partial results of an associative operation in parallel. That article assumes that, for phase 2, a Uniform Shift Network (henceforth called USN) sends the data of a fixed-length vector to the desired ALUs (arithmetic and logic units), but it does not describe the details of the USN. Described here is a regular hardwired interconnection scheme that lets a group of arithmetic functional units perform this calculation without the USN. Each ALU is replaced by one type of functional unit, adders or multiplie...
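
The log n step count quoted above can be illustrated in software. The following is a minimal Python sketch of a well-known divide-and-conquer schedule for computing all partial results; it is not the hardwired interconnection of this disclosure, whose details are not reproduced in the abbreviated text. The function name `scan_log_steps` and the power-of-two length restriction are assumptions made for the illustration. Each pass of the outer loop stands for one parallel step in which exactly n/2 applications of the operation take place, all independent of one another.

```python
from itertools import accumulate
from operator import add, mul


def scan_log_steps(values, op):
    """Return (partial_results, steps) for an associative `op`.

    Hypothetical helper for illustration only. Assumes len(values) is a
    power of two. Each outer iteration models one parallel step that
    applies `op` exactly n/2 times.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    x = list(values)
    steps = 0
    block = 1
    while block < n:
        prev = x[:]  # every read in a step uses the values from before the step
        for start in range(0, n, 2 * block):
            # The last element of the lower half holds that half's total.
            carry = prev[start + block - 1]
            # Fold it into every element of the upper half (block operations).
            for i in range(start + block, start + 2 * block):
                x[i] = op(carry, prev[i])
        block *= 2
        steps += 1
    return x, steps


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    sums, steps = scan_log_steps(data, add)
    products, _ = scan_log_steps(data, mul)
    # The sequential, data-dependent loop that the scan replaces:
    assert sums == list(accumulate(data, add))
    assert products == list(accumulate(data, mul))
    print(sums, products, steps)
```

The sequential equivalent is a loop with a carried dependence (`total = op(total, v)` on every iteration), which is the kind of data-dependent loop the abstract refers to. The left operand is always the earlier partial result, so the sketch also preserves operand order for associative operations that are not commutative.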