Browse Prior Art Database

Iterative (Systolic) Arrays Disclosure Number: IPCOM000044930D
Original Publication Date: 1983-Jan-01
Included in the Prior Art Database: 2005-Feb-06
Document File: 3 page(s) / 20K

Publishing Venue


Related People

Danielsson, P: AUTHOR


This invention relates to convolution operations and more particularly to an array configuration of cells.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 54% of the total text.

Page 1 of 3

Iterative (Systolic) Arrays

This invention relates to convolution operations and more particularly to an array configuration of cells.

The recent concept of systolic arrays has acquired considerable attention. Systolic arrays are a subset of iterative arrays (=cellular automata). Usually, systolic arrays are limited to networks without feedback loops.

The word systolic implies a heart beat and pulsed flow. Translated to the digital design domain it means a clocked and pipelined system, the final output result being computed (accumulated) in a number of processor stages. In addition, the input data and control parameters are also allowed to move from input to different stages. No buses are allowed, which means that the only global signals on the chip are ground, power and clock(s). In fact, in a systolic array different input and output data streams may very well flow in separate and/or opposite directions forming a linearly, orthogonally or hexagonally connected network. Still, no inter-cellular closed cause-event loops are allowed (according to the interpretation of the present author). But the moving of 1's and 0's in different directions makes up the impression of something similar to the cardiovascular system in a living being.

In this invention, iterative cells are used at the bit level.

Each cell is performing a full-adder function which is the basic accumulation step in the convolution operation. The number of cells in this array are approximately N x L x (K+ log L) where N=# bits in a data word X K=# bits in the coefficients A

L=# points in the convolution kernel.

The throughput rate is equal to the clock rate which can be extremely high since the logic depth of all cells is only two (or four, depending on actual design of the full adder). 40 MHz seems to be a rather conservative estimate. Then, in every 25 ns, the array delivers a new result Y/(i)/.

Let there exist an array for the case L=3, K=3, N=5. The basic index order is (n, 1, k). Such a basic cell may be implemented to have a static 1-bit storage for each of the array coefficient bits a(1k).

The actual loading of the coefficients can be simplified if the memory cells for the bits a(1k) are chained together into one long shift register. Now the task of the device is to compute

Y/(i)/=E/L/(e=1)A(e9-X(e)/(i)/ in a running window fashion over the one-dimensional string of samples X/(i)/. New X-values are continuously entering at the t...