
Sparse Matrix Vector Multiplication on Polymorphic-Torus

IP.com Disclosure Number: IPCOM000035778D
Original Publication Date: 1989-Aug-01
Included in the Prior Art Database: 2005-Jan-28
Document File: 7 page(s) / 128K

Publishing Venue

IBM

Related People

Li, H: AUTHOR [+2]

Abstract

Disclosed is a procedure which offers a speed improvement of several orders of magnitude over existing methods in performing the multiplication of a sparse matrix by a vector.



The sparse matrix vector multiplication is performed on a massively parallel SIMD architecture called the Polymorphic-torus, which consists of nxn processors physically connected in a mesh topology; the mesh network is overlaid by a second switching network for reconfiguration purposes. At each node of the mesh lies a 4-port switch that can electrically short-circuit any two, three, or four of its ports. A full description of the Polymorphic-torus is given in [*].

The sparse matrix vector multiplication algorithm on the Polymorphic-torus consists of two stages. The first stage of the algorithm, the condensation process, is an algorithm that converts the sparse connectivity into a more uniform data structure suitable for the topology of the Polymorphic-torus. More specifically, many matrix rows are packed into a processor row such that there are fewer idle processors and the utilization of the processors can be increased.
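The packing idea behind the condensation stage can be sketched sequentially. The greedy rule below is an illustration only, not the disclosed algorithm's actual packing rule, and the function name `condense` is hypothetical; it assumes each matrix row has at most n nonzeros.

```python
# Hedged sketch of the condensation idea: greedily pack the nonzero
# segments of successive matrix rows into processor rows of width n,
# so that fewer of the n processors per row sit idle.
# (Illustrative only; the disclosed packing rule may differ.)

def condense(row_lengths, n):
    """Map each matrix row to a (processor_row, start_column) placement."""
    placement, proc_row, used = [], 0, 0
    for length in row_lengths:
        if used + length > n:      # segment will not fit: open a new processor row
            proc_row += 1
            used = 0
        placement.append((proc_row, used))
        used += length
    return placement

# Rows with 3, 2, 4, 1 nonzeros packed onto width-5 processor rows:
print(condense([3, 2, 4, 1], 5))   # -> [(0, 0), (0, 3), (1, 0), (1, 4)]
```

With this packing, the first processor row is fully occupied (3 + 2 = 5 elements), so only one of the ten processors is idle, rather than the many that a one-matrix-row-per-processor-row assignment would leave unused.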

The second stage performs the condensed matrix vector multiplication. This second stage requires the Polymorphic-torus to reconfigure itself to match the condensed data structure resulting from the first stage. More specifically, a broadcast bus and n simultaneous sum-trees are configured to achieve a unit-time multiplication and a logarithmic-time summation, an optimal result.
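A sequential analogue of this second stage can be sketched using the three-array sparse representation described below (E, RP, CI, with the 1-based indexing of the text). The function names here are hypothetical; the pairwise reduction in `tree_sum` mimics the logarithmic-depth summation a sum-tree performs, though on a serial machine it of course takes linear work.

```python
# Sketch of stage 2 (hypothetical helper names): the products for a row
# correspond to the single parallel multiply step, and the pairwise
# reduction mirrors the O(log n)-depth sum-tree.

def tree_sum(vals):
    """Pairwise reduction; the loop runs ceil(log2(len(vals))) times."""
    vals = list(vals)
    while len(vals) > 1:
        vals = [vals[i] + (vals[i + 1] if i + 1 < len(vals) else 0.0)
                for i in range(0, len(vals), 2)]
    return vals[0] if vals else 0.0

def condensed_matvec(E, RP, CI, x):
    """y = A x using the E/RP/CI arrays (1-based indices, as in the text)."""
    n = len(RP) - 1
    y = []
    for i in range(n):
        lo, hi = RP[i] - 1, RP[i + 1] - 1           # 0-based slice of row i
        products = [E[k] * x[CI[k] - 1] for k in range(lo, hi)]  # parallel multiply step
        y.append(tree_sum(products))                 # log-depth summation
    return y

E, RP, CI = [1.0, 2.0, 3.0, 4.0], [1, 2, 4, 5], [1, 2, 3, 3]
print(condensed_matvec(E, RP, CI, [1.0, 1.0, 1.0]))  # -> [1.0, 5.0, 4.0]
```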

(Image Omitted)

STAGE 1: STRUCTURED CONDENSATION

Sparse Matrix Representation

A sparse matrix is represented by three arrays: (1) an Element array, E, which holds the values of the nonzero entries of the matrix; (2) a Row Pointer array, RP; and (3) a Column Index array, CI. The latter two arrays store the connectivity of the sparse matrix.

(Image Omitted)

The construction of the three arrays is illustrated in Fig. 1. The element array E is built by scanning the matrix in row-major order, recording only the nonzero entries. For a matrix with n rows, the RP array contains n + 1 items, in which the i-th item points to the position of the first nonzero entry of the i-th row in the E array. The (n + 1)-th item, which is one greater than the number of nonzero entries in the matrix, serves as a delimiter. For example, the third item in RP (i.e., 4) points to the 4th item in array E (i.e., 4.0), which is the first nonzero entry of the third row. The CI array has as many items as the E array, and each item contains the column position of the corresponding element in E. For example, the column position of 4.0 is 3.
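The row-major scan above can be sketched directly. The example matrix below is hypothetical (Fig. 1's matrix is not reproduced here), but it is chosen so that, as in the text, the third item of RP is 4, the fourth item of E is 4.0, and its column position is 3; indices are 1-based to match the description.

```python
# Sketch of building the E / RP / CI arrays (1-based indices, as in the
# text). The input matrix is a hypothetical example.

def build_sparse_arrays(A):
    """Scan A in row-major order, recording only the nonzero entries."""
    E, RP, CI = [], [], []
    for row in A:
        RP.append(len(E) + 1)      # position in E of this row's first nonzero
        for j, v in enumerate(row, start=1):
            if v != 0.0:
                E.append(v)
                CI.append(j)       # column position of this element
    RP.append(len(E) + 1)          # delimiter: one greater than nnz
    return E, RP, CI

A = [[1.0, 0.0, 0.0],
     [0.0, 2.0, 3.0],
     [0.0, 0.0, 4.0]]
E, RP, CI = build_sparse_arrays(A)
print(E)    # [1.0, 2.0, 3.0, 4.0]
print(RP)   # [1, 2, 4, 5]  (n + 1 = 4 items; last item = nnz + 1 = 5)
print(CI)   # [1, 2, 3, 3]
```

This layout is essentially what is now called compressed sparse row (CSR) storage, though modern libraries typically use 0-based pointers.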

Condensation Algorithm: The purpose of the Condensation algorithm is to convert the representation of the sparse matrix to a format better suited for the Polymorphic-torus. Specifically, the goals are:
(1) to assign only the nonzero entries of the matrix to processors so that the co...