Inverse Shuffle Transpose

IP.com Disclosure Number: IPCOM000049251D
Original Publication Date: 1982-Apr-01
Included in the Prior Art Database: 2005-Feb-09
Document File: 3 page(s) / 48K

Publishing Venue

IBM

Related People

Karp, AH: AUTHOR

Abstract

Transposing large matrices using only sequential access to the data gives better I/O and cache performance than standard methods. Several applications require the transpose of matrices too large to be held in memory. The proposed algorithm performs this transpose by accessing the data sequentially, which accounts for the improved I/O and cache performance.

Inverse Shuffle Transpose

Consider the matrix A stored in column major order. The location of element r in column c is L(A(r,c)) = L(A(0,0)) + r + Nc, where 0 <= r <= N-1 and 0 <= c <= M-1; N is the number of rows and M is the number of columns. If N = 2^n, then the transpose is performed as follows.
1. Make a sequential pass through the matrix, storing all the even-numbered elements followed by all the odd-numbered elements. (Memory management is discussed below.)
2. Repeat step 1 n times.

The matrix is now stored in row major order.
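
A minimal sketch of the procedure for an in-memory array, assuming Python and a matrix held as a flat list in column-major order (the disclosure itself gives no code, and the function names are ours):

    def inverse_shuffle_pass(x):
        # One sequential pass: all the even-numbered elements followed by
        # all the odd-numbered elements (step 1 above).
        return x[0::2] + x[1::2]

    def inverse_shuffle_transpose(x, n):
        # n passes turn column-major storage of a 2**n-row matrix into
        # row-major storage, i.e. column-major storage of its transpose.
        for _ in range(n):
            x = inverse_shuffle_pass(x)
        return x

    # Check on a 4-by-8 matrix (n = 2, m = 3) with A(r, c) = 100*r + c.
    n, m = 2, 3
    N, M = 2 ** n, 2 ** m
    col_major = [100 * r + c for c in range(M) for r in range(N)]
    row_major = [100 * r + c for r in range(N) for c in range(M)]
    assert inverse_shuffle_transpose(col_major, n) == row_major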

Proof: Let 0 <= r <= 2^n - 1 and 0 <= c <= 2^m - 1. Then A(r,c) = x^(0)(r + c*2^n), where x^(0) is the sequential representation of A. An inverse shuffle step is given by (see original).
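
Since the shuffle-step equation itself appears only in the original document, the following is a reconstructed sketch of the standard index argument in our own notation, not the disclosure's formula. Writing x^(k) for the array after k passes, one inverse shuffle step sends the element at position 2q + b (with b = 0 or 1) to position q + b*2^(m+n-1):

    x^(k+1)(q + b*2^(m+n-1)) = x^(k)(2q + b),   0 <= q <= 2^(m+n-1) - 1,  b = 0 or 1.

On the (m+n)-bit binary representation of the position this is a one-bit right rotation, so after n passes the n low-order bits (the row index r) have been rotated to the top:

    x^(n)(c + r*2^m) = x^(0)(r + c*2^n) = A(r,c),

which is precisely row-major order.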

Memory (either core or tape/disk) can be managed effectively. If the array is stored in two halves of length 2^(m+n-1), the memory management requires room for 3 * 2^(m+n-1) elements, as illustrated in the original figure, which is either a map of main memory or of tape/disk files.
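
As a quick worked example (the numbers are ours, not from the disclosure): the full array has 2^(m+n) = 2 * 2^(m+n-1) elements, so working storage of 3 * 2^(m+n-1) elements is 1.5 times the size of the matrix, compared with the 2 * 2^(m+n) elements needed to hold a separate source and destination copy. For m = n = 10 and 8-byte elements, that is 12 MB of working storage for an 8 MB matrix instead of 16 MB.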

The computational effort is easy to calculate: n passes are made through the matrix, reading j elements (where j is an even number) sequentially, processing them, reading j more, and so on until the entire matrix has been read. A load and a store are required for each element on each pass. Thus, n * 2^(n+m) / j read/writes and n * 2^(n+m) load/stores are needed.
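
A minimal sketch of one such pass, assuming Python with in-memory stand-ins for the sequential input and output streams (the function names and block interface are illustrative, not from the disclosure):

    def inverse_shuffle_pass_streamed(read_block, write_evens, write_odds, j):
        # One sequential pass: read j elements at a time and route the
        # even-numbered elements to one output stream and the odd-numbered
        # elements to the other.  Concatenating the outputs (evens first,
        # then odds) yields the inverse-shuffled array.
        assert j % 2 == 0            # j even, so each block starts at an even index
        while True:
            block = read_block(j)
            if not block:            # input exhausted
                break
            write_evens(block[0::2])
            write_odds(block[1::2])

    # Tiny demonstration with lists standing in for tape/disk files.
    data = list(range(16))
    blocks = iter([data[k:k + 4] for k in range(0, 16, 4)] + [[]])
    evens, odds = [], []
    inverse_shuffle_pass_streamed(lambda j: next(blocks),
                                  evens.extend, odds.extend, j=4)
    assert evens + odds == data[0::2] + data[1::2]

Each pass touches every element once on input and once on output, which is where the n * 2^(n+m) load/stores come from; with blocks of j elements, each pass also issues 2^(n+m)/j block reads and a matching number of block writes.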

Memory and cache use are optimized since j need not be an even multiple of the number of rows, so the block size can be chosen to fit the available memory or cache. Competing methods require random access and/or access across rows.

The algorithm can also do generalized...