
Efficient reduction of a polynomial multiplication chain
Disclosure Number: IPCOM000028254D
Original Publication Date: 2004-May-06
Included in the Prior Art Database: 2004-May-06
Document File: 4 page(s) / 42K


To improve the efficiency of a CRC calculation engine which supports incremental changes in the data, a multiplicative reduction engine can be used. To fully utilize the multiplication units in such a reduction unit, it is necessary to fill their pipelines with as many pending, independent contributions as possible. The approach described below significantly increases the throughput of a reduction engine.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 36% of the total text.


Efficient reduction of a polynomial multiplication chain

Definition of terms

For each change in the data block, a contribution to the final CRC is created and stored in a set of tuples. Each tuple has the form (pos, d), where pos is the position relative to the end of the frame, and d is the delta. In a simplistic scenario, the effect of each of these contributions would be calculated using the B(pos, d) operation of [1].

For simplicity, we assume herein that B(pos, d) does not directly modify v, the current value. Instead, it returns the change that would need to be added (in GF(2)) to that value. Thus, B(pos, d) returns d*x^pos (mod p), where p is the generator polynomial (again, all operations in GF(2)).
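The operation B(pos, d) = d*x^pos (mod p) can be sketched in software with carry-less multiplication and square-and-multiply. The CRC-32 generator polynomial used below is only an illustrative choice; the disclosure does not fix a particular p.

```python
P = 0x104C11DB7   # CRC-32 generator polynomial (illustrative), 33 bits with the leading x^32 term
DEG = 32          # degree of p

def clmul(a, b):
    """Carry-less (GF(2)) multiplication of two polynomials given as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def polymod(a, p=P, deg=DEG):
    """Reduce polynomial a modulo p over GF(2)."""
    while a.bit_length() > deg:
        a ^= p << (a.bit_length() - deg - 1)
    return a

def B(pos, d):
    """Contribution of delta d at position pos: d * x^pos (mod p)."""
    result = d                 # d is assumed to be already reduced mod p
    factor = 2                 # the polynomial x
    while pos:                 # square-and-multiply on the exponent pos
        if pos & 1:
            result = polymod(clmul(result, factor))
        factor = polymod(clmul(factor, factor))   # x^k -> x^(2k) (mod p)
        pos >>= 1
    return result
```

Note the homomorphism B(pos1 + pos2, d) = B(pos1, B(pos2, d)), which is what makes the merging of contributions in the next section work.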

Description of the logical operation

The direct application of B() for each of the tuples would typically be inefficient, as each evaluation of B() would generally require several multiplications. The goal is to reduce the number of multiplications and increase the parallelism of operations (and thus the throughput), with only a small impact on the speed and size of the control logic.

A greedy algorithm is used to improve throughput. It selects two contributors, (c1, d1) and (c2, d2), with c1 >= c2, i.e., c1 is farther from the end than c2. It then merges the tuple at position c1 into the one at c2, removing the former and updating the latter. The two tuples are selected as follows:

The first tuple (c1, d1) is selected as the tuple farthest away from the end, i.e., the one with maximum c1.

The second tuple (c2, d2) is selected as the tuple with the least cost from c1, i.e., such that the difference c1-c2 (calculated using ordinary arithmetic) can be created as the sum of the fewest "factors" (so called because they are typically used in the multiplication). If multiple candidates share the minimum cost, the one with the largest resulting difference c1-c2 is chosen. The first criterion minimizes the number of steps, while the second maximizes the level of parallelism.
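The selection rules above can be sketched as follows, assuming the common case (see "Cost calculation" below) where the factors are powers of two, so the cost of a difference is its popcount. The function name select_pair is a hypothetical choice for illustration.

```python
def cost(diff):
    """Number of power-of-two factors summing to diff (its popcount)."""
    return bin(diff).count("1")

def select_pair(tuples):
    """tuples: list of (pos, delta). Return indices (i1, i2) of the pair
    to merge, or None if fewer than two contributions remain."""
    if len(tuples) < 2:
        return None
    # first tuple: the one farthest from the end (maximum position)
    i1 = max(range(len(tuples)), key=lambda i: tuples[i][0])
    c1 = tuples[i1][0]
    # second tuple: minimum cost of the difference c1-c2, ties broken
    # in favour of the largest difference
    candidates = [i for i in range(len(tuples)) if i != i1]
    i2 = min(candidates,
             key=lambda i: (cost(c1 - tuples[i][0]), -(c1 - tuples[i][0])))
    return i1, i2
```

For example, with contributions at positions 100, 96 and 36, both differences (4 and 64) cost one factor, so the tie-break picks position 36, the larger difference.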

The combination of the two contributors can be performed as follows:

Atomically read (c1, d1) and delete the tuple from the set of contributors

(Select another (c1, d1) if this one is locked and thus already the target of a combination in progress; in practice, you might want to check for locking before identifying the best (c2, d2).)

Optionally lock (c2, d2) (to prevent its deletion)

Calculate b = B(c1-c2, d1) (ordinary arithmetic for the difference)

Atomically change (c2, d2) to (c2, d2 XOR b)

If a shared lock was issued, unlock (c2, d2)
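A minimal single-threaded sketch of these steps, with the locking omitted and contributions kept in a dict {pos: delta}. B is the operation defined earlier and is passed in as a parameter; for brevity the choice of c2 here is simply the next-farthest position rather than the cost-based rule.

```python
def combine(contribs, B):
    """Perform one combination step on contribs, a dict {pos: delta}.
    Returns False when fewer than two contributions remain."""
    if len(contribs) < 2:
        return False
    c1 = max(contribs)          # the tuple farthest from the end
    d1 = contribs.pop(c1)       # atomically read and delete (c1, d1)
    c2 = max(contribs)          # placeholder choice of the second tuple
    b = B(c1 - c2, d1)          # ordinary arithmetic for the difference
    contribs[c2] ^= b           # (c2, d2) -> (c2, d2 XOR b)
    return True
```

Repeatedly calling combine() until it returns False collapses all contributions into a single tuple at the smallest position; in hardware, several such steps on independent tuple pairs can be in flight in the multiplier pipelines at once.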



Cost calculation

If the factors are all powers of two, the cost of a difference can simply be calculated by counting the bits that are set to one.
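For example, a difference of 40 decomposes as 8 + 32, so its cost is two, exactly the popcount of 40. A one-line sketch of the decomposition:

```python
def factors(diff):
    """Power-of-two step sizes whose sum is diff; the cost is their count."""
    return [1 << k for k in range(diff.bit_length()) if (diff >> k) & 1]
```

Each returned factor corresponds to one multiplication by a precomputed x^(2^k) mod p.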

If the factors are not restricted to powers of two, the problem probably becomes NP-hard (a variant of the Knapsack problem). Known heuristics for the Knapsack problem might be useful. Recall that we want to minimize...