Method for a FPMAC processing block with conditional post-normalization and rounding units

IP.com Disclosure Number: IPCOM000125115D
Publication Date: 2005-May-19
Document File: 4 page(s) / 130K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for a floating point multiply and accumulate (FPMAC) processing block with conditional post-normalization and rounding units. Benefits include improved functionality and improved power performance.

This is the abbreviated version, containing approximately 43% of the total text.

Background

              Many digital signal processing (DSP) and graphics applications require extensive computing power for real-time processing. A primary component of these applications, and of floating-point benchmarks such as LINPACK, is the computation of the following sum of products:

$\sum_{i=1}^{N} a_i b_i$                                                                                                                            [1]

              The values $a_i$ and $b_i$ are both floating point numbers in either single or double precision format. N is the number of desired consecutive accumulate operations.
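              As a minimal software sketch of this computation (not the disclosed hardware), the sum can be written as one multiply that seeds the accumulator, followed by N-1 multiply-accumulate steps. The function name `mac_dot` and the step counter are illustrative assumptions, and Python floats stand in for the double precision format.

```python
def mac_dot(a, b):
    """Compute sum(a[i] * b[i]) as one multiply plus N-1 multiply-accumulates.

    Returns the sum and the number of accumulate steps performed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    acc = a[0] * b[0]              # first product seeds the accumulator
    accumulates = 0
    for x, y in zip(a[1:], b[1:]):
        acc += x * y               # one multiply-accumulate step
        accumulates += 1
    return acc, accumulates


# N = 3 products require N - 1 = 2 accumulate steps.
result, steps = mac_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```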

              A high-performance FPMAC design typically uses dedicated hardware to perform this computation at high frequencies. Achieving a good FLOPS/Watt ratio has become an equally important design metric.

              Conventionally, post-normalization occurs within the accumulate loop. With this architecture, every product of two operands must pass through the post-normalization phase. In addition, in long streams of FPMAC operations, the intermediate sums are also normalized and rounded, even though only the final sum is needed. This consumes significant power although the majority of these normalized intermediate results are never used.

             

General description

              The disclosed method is a power-optimized FPMAC block that uses the architecture described in [1]. The post-normalization and rounding blocks in this design are “energized only at the end” of a stream of N-1 accumulate computations. This is possible because post-normalization takes place outside the accumulate logic, once the final result is available. If N is relatively small, only clock gating is used. Otherwise, both clock gating and sleep control are used.
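              The effect of moving post-normalization outside the accumulate loop can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the disclosed circuit: values are hypothetical (mantissa, exponent) integer pairs, the accumulator is assumed wide enough to hold exact sums, and `PRECISION`, `add`, `normalize_and_round`, and `fpmac` are names invented for the example.

```python
PRECISION = 8  # hypothetical mantissa width in bits, chosen for illustration


def add(x, y):
    """Exact add of two (mantissa, exponent) pairs; value = mantissa * 2**exponent."""
    (mx, ex), (my, ey) = x, y
    e = min(ex, ey)                      # align both operands to the smaller exponent
    return (mx << (ex - e)) + (my << (ey - e)), e


def normalize_and_round(value, stats):
    """Fit the mantissa into PRECISION bits (round half up) and count the use."""
    mant, exp = value
    stats["post_norm_uses"] += 1
    if mant == 0:
        return 0, 0
    sign = -1 if mant < 0 else 1
    mag = abs(mant)
    excess = mag.bit_length() - PRECISION
    if excess > 0:                       # too wide: shift right with rounding
        mag = (mag + (1 << (excess - 1))) >> excess
        exp += excess
        if mag.bit_length() > PRECISION: # rounding carried into an extra bit
            mag >>= 1
            exp += 1
    elif excess < 0:                     # too narrow: shift left
        mag <<= -excess
        exp += excess
    return sign * mag, exp


def fpmac(a, b, deferred):
    """Multiply-accumulate a stream; normalize every step or only once at the end."""
    stats = {"post_norm_uses": 0}
    acc = (0, 0)
    for (ma, ea), (mb, eb) in zip(a, b):
        acc = add(acc, (ma * mb, ea + eb))
        if not deferred:                 # conventional: normalize inside the loop
            acc = normalize_and_round(acc, stats)
    if deferred:                         # deferred: normalize the final sum only
        acc = normalize_and_round(acc, stats)
    return acc, stats["post_norm_uses"]


a = [(3, 0), (5, 0), (2, 0)]
b = [(2, 0), (1, 0), (4, 0)]
conventional_result, conventional_uses = fpmac(a, b, deferred=False)
deferred_result, deferred_uses = fpmac(a, b, deferred=True)
```

              In this model the conventional loop invokes the normalization and rounding step once per operation, while the deferred variant invokes it once per stream. For inputs that do require rounding, the two variants may differ in their low-order bits, because the conventional loop also rounds intermediate sums.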

              The method reduces active and leakage power consumption by clock gating and by enabling sleep control for the post-normalization block of the FPMAC. The disclosed method gates the clock to this logic block whenever the accumulate results do not need to be normalized. This eliminates unnecessary switching in the post-normalization and rounding units, which saves active power. To minimize leakage power, the method uses the dynamic sleep transistor technique [1], which adds an extra transistor to the discharge path and controls it according to FPMAC activity.
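              The control policy described above can be sketched as a per-operation schedule for the post-normalization unit. The threshold separating "relatively small" N from larger N is not given in the text, so `SLEEP_THRESHOLD` below is a purely hypothetical value, as are the `PowerMode` names; the sketch also ignores any wake-up latency a real sleep transistor would introduce.

```python
from enum import Enum


class PowerMode(Enum):
    ACTIVE = "active"
    CLOCK_GATED = "clock_gated"
    CLOCK_GATED_AND_SLEEP = "clock_gated_and_sleep"


SLEEP_THRESHOLD = 16  # hypothetical cutoff; the source gives no value


def post_norm_schedule(n, sleep_threshold=SLEEP_THRESHOLD):
    """Return the post-normalization unit's power mode for each of n operations.

    The unit idles for the first n - 1 operations and is active only for the
    last one, when the final sum is normalized and rounded.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if n < sleep_threshold:
        idle = PowerMode.CLOCK_GATED              # short stream: clock gating only
    else:
        idle = PowerMode.CLOCK_GATED_AND_SLEEP    # long stream: also cut leakage
    return [idle] * (n - 1) + [PowerMode.ACTIVE]


short_stream = post_norm_schedule(4)
long_stream = post_norm_schedule(32)
```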

Advantages

              The disclosed method provides advantages including:
•             Improved functionality due to providing an FPMAC processing block with conditional post-normalization and rounding units

•...