Browse Prior Art Database

Fast, Exact, Square Rooting on Fixed Point DSP Disclosure Number: IPCOM000022128D
Publication Date: 2004-Feb-25
Document File: 3 page(s) / 70K

Publishing Venue

The Prior Art Database


Disclosed is a method for a new architecture for fast (i.e. hardware-aided) square root instruction. Benefits include implementing the solution on any platform.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 47% of the total text.

Fast, Exact, Square Rooting on Fixed Point DSP

Disclosed is a method for a new architecture for fast (i.e. hardware-aided) square root instruction. Benefits include implementing the solution on any platform.


Square rooting operations are very important in computer arithmetic. For wireless radios, fast and exact square rooting of real integers is frequently needed in the implementation of:

§         Modem functions, such as envelope estimation, magnitude normalization for AGC, squelching, audio compressors, spectral analysis, and synchronization

§         Receiver front-end array processing functions, such as LU/ eigen/QR matrix decompositions for smart antenna (beam forming), and multi-user detection

§         Graphics and video algorithms, such as perspective transformation and rotation

Accurate evaluation of square rooting often relies on a floating point implementation, which is slow and increases power consumption. The current methods for finding the square root of a number are (1) Newton-Raphson (NR) iterative method (2) Polynomial Approximation method.

To retain the accuracy, the polynomial evaluation has to be done in the Horner nested (Add-multiply) form, which often requires extra cycles with the multiply-accumulate architecture of DSPs. Further, it is often difficult to get the result correct to the last bit with polynomial methods, even if one is prepared to store a larger number of coefficients. Typically a 7-coefficient McLaurin -Taylor series expansion needs about 50 DSP cycles for 16 bit input.

General Description

The disclosed method has its roots in a manual (i.e. paper and pencil) square rooting technique. This decimal technique (also known as the "completing the square" algorithm) is similar to long division and is fairly tedious for hand calculation. However, its binary version is an efficient candidate for DSP implementation (see Figure 1).

At each step, a digit of the result is produced by inspecting the shifted partial remainder, which is derived from the previous digit selections. With the binary scheme, a partial result Yi (radical or the square root of the radicand X) is constructed one bit yi at a time, such that Yi2 ≤ X. The key step in binary digit selection is the use of the identity (2Yi-1+yi)2 = 4Yi-1 2 + yi*(4Yi-1 + yi ). After subtracting out the shifted previous square 4Yi-1 2, one needs to find the bit yi such that yi*(4Yi-1 + yi ) is the largest integer contained in the partial remainder. That is, at the i-th step of the process, the shifted partial remainder Xi is inspected and according to the selection rules, the value of the i-th digit yi is determined. With the binary version of square rooting, the selection of yi is simple, because the choices are only 0 and 1. If the choice of yi = 1 produces a negative shifted partial remainder, it is replaced with yi = 0. In the next section, we propose a simple efficient hardware assisted implementation for the binary approach.

A New DSP Instruction Primitive For Hardwar...