Browse Prior Art Database

A Table Based Method to Compute Square Root

IP.com Disclosure Number: IPCOM000123054D
Original Publication Date: 1998-Apr-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 2 page(s) / 106K

Publishing Venue

IBM

Related People

Agarwal, RC: AUTHOR [+4]

Abstract

A method is disclosed for calculating the square root of a floating point number using an economized Taylor series. It is suitable both for a software implementation and for a hardware implementation using microcode to control a floating point unit. It allows many of the steps in the calculation to proceed independently through a pipelined unit. It uses a very short table for the initial approximation, but each table entry contains two values. Some of the features in this algorithm are most suitable for a floating point architecture that implements a fused multiply-add operation, such as in the Power*, Power2*, and PowerPC* architectures. One feature of the algorithm also assumes that all quantities and operations are carried out in the same precision as that of the final result.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 46% of the total text.

A Table Based Method to Compute Square Root

      A method is disclosed for calculating the square root of a
floating point number using an economized Taylor series.  It is
suitable both for a software implementation and for a hardware
implementation using microcode to control a floating point unit.  It
allows many of the steps in the calculation to proceed independently
through a pipelined unit.  It uses a very short table for the initial
approximation, but each table entry contains two values.  Some of the
features in this algorithm are most suitable for a floating point
architecture that implements a fused multiply-add operation, such as
in the Power*, Power2*, and PowerPC* architectures.  One feature of
the algorithm also assumes that all quantities and operations are
carried out in the same precision as that of the final result.

      The description given here uses IEEE double precision format,
which has a 53-bit mantissa.  The final result, prior to rounding,
produces at least 56 bits of precision.  There are several methods
available for using this value to obtain a correctly rounded IEEE
result, but they are not described here.  A simpler version of the
algorithm is also disclosed which provides a single precision result.

      The algorithm, briefly, is as follows:

      Let q = sqrt(b), where b is a double precision IEEE
floating point number.  Without loss of generality, it can be
assumed that .5 =< b < 2.0.  For values of b in other ranges, only
the processing of the exponent is affected, which is not described
in this disclosure.  The 53-bit mantissa of b includes an implied
high order bit of '1'.

      The range of b is split into two subranges (.5, 1.0) and
(1.0, 2.0).  The mantissas are the same for both subranges, but the
least significant bit of the biased exponent is 0 and 1 for the two
subranges, respectively.  The range (.5, 1.0) is first considered.
The five high order mantissa bits after the implied '1' are used to
access a 32 entry table, each entry having two values, q0 and y0.
For the subrange (.5, 1.0), each q0 is a 12-bit approximation of
sqrt(b) for the value of b at the midpoint of the corresponding
interval, while each y0 is a full 53-bit precision value of
1/(q0*q0).  Thus the approximation is two-sided, i.e., q0 may be less
than or greater than sqrt(b).

      For b in the range (1.0, 2.0), the same table values are
used, but q0 is multiplied by sqrt(2).  This extra step eliminates
the need for a second table for this range.  No additional steps are
needed for y0, however.  Its values for this range are halved, but
that does not affect the mantissa, and its exponent is obtained
directly from the exponent of b.  From now on, the values q0s and
y0s will be used instead of q0 and y0, to indicate that they have
been scaled for the corresponding subrange.

      The approximation error e = 1 - b*y0s is first calculated.  The
table limits its range such that |e| <...