Hardware Implementation of Sine/Cosine Polynomial Approximation

IP.com Disclosure Number: IPCOM000114242D
Original Publication Date: 1994-Nov-01
Included in the Prior Art Database: 2005-Mar-28
Document File: 6 page(s) / 201K

Publishing Venue

IBM

Related People

Desrosiers, B: AUTHOR [+4]

Abstract

Sine and Cosine functions require time-consuming calculations, but they are not used intensively in numeric software. Their evaluation must not require deep modifications of the architecture devoted to the main instructions (Add, Subtract, Multiply, ...), which implements the ANSI/IEEE Standard 754-1985 for Binary Floating-Point Arithmetic, so as not to degrade the performance of those instructions.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 31% of the total text.

      Because a multiplier array (1/5) has been implemented for
multiply instructions, the Chebyshev polynomial approximation is the
best choice for our Datapath (Fig. 1).  This algorithm, which
evaluates Sine/Cosine for arguments less than PI/4, meets the three
following requirements:
  o  fast convergence               => 83 cycles per Sine
                                     => 85 cycles per Cosine
  o  good accuracy                  => relative error < 2**-63
  o  few architecture modifications => the polynomial coefficients are
      stored in a ROM which is connected to the Dataflow.  Controls are
      microcoded (512 words of 68 bits).

      Algorithm: The magnitude of the permitted error (< 2**-63)
directly determines the degree of the polynomial and, consequently,
the number of iterations required to evaluate the polynomial under
the Horner scheme:
              P(X) = (...(((Ai x X) + Ai-1) x X) + ... + A0)
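
      The Horner scheme can be sketched in software as follows.  This
is only an illustration of the recurrence, not the microcode of the
disclosure; the function name horner and the returned iteration count
are assumptions made for this example.

```python
def horner(coeffs, x):
    """Evaluate A0 + A1*x + ... + An*x**n with coeffs = [A0, A1, ..., An].

    Returns the value and the number of multiply-add iterations,
    which equals the degree of the polynomial.
    """
    acc = coeffs[-1]                      # start from the highest coefficient An
    iterations = 0
    for a in reversed(coeffs[:-1]):       # An-1, An-2, ..., A0
        acc = acc * x + a                 # one multiply and one add per iteration
        iterations += 1
    return acc, iterations


# P(x) = 1 + 2x + 3x**2 at x = 2 gives 17 after 2 iterations.
print(horner([1.0, 2.0, 3.0], 2.0))
```

      Each iteration is one multiply followed by one add, so a lower
polynomial degree translates directly into fewer iterations.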

      For Sine, a polynomial of degree 15 meets the relative error
bound (< 2**-63).  As Psin(X)/X is an even function, the number of
iterations is 7, plus the multiply by X.  When the argument is equal
to or less than 2**-33, Sine is evaluated using only the first term
of the Taylor/Maclaurin expansion (Sine = X).  Fig. 2 illustrates the
entire Sine algorithm in the range (0,PI/4).
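
      The structure of the Sine evaluation described above can be
sketched as follows.  The ROM coefficients are not given in this
text, so the sketch substitutes Taylor coefficients for the Chebyshev
ones and runs in ordinary double precision; the names SIN_COEFFS,
SMALL_ARG and sine_poly are assumptions made for this example.

```python
import math

# Stand-in coefficients (assumption): Taylor series of sin(x)/x in powers
# of x**2, i.e. 1, -1/3!, 1/5!, ..., -1/15!.  The disclosure stores
# Chebyshev-derived coefficients in a ROM instead.
SIN_COEFFS = [(-1) ** k / math.factorial(2 * k + 1) for k in range(8)]
SMALL_ARG = 2.0 ** -33


def sine_poly(x):
    """Approximate sin(x) for 0 <= x <= pi/4 with a degree-15 odd polynomial."""
    if x <= SMALL_ARG:
        return x                          # first Taylor/Maclaurin term only
    x2 = x * x                            # the polynomial is evaluated in x**2
    acc = SIN_COEFFS[-1]
    for a in reversed(SIN_COEFFS[:-1]):   # 7 Horner iterations
        acc = acc * x2 + a
    return acc * x                        # final multiply by X


print(sine_poly(0.5), math.sin(0.5))
```

      Because the polynomial is odd, only 8 coefficients are needed
for degree 15, which is why 7 iterations suffice.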

      For Cosine, a polynomial of degree 16 meets the relative error
bound (< 2**-63).  As Pcos(X) is an even function, the number of
iterations is 8.  When the argument is equal to or less than 2**-33,
Cosine is evaluated as the IEEE standard 754 rounded value of 1 -
(2**-67).  Fig. 3 illustrates the entire Cosine algorithm in the
range (0,PI/4).
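
      The Cosine evaluation can be sketched in the same way.  Again
the Taylor coefficients are only a stand-in for the ROM contents, the
arithmetic is ordinary double precision rather than the 64 bit
mantissa of the disclosure, and the names COS_COEFFS, SMALL_ARG and
cosine_poly are assumptions made for this example.

```python
import math

# Stand-in coefficients (assumption): Taylor series of cos(x) in powers
# of x**2, i.e. 1, -1/2!, 1/4!, ..., 1/16!.  The disclosure stores
# Chebyshev-derived coefficients in a ROM instead.
COS_COEFFS = [(-1) ** k / math.factorial(2 * k) for k in range(9)]
SMALL_ARG = 2.0 ** -33


def cosine_poly(x):
    """Approximate cos(x) for 0 <= x <= pi/4 with a degree-16 even polynomial."""
    if x <= SMALL_ARG:
        # Rounded value of 1 - 2**-67; in double precision with
        # round-to-nearest this is exactly 1.0.
        return 1.0 - 2.0 ** -67
    x2 = x * x                            # the polynomial is evaluated in x**2
    acc = COS_COEFFS[-1]
    for a in reversed(COS_COEFFS[:-1]):   # 8 Horner iterations
        acc = acc * x2 + a
    return acc


print(cosine_poly(0.5), math.cos(0.5))
```

      The small-argument constant follows from cos(X) being close to
1 - X**2/2, which for X = 2**-33 is 1 - 2**-67; returning the rounded
value of that expression presumably lets the active rounding mode
decide the delivered result.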

      Datapath:  Fig. 1 shows the Datapath.  It includes three units
working in parallel, which allows concurrent operations:
  o  Exponent arithmetic hardware handles 15 bit numbers.  An End
      Around Carry adder allows a one cycle exponent subtraction.  The
      TR3E register can control the shifters (ALI or NOR) through the
      decode stage (DEC).
  o  Mantissa arithmetic hardware handles 67 bit numbers: the 64
      mantissa bits plus three "extra" bits called the Guard, Round
      and Sticky bits.  These bits are used in the rounding operation
      according to the IEEE standard 754 to maintain accuracy when the
      precision of a result exceeds that available in finite hardware.
      A 67 bit mantissa subt...