Browse Prior Art Database

Optimal Divide Table Lookup Selection by Divide Algorithm Simulation

IP.com Disclosure Number: IPCOM000105021D
Original Publication Date: 1993-Jun-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 4 page(s) / 87K

Publishing Venue

IBM

Related People

Spencer, AK: AUTHOR

Abstract

This design employs a iterative division algorithm to achieve a 13 cycle Fixed Point division performance. This algorithm starts with a table lookup value as its first input into the multiplication iterations. It is important to pick a table size that will fit on the chip and provides, the best performance for the divide instructions.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Optimal Divide Table Lookup Selection by Divide Algorithm Simulation

      This design employs a iterative division algorithm to achieve a
13 cycle Fixed Point division performance.  This algorithm starts
with a table lookup value as its first input into the multiplication
iterations.  It is important to pick a table size that will fit on
the chip and provides, the best performance for the divide
instructions.

      The problem is how to select an optimal table size which will
meet the requirements of space and performance trade-offs required in
today's complex microprocessor designs.

      The solution to the problem was arrived at by computer
simulation of the algorithm, where the table size could be varied and
the number of iterations required to converge to the result could be
tabulated.  The algorithm will converge within 6 iterations if the
divisor is chosen as the initial guess.  Table 1 shows the results of
2000 random 64 bit numbers divided by 32 bit numbers, (this is the
divide long instruction, our most demanding of the two division
instructions found in the POWER architecture, with divide short being
a subset of divide long).  Note that the second column is a count of
the number of divisions that converged to a correct result with the
number of iterations in the first column.  For the example in Table
1, which does not use a lookup table column.  For the example in
Table 1, which does not use a lookup table n=0, and k=0, and the
average number of cycles for this case is 4.4765.

       Table 1:  (n=0, k=0)

     Iterations     Divisions

         0              0
         1              1
         2             18
         3            195
         4            718
         5            949
         6            119

      A lookup table can be used to improve the average number of
iterations required to converge to the result.  The lookup table has
two dimensions.  The first is the number of bits of the divisor,
denoted by (n).  The second dimension is the number of good bit, or
(k).  Good bits are an approximation of one divided by the divisor,
accurate to k bits.

      Further simulations with various table sizes showed that at
certain sizes the maximum number of iterations required to converge
to a result would drop by 1.  In Table 2 for example, n=2, and k=2;
thus the first two bits of the divisor were used to index a 4 row
table of 2 bits in each row.  Again the same 2000 divisions used in
Table 1 are shown.  Note that no divisions required 6 iterations.
The average for this table size is now reduced to 3.8065.  Also note
the dramatic drop with just such a small table size.  Further
experimentation showed a steady drop in the average number of
iterations with successive drops in the maximum numbe...