
Divide Early Out for the Floating Point

IP.com Disclosure Number: IPCOM000109985D
Original Publication Date: 1992-Oct-01
Included in the Prior Art Database: 2005-Mar-25
Document File: 2 page(s) / 77K

Publishing Venue

IBM

Related People

Chu, TV: AUTHOR (and four others)

Abstract

The goal of this invention is to optimize the performance of the floating-point divide instruction while staying within the chip's size restrictions.

This is the abbreviated version, containing approximately 52% of the total text.

      The single-chip design placed severe space restrictions on the
floating-point unit.  However, the fact that this is a low-end
processor did not preclude optimizing performance within those
restrictions.  Although the divide instruction occurs infrequently in
most benchmarks, a sufficiently slow divide will still appreciably
degrade benchmark results.

      The floating-point unit uses a radix-4 (two quotient bits per
iteration) non-restoring division algorithm for its divide
implementation.  For IEEE double precision, the 53-bit mantissa plus
the Guard and Round bits needed for rounding give 55 quotient bits, so
the divide must loop 28 times on the quotient calculation.  These
iterations come in addition to the necessary set-up cycles and a
writeback cycle that rounds the result and stores it back to the
floating-point registers (FPRs).
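      The iteration count follows from the bit budget: 53 mantissa bits
plus Guard and Round make 55 quotient bits, and each radix-4 step
retires two bits, so ceil(55/2) = 28 steps.  The Python sketch below is
a simplified model, not the disclosed hardware: it swaps in a
restoring-style radix-4 digit selection (each digit in {0, 1, 2, 3},
chosen by dividing the shifted remainder by the divisor) in place of the
non-restoring scheme named above, and it divides by twice the divisor
so the partial remainder always starts below it.  The function name
`radix4_divide` and both of those choices are assumptions for
illustration.

```python
import math

MANTISSA_BITS = 53   # IEEE double precision, including the hidden bit
EXTRA_BITS = 2       # Guard and Round bits needed for rounding
BITS_PER_STEP = 2    # radix 4 retires two quotient bits per iteration

# 55 quotient bits at two bits per step -> 28 iterations.
ITERATIONS = math.ceil((MANTISSA_BITS + EXTRA_BITS) / BITS_PER_STEP)


def radix4_divide(a: int, b: int) -> tuple[int, int]:
    """Hypothetical radix-4 digit-recurrence divide of two mantissas.

    a and b are normalized 53-bit mantissas (hidden bit included).
    Uses restoring-style digit selection, NOT the non-restoring hardware
    algorithm. Returns (quotient, remainder) with
    quotient == (a << 55) // b; the remainder is nonzero exactly when
    that quotient is inexact.
    """
    assert 1 << 52 <= a < 1 << 53 and 1 << 52 <= b < 1 << 53
    d = 2 * b          # a < 2*b, so the initial remainder is below d
    r = a
    q = 0
    for _ in range(ITERATIONS):
        # Shift the remainder left two bits and pick a digit in {0..3}.
        digit, r = divmod(4 * r, d)
        q = (q << 2) | digit   # append the two new quotient bits
    return q, r
```

For example, `radix4_divide(1 << 52, 1 << 52)` returns `(1 << 55, 0)`:
a quotient of exactly 1.0 scaled by 2**55 and a zero remainder.  Because
the mantissa ratio lies between 1/2 and 2, the quotient always has 55 or
56 significant bits, enough to cover the 53 result bits plus Guard and
Round.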

      The floating-point unit has a multicycle normalization process
that uses a feedback path in the writeback cycle.  Because the
normalizer sits in the writeback cycle, this latch effectively behaves
like a left-shift register.  If we force the normalizer to shift left
by two, this writeback register can be used as the divide result
register.  In practice, we use only the upper 55 bits of this
register.  This way, at the end of the quotient iterations, the
unrounded result is already in normalized form.  During the set-up
cycles, the register is preloaded with 54 leading zeros followed by
"10".  When this "10" reaches the MSBs of the register, the final
quotient iteration is in progress.
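      The marker-based end-of-loop detection can be modeled in software.
In the Python sketch below, an integer stands in for the writeback
register: it is preloaded with 54 zeros followed by "10", each
iteration shifts it left by two and inserts the new two-bit quotient
digit at the bottom, and the loop ends on the iteration that begins
with "10" in the two MSBs.  This is a sketch under stated assumptions,
not the actual control logic: the 56-bit register width, the helper
name `marker_divide`, and the restoring-style digit selection (used
here for simplicity instead of the non-restoring algorithm) are all
illustrative choices.

```python
REG_WIDTH = 56                  # assumed width: 54 zeros plus the "10" marker
REG_MASK = (1 << REG_WIDTH) - 1
MARKER = 0b10                   # preload: 54 leading zeros followed by "10"


def marker_divide(a: int, b: int) -> tuple[int, int, int]:
    """Hypothetical model of a marker-terminated radix-4 divide loop.

    a and b are normalized 53-bit mantissas. Returns
    (quotient_register, remainder, iterations).
    """
    assert 1 << 52 <= a < 1 << 53 and 1 << 52 <= b < 1 << 53
    d = 2 * b                   # a < 2*b, so the partial remainder starts below d
    r = a
    reg = MARKER                # loaded during the set-up cycles
    iterations = 0
    while True:
        # "10" in the two MSBs means this is the final quotient iteration.
        final = (reg >> (REG_WIDTH - 2)) == MARKER
        # Restoring-style digit selection: a two-bit digit in {0, 1, 2, 3}.
        digit, r = divmod(4 * r, d)
        # Shift left by two, as the forced normalizer would, and insert the
        # digit at the bottom; on the final pass the marker drops off the top.
        reg = ((reg << 2) | digit) & REG_MASK
        iterations += 1
        if final:
            return reg, r, iterations
```

For example, `marker_divide(1 << 52, 1 << 52)` returns
`(1 << 55, 0, 28)`: the marker reaches the MSBs after 27 shifts, so the
loop runs exactly 28 times without a separate iteration counter, and
the leading one of the unrounded quotient lands in one of the top two
bit positions.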

      The final piece of necessary background information is the
calculation of the final sticky bit.  After the quotient iterations
complete, the bits of the remainder are ORed together to form the
s...