
Floating Point Convert to Integer Improved Implementation

IP.com Disclosure Number: IPCOM000113043D
Original Publication Date: 1994-Jul-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 166K

Publishing Venue

IBM

Related People

Elliott, TA: AUTHOR [+4]

Abstract

Some floating point units are built around the 'Fused Multiply-Add' (FMA) instruction, T = (A * C) + B, where T is the target register of the operation. In general, every other floating point instruction is a variation of the FMA. An add instruction, T = A + B, simply sets C = 1 and performs an FMA. A move instruction, T = B, sets A = 0, which makes C a "don't care" value. Instructions that do not fit this format cost a large amount of silicon area and design complexity. For example, a processor's divide instruction can require recursive additions based on a lookup table, which needs additional entry points, with their associated control, in both the alignment shifter for the B operand and the carry-save adder (CSA) multiply tree.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 36% of the total text.

      Some floating point units are built around the 'Fused
Multiply-Add' (FMA) instruction, T = (A * C) + B, where T is the
target register of the operation.  In general, every other floating
point instruction is a variation of the FMA.  An add instruction,
T = A + B, simply sets C = 1 and performs an FMA.  A move
instruction, T = B, sets A = 0, which makes C a "don't care" value.
Instructions that do not fit this format cost a large amount of
silicon area and design complexity.  For example, a processor's
divide instruction can require recursive additions based on a lookup
table, which needs additional entry points, with their associated
control, in both the alignment shifter for the B operand and the
carry-save adder (CSA) multiply tree.  Knowing this, the design
attempts to implement all instructions as subsets of the FMA.
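The reduction of add and move to a single FMA primitive can be sketched as a small software model. This is only an illustration under stated assumptions: the names fma, add and move are hypothetical, exact rational arithmetic (Python's fractions module) stands in for the hardware's wide intermediate result, and only finite inputs are handled.

```python
from fractions import Fraction


def fma(a: float, c: float, b: float) -> float:
    """Hypothetical model of T = (A * C) + B with one rounding at the end.

    The product and sum are computed exactly as rationals; the single
    rounding happens when float() divides numerator by denominator
    (CPython's int/int division is correctly rounded). Finite inputs only.
    """
    exact = Fraction(a) * Fraction(c) + Fraction(b)
    return float(exact)


def add(a: float, b: float) -> float:
    """T = A + B, performed as an FMA with C = 1."""
    return fma(a, 1.0, b)


def move(b: float, c: float = 0.0) -> float:
    """T = B, performed as an FMA with A = 0; C is a "don't care" value."""
    return fma(0.0, c, b)
```

Because only one rounding occurs, fma(0.1, 10.0, -1.0) returns 2**-54, whereas the separately rounded expression 0.1 * 10.0 - 1.0 returns 0.0.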

      Floating point numbers consist of a sign bit, a biased exponent
value, and a positive mantissa (mantissa = implied bit plus
fraction).
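As a sketch of these three fields, the function below splits a number into sign, biased exponent and mantissa. It assumes IEEE 754 double precision (52-bit fraction, exponent bias 1023), which the 2**52 constant used later suggests but the text does not state; the function name decompose is hypothetical.

```python
import struct


def decompose(x: float) -> tuple[int, int, int]:
    """Hypothetical helper: split a double into (sign, biased exponent, mantissa).

    For normal numbers the mantissa is the implied leading 1 plus the
    52-bit fraction. Zeros and subnormals (biased exponent 0) have no
    implied bit. Infinities and NaNs (biased exponent 0x7FF) are not
    meaningful here.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    sign = bits >> 63
    biased_exp = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    implied = 0 if biased_exp == 0 else 1
    mantissa = (implied << 52) | fraction
    return sign, biased_exp, mantissa
```

For a normal number the value is recovered as (-1)**sign * mantissa * 2**(biased_exp - 1023 - 52).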

      Conceptually, converting a floating point number into a 32-bit
2's complement integer is not straightforward.  Ignoring overflow and
negative numbers for the moment, the idea is to place the 'integer'
portion of the mantissa (those bits that would be to the left of the
binary point if the exponent were adjusted to zero) in the 32 LSB
positions of the fraction field.  With previous techniques, adding
the B operand (the floating point number to be converted) to the
constant A = 1.0 * 2**52 produces the integer portion of B.  Because
the sum has an exponent of 52, one unit in the last place of its
52-bit fraction is worth exactly 1, so the integer portion of B is
placed in the 32 LSBs of the result mantissa.  This means that the 32
bits of importance for the "convert to integer" are always in the
same bit positions of the intermediate result.
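The effect of adding the constant 1.0 * 2**52 can be checked in software. This minimal sketch assumes IEEE 754 double precision and covers only the non-negative, in-range case described above; the helper names are hypothetical, and the host's floating point add (round to nearest, ties to even) supplies the rounding, so other rounding modes are not modeled.

```python
import struct

TWO_52 = 2.0 ** 52  # the constant A = 1.0 * 2**52 from the text


def fraction_field(x: float) -> int:
    """Hypothetical helper: return the 52-bit fraction field of a double."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits & ((1 << 52) - 1)


def integer_portion(b: float) -> int:
    """Hypothetical sketch for 0 <= b < 2**31 (overflow and negatives ignored).

    b + 2**52 has exponent 52, so its ulp is 1 and the integer portion of b
    lands in the low-order bits of the fraction. The host addition rounds
    to nearest-even, so that is the rounding applied here.
    """
    if not (0.0 <= b < 2.0 ** 31):
        raise ValueError("sketch ignores overflow and negative inputs")
    t = b + TWO_52
    return fraction_field(t) & 0xFFFFFFFF  # the 32 LSBs of the fraction
```

For example, integer_portion(5.7) yields 6, and integer_portion(2.5) yields 2 because the tie is broken toward the even value.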

      From this point forward, a previous FPU is used for comparison.
The previous floating point unit is a three-stage pipelined machine.
The first, or multiply, stage aligns B to A*C and complements B if
necessary (for FMS, FMNS, and other instructions where the B operand
must be negated).  The second, or add, stage reduces the SUMs and
CARRIES from the multiply stage to a single operand and performs the
'carry in' (add 1) required for full 2's complementing.  The third,
or writeback, stage performs both normalization and rounding.
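The split of the 2's complement across the first two stages (invert in the multiply stage, add the '1' as a carry in during the add stage) can be modeled with plain integers. This is a simplified sketch, not the FPU's actual datapath: the 64-bit width and the function names are hypothetical, A*C is passed in as an already reduced integer rather than as SUM/CARRY vectors, right shifts simply drop bits (no guard or sticky bits), and the writeback stage is not modeled.

```python
# Hypothetical datapath width; the text does not give the adder width.
WIDTH = 64
MASK = (1 << WIDTH) - 1


def stage1_align_and_complement(b: int, shift: int, negate: bool) -> tuple[int, int]:
    """Multiply stage, B path only: align B, then invert it if it must be negated.

    Returns the aligned (possibly inverted) B together with the carry in
    that the add stage must supply to complete the 2's complement.
    """
    aligned = (b >> shift) if shift >= 0 else (b << -shift)
    aligned &= MASK
    if negate:
        # One's complement only; the '+1' is deferred to the add stage.
        return (~aligned) & MASK, 1
    return aligned, 0


def stage2_add(product: int, b_term: int, carry_in: int) -> int:
    """Add stage: combine into a single operand and apply the deferred carry in."""
    return (product + b_term + carry_in) & MASK
```

For example, stage2_add(10, *stage1_align_and_complement(12, 2, True)) aligns 12 down to 3, inverts it, and then adds it with a carry in of 1, producing 10 - 3 = 7.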

      Background from the previous FPU: all of the above is
straightforward to implement, since adding B to a constant is a
subset of the FMA instruction.  Convert to Integer becomes difficult
to implement when the '-B' case is computed.  This involves 2's
complementing the integer portion of B and correctly rounding the
intermediate result, which requires knowing the sign of the integer.
2's complementing B means inverting it and adding '1' after the
alignment is done.  Since the floating point unit is capable of
performing a logical subtract, |A| - |B|, 2's complementing B is not
a problem...
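Two points in the negative case can be shown in a short sketch: the result is formed by inverting the integer magnitude and adding '1', and the direction in which that magnitude must be rounded depends on the sign. The sketch makes a plain substitution. It rounds the magnitude with Python's math.floor, math.ceil and round instead of the hardware's constant-add path. The function names and rounding-mode strings are hypothetical.

```python
import math

MASK32 = 0xFFFFFFFF


def twos_complement32(m: int) -> int:
    """Invert and add 1, keeping 32 bits."""
    return (~m + 1) & MASK32


def as_signed32(bits: int) -> int:
    """Reinterpret a 32-bit pattern as a signed 2's complement integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def convert_negative_to_int32(b: float, mode: str) -> int:
    """Hypothetical sketch for -2**31 <= b < 0: round |b|, then negate.

    For a negative input, rounding toward -infinity must round the
    magnitude up, and rounding toward +infinity must round it down,
    so the sign has to be known before rounding.
    """
    if not (-(2.0 ** 31) <= b < 0.0):
        raise ValueError("sketch handles only negative, in-range inputs")
    mag = -b
    if mode == "toward_zero":
        m = math.floor(mag)
    elif mode == "toward_neg_inf":
        m = math.ceil(mag)
    elif mode == "toward_pos_inf":
        m = math.floor(mag)
    elif mode == "nearest_even":
        m = round(mag)  # Python's round() breaks ties to even
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return as_signed32(twos_complement32(m))
```

With b = -5.7, rounding toward zero gives -5 but rounding toward -infinity gives -6, even though the magnitude is 5.7 in both cases.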