# Floating Point Convert to Double Word Integer

IP.com Disclosure Number: IPCOM000114976D
Original Publication Date: 1995-Feb-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 4 page(s) / 172K

IBM

## Related People

Elliott, TA: AUTHOR [+3]

## Abstract

Converting a floating point number into a 64 bit 2's complement 'integer' presents a unique problem. A double precision floating point number consists of: a 1 bit sign, an 11 bit exponent, and a 52 bit fraction field. The implied bit, just as the name suggests, is NOT part of the 64 bit double precision representation. The value of the normal floating point number is: {(-1) ** Sign} * {2 ** (exponent - bias)} * 1.fraction

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 38% of the total text.

Converting a floating point number into a 64 bit 2's complement
'integer' presents a unique problem.  A double precision floating
point number consists of: a 1 bit sign, an 11 bit exponent, and a
52 bit fraction field.  The implied bit, just as the name suggests,
is NOT part of the 64 bit double precision representation.  The value
of the normal floating point number is:
{(-1) ** Sign} * {2 ** (exponent - bias)} * 1.fraction
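
As an illustration of the field layout and the value formula, the following is a minimal Python sketch. It assumes the IEEE 754 double precision format, whose exponent bias is 1023; the helper names `decode_double` and `normal_value` are hypothetical and do not come from the disclosure.

```python
import struct

BIAS = 1023  # assumed: exponent bias of IEEE 754 double precision


def decode_double(x):
    """Split a double into its (sign, biased exponent, fraction) fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                  # 1 bit sign
    exponent = (bits >> 52) & 0x7FF    # 11 bit biased exponent
    fraction = bits & ((1 << 52) - 1)  # 52 bit fraction, implied bit not stored
    return sign, exponent, fraction


def normal_value(sign, exponent, fraction):
    """Evaluate (-1)**sign * 2**(exponent - bias) * 1.fraction (normal numbers only)."""
    mantissa = 1.0 + fraction / float(1 << 52)  # restore the implied leading 1
    return (-1.0) ** sign * 2.0 ** (exponent - BIAS) * mantissa


sign, exponent, fraction = decode_double(5.75)
# Show the sign, the unbiased exponent, and the top four fraction bits.
print(sign, exponent - BIAS, format(fraction >> 48, "04b"))
print(normal_value(sign, exponent, fraction))
```

The sketch covers normal numbers only; zeros, denormals, infinities, and NaNs use reserved exponent values and are not handled.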

Conceptually, converting a floating point number into a 64 bit,
2's complement integer is not straightforward.  Ignoring overflow
and negative numbers for the moment, the idea is to place the 'integer'
portion of the mantissa (those bits that would be to the left of the
binary point if the exponent were adjusted to zero) into the 64 bit
result.
Example: Convert 5.75 (decimal) from floating point data to a double
word integer
B = 1.0111 * 2**(2 + bias)    (floating point format for 5.75 decimal)
  = 101.11 * 2**(bias)        (align binary point to exp = 0)
  = 0000 ...  0000 0110       ("6" binary after rounding and
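
The alignment and rounding steps of this example can be reproduced with integer arithmetic. The sketch below is one possible rendering, not the disclosed hardware: it assumes IEEE 754 double precision, round to nearest even as the rounding mode, and a positive normal input in the range 1.0 <= x < 2**52 (overflow and negative numbers are ignored, as in the text). The function name `to_integer_round_nearest` is hypothetical.

```python
import struct


def to_integer_round_nearest(x):
    """Convert a positive normal double to an integer, rounding to nearest even.

    Assumes 1.0 <= x < 2**52, so at least one fraction bit lies to the
    right of the binary point.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = ((bits >> 52) & 0x7FF) - 1023          # remove the bias
    mantissa = (1 << 52) | (bits & ((1 << 52) - 1))   # 1.fraction as a 53 bit integer
    shift = 52 - exponent                             # bits right of the binary point
    integer = mantissa >> shift                       # bits left of the binary point
    remainder = mantissa & ((1 << shift) - 1)         # discarded fraction bits
    half = 1 << (shift - 1)
    # Round to nearest; on an exact tie, round to the even integer.
    if remainder > half or (remainder == half and integer & 1):
        integer += 1
    return integer


result = to_integer_round_nearest(5.75)
print(format(result, "064b"))
```

For 5.75 the integer portion is 101 (binary) and the discarded bits are .11, which is above one half, so the result is incremented to 110 (binary).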

Prior art says that adding the B operand (the floating point
number we wish to convert) to a constant places the integer portion
of B in the 64 bits of the result.  This means that the 64 result
bits for the "convert to double word integer" are always in the same
bit positions of the intermediate result.
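
A software analogue of this constant-add idea can be sketched in Python. This is an illustration only, not the hardware datapath: the constant 2**52 is an assumption (the disclosure does not state the constant's value), the arithmetic relies on the default round to nearest even mode of IEEE 754 doubles, and the input is restricted to 0 <= x <= 2**52 - 1. The name `integer_via_constant_add` is hypothetical.

```python
import struct

MAGIC = float(1 << 52)  # assumed constant; not specified in the disclosure


def integer_via_constant_add(x):
    """Return x rounded to nearest even, for 0 <= x <= 2**52 - 1.

    After adding MAGIC the sum lies in [2**52, 2**53), where one unit in
    the last place equals 1, so the rounded integer sits in the 52 bit
    fraction field at a fixed position.
    """
    total = x + MAGIC
    bits = struct.unpack(">Q", struct.pack(">d", total))[0]
    return bits & ((1 << 52) - 1)  # read the fraction field as the integer


print(integer_via_constant_add(5.75))
```

The point of the sketch is that the integer always lands in the same bit positions of the sum, regardless of the exponent of the input.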

During normal arithmetic operation, the intermediate result
gets rounded (increment or no increment of the fraction bits)
depending on the Rounding Mode and the Guard, Round, and Sticky bits.
However, in the case where the fraction field is all 1's and we
increment, the result will be 10.00000...0.  In this case, we need to
effectively right shift the mantissa and increment the exponent by
one.  This creates a serial dependency from the fraction into the
exponent in the final stage of the floating point unit, which has a
large impact on speed paths in the writeback stage.  During floating
point arithmetic operations, however, a 'carry-out' due to rounding
is quite rare.  Using that fact, most high frequency floating point
designs take an additional cycle to correct the exponent for this
'carry-out' case, eliminating the serial path from the fraction into
the exponent.
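
The 'carry-out' case can be modeled with a few lines of Python. This is a behavioral sketch under the assumption of a 53 bit mantissa (implied bit plus 52 fraction bits); the function name `round_increment` is hypothetical, and the model says nothing about how the correction is scheduled in hardware.

```python
def round_increment(exponent, mantissa):
    """Increment a 53 bit mantissa (implied bit included) and renormalize.

    Returns (exponent, mantissa, carried_out).
    """
    mantissa += 1
    carried_out = (mantissa >> 53) == 1  # result has the form 10.000...0
    if carried_out:
        mantissa >>= 1   # right shift back to 1.000...0
        exponent += 1    # exponent update depends on the fraction carry
    return exponent, mantissa, carried_out


all_ones = (1 << 53) - 1  # 1.111...1: implied bit set, fraction all 1's
print(round_increment(0, all_ones))
print(round_increment(0, 1 << 52))
```

Only the all 1's input takes the branch that touches the exponent, which is why designs can afford to handle it in a separate correction cycle.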

Floating point convert to double word integer differs from
normal floating point arithmetic in two respects.  First, the double
word integer must be treated as a 64 bit quantity, not as a sign,
exponent and fraction fields.  If a 'carry-out' of the mantissa were
to occur, not only would it have to increment the "exponent bits"
(bits 1..11 of the integer), it may also have to increment (XOR) the
"sign" bit (bit 0 of the integer) if the ex...