Efficient Floating Point to Clipped Integer Conversion

IP.com Disclosure Number: IPCOM000108353D
Original Publication Date: 1992-May-01
Included in the Prior Art Database: 2005-Mar-22
Document File: 2 page(s) / 83K

Publishing Venue

IBM

Related People

Carter, JL: AUTHOR [+4]

Abstract

Disclosed is an efficient computer conversion from floating-point to clipped-integer. For a given integer N, the term "Clipped Conversion" will mean the process of converting a floating point number x to an integer between 0 and N. Specifically, the clipped conversion of x is 0 if x is negative; and otherwise, it is the largest integer between 0 and N that is not greater than x.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Efficient Floating Point to Clipped Integer Conversion

       Disclosed is an efficient computer conversion from
floating-point to clipped-integer.  For a given integer N, the term
"Clipped Conversion" will mean the process of converting a floating
point number x to an integer between 0 and N.  Specifically, the
clipped conversion of x is 0 if x is negative; and otherwise, it is
the largest integer between 0 and N that is not greater than x.
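
      As a point of comparison, the definition above can be written
out directly.  The following is a minimal sketch of the definition
only, not the efficient method this disclosure describes.  The
function name is hypothetical, and x is assumed not to be a NaN.

```python
import math


def clipped_conversion_reference(x: float, n: int) -> int:
    """Straightforward reading of the definition (hypothetical helper).

    Returns 0 for negative x; otherwise the largest integer in [0, n]
    that is not greater than x.  Assumes x is not a NaN.
    """
    if x < 0:
        return 0
    if x >= n:
        # Covers large values and +infinity without calling floor().
        return n
    return math.floor(x)
```

For N = 255, this sketch maps -3.2 to 0, 17.9 to 17, and 300.0 to 255.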

      The operation is broken into two steps:  first the conversion
and then the clipping.

Conversion:

      Assume x is in the range $-2^{K-1} \le x < 2^K$.  Add $2^L$ for
a carefully chosen $L \ge K$ (typically, L = 20 or 52).  On many
machines, this will have the following effect:  if x is positive, the
binary representation of the result will have its "binary point" at
some fixed location.  If x is negative, then at that same location
will be the value $-\lceil |2x| \rceil$, where $\lceil \cdot \rceil$
means take the smallest integer that is larger than or equal to the
argument.  For instance, if the number is represented as an IEEE
double precision floating point number and x is positive, the
integral part of x will begin at the (13 + L - K)-th bit, counting
from bit 0 in the high-order position, and end at the (12 + L)-th
bit.  If the number is represented as an IEEE double precision
floating point number and x is negative, the integer beginning at the
(13 + L - K)-th bit and ending at the (12 + L)-th bit will be
negative.  (That is, the (13 + L - K)-th bit will be a one.)
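
      The effect described above can be observed on any machine with
IEEE double precision arithmetic.  The sketch below adds 2^L to x and
reads back the K-bit field.  It is an illustration under stated
assumptions rather than the disclosed implementation.  The function
name is hypothetical.  The bit positions assume that "the n-th bit,
counting from bit 0" denotes bit index n - 1 from the high-order end;
under that reading the field ends at the last bit of the first 32-bit
word for L = 20 and at the last bit of the second word for L = 52.
The default round-to-nearest mode is also assumed, so values of x
extremely close to the next integer may round up.

```python
import struct


def field_after_adding_power_of_two(x: float, L: int, K: int) -> int:
    """Return the K-bit field of (x + 2**L) as an unsigned integer.

    Hypothetical helper.  Assumes IEEE double precision and K <= L <= 52.
    """
    if not (0 < K <= L <= 52):
        raise ValueError("require 0 < K <= L <= 52")
    total = x + 2.0 ** L
    # Reinterpret the 8 bytes of the double as a 64-bit unsigned integer.
    bits = int.from_bytes(struct.pack(">d", total), "big")
    # Under the reading above, the field's last bit is bit 11 + L from the
    # high-order end, which is bit 52 - L from the low-order end.
    shift = 52 - L
    return (bits >> shift) & ((1 << K) - 1)
```

With L = 20 and K = 8, x = 5.75 yields 5, the integral part of x.
For x = -1.25 the field holds 253, which is -3 modulo 256, that is,
the negated ceiling of |2x| = 2.5, and its leading bit is a one.  The
factor of 2 arises because the sum of 2^L and a negative x falls just
below 2^L, so its exponent is one smaller and the fraction bits sit
one position further left.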

      Extract the desired bits from the result.  Up to L bits may be
used.  How this is done depends on the machine.  Assuming the result
is in a floating point register, it could be stored into memory, and
then the desired portion of the result may be loaded into a
general-purpose register.  If there is a delay between when a word is
written into memory and when it may be read, then care should be taken
to coax the compiler into inserting other instructions between the
write and the read.
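
      The store-then-load step can be imitated in a high-level
language by writing the double into a byte buffer and reading back
one 32-bit word.  This is a sketch under assumptions, not the
machine-level sequence the disclosure has in mind: the function names
are hypothetical, the buffer stands in for memory, IEEE double
precision in big-endian byte order is assumed, and only the two
typical values L = 20 and L = 52 are handled.

```python
import struct


def load_word_holding_integer(x: float, L: int) -> int:
    """Store (x + 2**L) as 8 bytes and load the 32-bit word whose
    low-order end holds the extracted field (hypothetical helper)."""
    if L not in (20, 52):
        raise ValueError("this sketch handles only L = 20 or L = 52")
    memory = struct.pack(">d", x + 2.0 ** L)             # the "store"
    high_word, low_word = struct.unpack(">II", memory)   # the "load"
    # For L = 20 the field ends at the last bit of the high-order word;
    # for L = 52 it ends at the last bit of the low-order word.
    return high_word if L == 20 else low_word


def low_bits(word: int, K: int) -> int:
    """Keep the K low-order bits of a 32-bit word (hypothetical helper)."""
    return word & ((1 << K) - 1)
```

For example, low_bits(load_word_holding_integer(5.75, 20), 8) gives 5,
and low_bits(load_word_holding_integer(100.0, 52), 8) gives 100.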
A...