Browse Prior Art Database

Iterative Method of Extended Integer Divide

IP.com Disclosure Number: IPCOM000170316D
Original Publication Date: 2008-May-15
Included in the Prior Art Database: 2008-May-15
Document File: 5 page(s) / 71K

Publishing Venue

IBM

Abstract

Most implementations of integer divide are based on the SRT algorithm, which is similar to the repeated subtraction method often done by hand called "long division". This method produces a fixed number of quotient bits each cycle, usually 2. It can be designed to finish early after the required number of quotient bits is obtained. For example, if the numerator has 58 significant bits and the divisor has 30 significant bits, then the quotient must have either 28 or 29 bits (58 - 30 or 58 - 30 + 1). Thus, the algorithm can finish after producing 29 quotient bits. The methods described in this publication are based on the Fixed-Point Divide instructions that are implemented in hardware within the the Z-Series Binary Floating Point Unit. The basic algorithm is used in the P6 binary floating point unit. Both units are pipelined fused multiply-add units that are modified to also provide other functions such as instructions that convert between integer and floating point. The divide algorithm is based on P. Markstein's software algorithm used with the Intel ia-64 processor. It is classified as an "iterative" algorithm that only produces an 'estimate' of the quotient, but which doubles the precision of that estimate every iteration up to the precision of the multiply-add unit. Other iterative algorithms are the Newton-Raphson and Goldschmidt algorithms. The algorithm described in the afore-mentioned docket is for integer divide instructions in which the numerator is no more than 64 significant bits. The Z-Series architecture has integer divide instructions in which the numerator may have up to 128 significant bits, although the quotient is still limited to 64 bits. Since the instructions are implemented in the Binary Floating Point Unit, which executes multiply-add operations for operands up to 64 bits, a larger numerator poses several problems and requires several additional techniques which are described in this publication. The hardware first determines if the numerator has more than 64 significant bits. If the 64 most significant bits (held in a separate "high-order word") are all zeros, then the algorithm reverts to that of the aforementioned docket for operands of 64 bits or less. When the high-order word is not zeros, then the scheme is to use the same aforementioned algorithm but using only the 64 most significant bits of the numerator, in floating point format. This provides an intermediate quotient that is close to the correct quotient. It may be larger than the correct quotient by at most 1, or smaller than the correct quotient by at most 2. This invention describes first a simple procedure for converting the two integer words of the numerator to a floating number containing the 64 most significant bits as its mantissa. This invention also describes a procedure for calculating a remainder corresponding to the intermediate quotient, and using that to obtain the correct quotient.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 28% of the total text.

Page 1 of 5

Iterative Method of Extended Integer Divide

Most implementations of integer divide are based on the SRT
algorithm, which is similar to the repeated subtraction method
often done by hand called "long division". This method produces a
fixed number of quotient bits each cycle, usually 2. It can be
designed to finish early after the required number of quotient
bits is obtained. For example, if the numerator has 58
significant bits and the divisor has 30 significant bits, then
the quotient must have either 28 or 29 bits (58 - 30 or 58 - 30 +
1). Thus, the algorithm can finish after producing 29 quotient
bits.

     The methods described in this publication are based on the
Fixed-Point Divide instructions that are implemented in hardware
within the Z-Series Binary Floating Point Unit. The basic
algorithm is used in the P6 binary floating point unit. Both
units are pipelined fused multiply-add units that are modified to
also provide other functions such as instructions that convert
between integer and floating point. The divide algorithm is based
on P. Markstein's software algorithm used with the Intel 64
processor. It is classified as an "iterative" algorithm that only
produces an 'estimate' of the quotient, but which doubles the
precision of that estimate every iteration up to the precision of
the multiply-add unit. Other iterative algorithms are the
Newton-Raphson and Goldschmidt algorithms.

     The algorithm described in the aforementioned docket is for
integer divide instructions in which the numerator is no more
than 64 significant bits. The Z-Series architecture has integer
divide instructions in which the numerator may have up to 128
significant bits, although the quotient is still limited to 64
bits. Since the instructions are implemented in the Binary
Floating Point Unit, which executes multiply-add operations for
operands up to 64 bits, a larger numerator poses several problems
and requires several additional techniques which are described in
this publication.

     The algorithm may also be used for any divide where the
numerator may be greater than the operand size allowed by the
multiplier. Thus, an integer divide allowing a 64-bit numerator
and 32-bit quotient and divisor could be implemented with this
algorithm using an execution unit having only a 32-bit
multiplier. The algorithm may also be adapted for software
implementation of integer divide, where the numerator may be
double the allowed operand size of the arithmetic instructions.
Thus, it must be represented using two separate words.

     Previous implementations of the Z-Series and System/390/370
implemented integer divide instructions using the SRT algorithm,
so the extension to handle large numerators was not difficult.

     For the integer divide instructions allowing numerators
greater than 64 bits, the operands are all treated as unsigned,

1

Page 2 of 5

thus they are all positive values or zero. The algorithm
described here specifically applies to unsigned numbers only.
However, those...