# Iterative Method of Extended Integer Divide

Original Publication Date: 2008-May-15

Included in the Prior Art Database: 2008-May-15

## Publishing Venue

IBM

## Abstract

Most implementations of integer divide are based on the SRT algorithm, which is similar to the repeated subtraction method often done by hand called "long division". This method produces a fixed number of quotient bits each cycle, usually 2. It can be designed to finish early after the required number of quotient bits is obtained. For example, if the numerator has 58 significant bits and the divisor has 30 significant bits, then the quotient must have either 28 or 29 bits (58 - 30 or 58 - 30 + 1). Thus, the algorithm can finish after producing 29 quotient bits. The methods described in this publication are based on the Fixed-Point Divide instructions that are implemented in hardware within the the Z-Series Binary Floating Point Unit. The basic algorithm is used in the P6 binary floating point unit. Both units are pipelined fused multiply-add units that are modified to also provide other functions such as instructions that convert between integer and floating point. The divide algorithm is based on P. Markstein's software algorithm used with the Intel ia-64 processor. It is classified as an "iterative" algorithm that only produces an 'estimate' of the quotient, but which doubles the precision of that estimate every iteration up to the precision of the multiply-add unit. Other iterative algorithms are the Newton-Raphson and Goldschmidt algorithms. The algorithm described in the afore-mentioned docket is for integer divide instructions in which the numerator is no more than 64 significant bits. The Z-Series architecture has integer divide instructions in which the numerator may have up to 128 significant bits, although the quotient is still limited to 64 bits. Since the instructions are implemented in the Binary Floating Point Unit, which executes multiply-add operations for operands up to 64 bits, a larger numerator poses several problems and requires several additional techniques which are described in this publication. The hardware first determines if the numerator has more than 64 significant bits. If the 64 most significant bits (held in a separate "high-order word") are all zeros, then the algorithm reverts to that of the aforementioned docket for operands of 64 bits or less. When the high-order word is not zeros, then the scheme is to use the same aforementioned algorithm but using only the 64 most significant bits of the numerator, in floating point format. This provides an intermediate quotient that is close to the correct quotient. It may be larger than the correct quotient by at most 1, or smaller than the correct quotient by at most 2. This invention describes first a simple procedure for converting the two integer words of the numerator to a floating number containing the 64 most significant bits as its mantissa. This invention also describes a procedure for calculating a remainder corresponding to the intermediate quotient, and using that to obtain the correct quotient.

**This text was extracted from a PDF file.**

**This is the abbreviated version, containing approximately 28% of the total text.**

__Page 1 of 5__Iterative Method of Extended Integer Divide

Most implementations of integer divide are based on the SRT

algorithm, which is similar to the repeated subtraction method

often done by hand called "long division". This method produces a

fixed number of quotient bits each cycle, usually 2. It can be

designed to finish early after the required number of quotient

bits is obtained. For example, if the numerator has 58

significant bits and the divisor has 30 significant bits, then

the quotient must have either 28 or 29 bits (58 - 30 or 58 - 30 +

1). Thus, the algorithm can finish after producing 29 quotient

bits.

The methods described in this publication are based on the

Fixed-Point Divide instructions that are implemented in hardware

within the Z-Series Binary Floating Point Unit. The basic

algorithm is used in the P6 binary floating point unit. Both

units are pipelined fused multiply-add units that are modified to

also provide other functions such as instructions that convert

between integer and floating point. The divide algorithm is based

on P. Markstein's software algorithm used with the Intel 64

processor. It is classified as an "iterative" algorithm that only

produces an 'estimate' of the quotient, but which doubles the

precision of that estimate every iteration up to the precision of

the multiply-add unit. Other iterative algorithms are the

Newton-Raphson and Goldschmidt algorithms.

The algorithm described in the aforementioned docket is for

integer divide instructions in which the numerator is no more

than 64 significant bits. The Z-Series architecture has integer

divide instructions in which the numerator may have up to 128

significant bits, although the quotient is still limited to 64

bits. Since the instructions are implemented in the Binary

Floating Point Unit, which executes multiply-add operations for

operands up to 64 bits, a larger numerator poses several problems

and requires several additional techniques which are described in

this publication.

The algorithm may also be used for any divide where the

numerator may be greater than the operand size allowed by the

multiplier. Thus, an integer divide allowing a 64-bit numerator

and 32-bit quotient and divisor could be implemented with this

algorithm using an execution unit having only a 32-bit

multiplier. The algorithm may also be adapted for software

implementation of integer divide, where the numerator may be

double the allowed operand size of the arithmetic instructions.

Thus, it must be represented using two separate words.

Previous implementations of the Z-Series and System/390/370

implemented integer divide instructions using the SRT algorithm,

so the extension to handle large numerators was not difficult.

For the integer divide instructions allowing numerators

greater than 64 bits, the operands are all treated as unsigned,

1

__Page 2 of 5__ thus they are all positive values or zero. The algorithm

described here specifically applies to unsigned numbers only.

However, those...