Floating Point Multiply Split
Original Publication Date: 1991-Dec-01
Included in the Prior Art Database: 2005-Apr-04
Chu, TV: AUTHOR [+3]
The IEEE double precision multiply requires an array capable of handling 53-bit mantissa operands. Although Booth encoding reduces the number of partial products by nearly one half (to 27), such an array would still use more silicon than we could allow. This floating point unit therefore implements a multicycle multiply array using Booth encoding. The figure below shows the split of the "C" operand:
The split was chosen so that every partial product generated over the two cycles exactly matches the corresponding partial product of an unsplit C operand. Since every partial product matches exactly, the product computed by the split method is guaranteed to equal that of the unsplit method.
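The claim that the split reproduces the unsplit partial products can be checked with a small model. The Python sketch below is a minimal illustration, not the authors' hardware. It assumes standard radix-4 Booth recoding (digit i is -2*C[2i+1] + C[2i] + C[2i-1], with C[-1] = 0) and assumes the first cycle consumes C[27:0] (partial products 1 to 14) while the second consumes C[52:28], reusing C[27] as its lowest Booth input. That split point is inferred from the text's references to the 14th partial product and the low 28 bits; the helper names (booth_digits, split_digits, booth_product) are hypothetical.

```python
import random

def booth_digits(bits, count, prev_bit=0):
    """Radix-4 Booth digits -2*b[2i+1] + b[2i] + b[2i-1] of an unsigned field, b[-1] = prev_bit."""
    ext = (bits << 1) | prev_bit          # ext bit k+1 holds b[k]; ext bit 0 holds b[-1]
    digits = []
    for i in range(count):
        w = (ext >> (2 * i)) & 0b111      # b[2i+1], b[2i], b[2i-1]
        digits.append(-2 * (w >> 2) + ((w >> 1) & 1) + (w & 1))
    return digits

def unsplit_digits(c):
    # 53-bit C -> 27 digits; the top digit reads C[53], which is always 0.
    return booth_digits(c, 27)

def split_digits(c):
    # Hypothetical split: cycle 1 sees C[27:0] (partial products 1..14),
    # cycle 2 sees C[52:28] and reuses C[27] as its b[-1] (partial products 15..27).
    first = booth_digits(c & ((1 << 28) - 1), 14)
    second = booth_digits(c >> 28, 13, (c >> 27) & 1)
    return first + second

def booth_product(a, digits):
    # Sum of the partial products d_i * A * 4^i.
    return sum(d * a * 4 ** i for i, d in enumerate(digits))

random.seed(1)
for _ in range(1000):
    a = random.getrandbits(52) | (1 << 52)   # normalized 53-bit mantissas
    c = random.getrandbits(52) | (1 << 52)
    assert split_digits(c) == unsplit_digits(c)
    assert booth_product(a, split_digits(c)) == a * c
```

Because the second cycle re-reads C[27], each of its digits is computed from exactly the same three bits as in the unsplit array, which is what makes the two lists identical.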
A significant problem arose with this split.
With Booth encoding, a partial product may be ±1, ±2, or 0 times the multiplicand. In the unsplit method, the most significant partial product was guaranteed not to be negative, since the operands cannot be negative. This allows us to hide the carry-in (the "+1" that completes the two's complement of a negative partial product) for every lower partial product in the shift bits of the next higher partial product. See the following figure.
where Xn is the carry-in for partial product n if that partial product is negative.
Note: partial product 3 is never negative, so X3 = 0.
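A minimal Python sketch of the carry hiding, under stated assumptions: partial products are modeled as W-bit integers (W = 110 is a hypothetical width, enough for the 106-bit product), a negative partial product is stored as the one's complement of |d| times A shifted left by 2i, and its missing "+1" (Xn) is OR-ed into bit 2i of the next row, which is always zero because that row is shifted two places further. Real arrays use sign-extension encoding bits rather than full-width complements, and the helper names are illustrative only.

```python
import random

W = 110                       # hypothetical adder width
MASK = (1 << W) - 1

def booth_digits(bits, count, prev_bit=0):
    """Radix-4 Booth digits -2*b[2i+1] + b[2i] + b[2i-1], with b[-1] = prev_bit."""
    ext = (bits << 1) | prev_bit
    out = []
    for i in range(count):
        w = (ext >> (2 * i)) & 0b111
        out.append(-2 * (w >> 2) + ((w >> 1) & 1) + (w & 1))
    return out

def build_rows(a, digits):
    """Return W-bit partial-product rows plus the carry-in left over from the top row."""
    rows, pending = [], 0                 # pending = X of the row just below
    for i, d in enumerate(digits):
        s = 2 * i
        mag = abs(d) * a
        # Negative rows hold the one's complement; the +1 is deferred.
        row = (mag << s) if d >= 0 else ((~mag) << s) & MASK
        if pending:
            row |= 1 << (s - 2)           # hide X of the lower row in this row's shift bits
        rows.append(row)
        pending = 1 if d < 0 else 0
    return rows, pending                  # nonzero only if the top row is negative

random.seed(2)
for _ in range(1000):
    a = random.getrandbits(52) | (1 << 52)
    c = random.getrandbits(52) | (1 << 52)
    rows, leftover = build_rows(a, booth_digits(c, 27))
    assert leftover == 0                  # top digit of a 53-bit operand is never negative
    assert sum(rows) & MASK == a * c
```

Because the top Booth digit of a zero-extended 53-bit operand reads C[53] = 0, the leftover carry-in is always 0 in the unsplit case, so every Xn finds a home.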
In our split method, it is possible for the most significant partial product from the first pass to be negative. Following are three possible solutions for this carry-in bit, which we could not easily hide.
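The cause can be seen from the Booth inputs alone. Assuming radix-4 Booth recoding and a first pass covering C[27:0] (an inference, not stated in the source), partial product 14 reads C[27], C[26], C[25], while the top partial product of the unsplit array, 27, reads C[53] (always 0), C[52], C[51]. The sketch below enumerates the possible input patterns; booth_digit is a hypothetical helper.

```python
def booth_digit(c, i):
    """Radix-4 Booth digit i of unsigned c: -2*C[2i+1] + C[2i] + C[2i-1], with C[-1] = 0."""
    def bit(k):
        return (c >> k) & 1 if k >= 0 else 0
    return -2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)

# Partial product 14 (i = 13), top of the first pass: inputs C[27], C[26], C[25].
first_pass_negative = [p for p in range(8) if booth_digit(p << 25, 13) < 0]

# Partial product 27 (i = 26), top of the unsplit array: inputs C[53] = 0, C[52], C[51].
unsplit_top_ok = all(booth_digit(p << 51, 26) >= 0 for p in range(4))

print(first_pass_negative, unsplit_top_ok)   # prints [4, 5, 6] True
```

Three of the eight (C27, C26, C25) patterns (100, 101, 110) make partial product 14 negative, whereas partial product 27 can never be negative.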
1: Add an additional leg to the CSA tree to absorb this carry-in. However, this problem was introduced by splitting the multiplication into two cycles, which was done to save space, so adding another CSA to the tree is not very appealing.
2: Since the CSA tree will be reused for the "high" multiplication, we could hide the carry-in of the 14th partial product within the tree on that second pass.
Without the 14th partial product "add 1" problem, we could add the low 28 bits and save them in the unnormalized result latch; the bits above the low 28 would be fed back to the multiplier and added during the second multiply. However, the "add 1" bit from partial product 14 is in position 26 from the bottom (weight 2^26). If we instead add only the low 26 bits and feed the rest back to the multiplier, we can add that bit in. The following figure details this solution.
By feeding back the sum1 and carry1 bits above the low 26 bits, a position becomes available to add the "add 1" bit from partial product 14 (labeled X14 above). The two bits in the multiplier are used only on the second multiply cycle to handle X14.
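Solution 2 can be modeled at the integer level. In the Python sketch below (hypothetical helper names; W = 110 is an assumed width), the first pass compresses partial products 1 to 14 with 3:2 carry-save adders into sum1 and carry1, the low 26 bits of both are added, and the bits from position 26 up are fed back as two extra CSA inputs on the second pass. One way to see the free position is that partial product 15 is shifted by 28, so its bit 26 is empty and X14 can be placed there like any other hidden carry-in. That placement, the C[27:0] / C[52:28] split, and carrying the low adder's carry-out forward as a plain integer are all assumptions of the sketch, not details taken from the source.

```python
import random

W = 110                                   # hypothetical datapath width
MASK = (1 << W) - 1
M_LOW = (1 << 26) - 1                     # low 26 bits added on the first cycle

def booth_digits(bits, count, prev_bit=0):
    ext = (bits << 1) | prev_bit          # radix-4 Booth, b[-1] = prev_bit
    out = []
    for i in range(count):
        w = (ext >> (2 * i)) & 0b111
        out.append(-2 * (w >> 2) + ((w >> 1) & 1) + (w & 1))
    return out

def build_rows(a, digits, base=0, carry_in=0):
    """Rows at shifts base, base+2, ...; each negative row's +1 is hidden in the next row."""
    rows, pending = [], carry_in
    for i, d in enumerate(digits):
        s = base + 2 * i
        mag = abs(d) * a
        row = (mag << s) if d >= 0 else ((~mag) << s) & MASK
        if pending:
            row |= 1 << (s - 2)           # bits below s are zero, so this slot is free
        rows.append(row)
        pending = 1 if d < 0 else 0
    return rows, pending

def csa(x, y, z):
    # 3:2 carry-save adder: x + y + z == sum + carry (mod 2^W)
    return x ^ y ^ z, (((x & y) | (x & z) | (y & z)) << 1) & MASK

def reduce_to_two(rows):
    rows = list(rows) + [0, 0]
    while len(rows) > 2:
        s, c = csa(rows.pop(), rows.pop(), rows.pop())
        rows += [s, c]
    return rows[0], rows[1]

def two_cycle_multiply(a, c, use_x14=True):
    # Cycle 1: C[27:0] -> partial products 1..14; X14 has no higher row here.
    rows1, x14 = build_rows(a, booth_digits(c & ((1 << 28) - 1), 14))
    sum1, carry1 = reduce_to_two(rows1)
    low = (sum1 & M_LOW) + (carry1 & M_LOW)   # low 26-bit add (carry-out kept as an integer)
    # Cycle 2: C[52:28] with C[27] as b[-1]; X14 goes into bit 26 of partial product 15.
    rows2, top = build_rows(a, booth_digits(c >> 28, 13, (c >> 27) & 1),
                            base=28, carry_in=x14 if use_x14 else 0)
    assert top == 0                           # the last partial product is never negative
    feedback = [sum1 & ~M_LOW, carry1 & ~M_LOW]   # sum1/carry1 from bit 26 up
    sum2, carry2 = reduce_to_two(rows2 + feedback)
    return (low + sum2 + carry2) & MASK, x14

random.seed(0)
for _ in range(1000):
    a = random.getrandbits(52) | (1 << 52)
    c = random.getrandbits(52) | (1 << 52)
    assert two_cycle_multiply(a, c)[0] == a * c
```

Passing use_x14=False drops X14, which leaves the result short by exactly 2^26 whenever partial product 14 is negative.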
3: If you remove the restriction that each partial product of the multicycle multiply must equal the corresponding partial product of the single-cycle multiply, it is possible to eliminate the "carry-in" on the most significant partial product. A split of 27...