
Floating Point 2:1 High Level Design

IP.com Disclosure Number: IPCOM000122634D
Original Publication Date: 1991-Dec-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 3 page(s) / 96K

Publishing Venue

IBM

Related People

Karim, FO: AUTHOR [+2]


This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.


      One goal was to design a small, high-performance floating-point
unit that would fit on the single-chip processor. The RISC
System/6000* floating-point architecture defines a multiply-add
instruction (FMA) for IEEE double precision. This instruction
requires a 53-bit by 53-bit multiply, with a third 53-bit mantissa
aligned and added to the result.  The multiply array and adder needed
to perform this instruction in a single pass were too large for the
FPU to fit on the same chip as the rest of the processor, so a
two-pass multiplication was chosen.  However, to optimize performance,
all non-multiply instructions except divide would complete in a single
pass.  Another goal was to design a floating-point unit that would NOT
define the cycle time for the processor.  This meant that the FPU's
cycle-time goal was actually slightly shorter than that of the rest of
the processor.
Background:

      The floating-point instruction set can be viewed as divided into
eight categories:
1. ACCUMULATE: (A * C) + B. This is the base instruction for this
FPU; all other instructions are subsets of it.
2. LOADS: use a dedicated FPR write port and do not require the
floating-point data flow.
3. STORES: treated the same as moves, without the FPR write.
4. ADD: A + B, the same as an FMA with C = 1. This group also includes
the compare instructions, which omit the FPR write.
5. MULTIPLY: A * C, the same as an FMA with B = 0.
6. DIVIDE: handled with repeated multiply-add instructions and a mini
data flow.
7. MOVE: also a subset of FMA, with A = 0.
8. SPECIAL: handled either in control logic or as a move
instruction.
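The category list amounts to substituting constants into a single multiply-add. A minimal Python sketch, assuming exact rational arithmetic via `fractions.Fraction` in place of IEEE rounding; the names `fma`, `fp_add`, `fp_mul` and `fp_move` are hypothetical, the choice C = 1 for MOVE is an assumption (the text only fixes A = 0), and IEEE special cases such as signed zero, infinities and NaNs are ignored:

```python
from fractions import Fraction


def fma(a, c, b):
    """Exact (A * C) + B: the base ACCUMULATE operation (no rounding)."""
    return Fraction(a) * Fraction(c) + Fraction(b)


def fp_add(a, b):
    # ADD: A + B is an FMA with C = 1.
    return fma(a, 1, b)


def fp_mul(a, c):
    # MULTIPLY: A * C is an FMA with B = 0.
    return fma(a, c, 0)


def fp_move(b):
    # MOVE: an FMA with A = 0 passes B through.
    # C = 1 is an assumption; any finite C gives the same exact result.
    return fma(0, 1, b)


print(fp_add(1.5, 2.25), fp_mul(1.5, 2.0), fp_move(-7.5))
```

Sharing one operation this way mirrors the document's point that the non-multiply instructions reuse the FMA data flow rather than needing their own hardware.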
For a multiply-add instruction, (A * C) + B, the C operand is split
in half to perform the multicycle multiply:
   C = C(h) C(l)   (C(h) and C(l) are the high and low halves of C)
First cycle: A * C(l) is added to the portion of the aligned B
operand that is shifted into the low 28 bits of the product.
Second cycle: A * C(h) is added to the rest of the aligned B
operand.
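The two-cycle split can be checked on integer mantissas. This sketch is a model under stated assumptions, not the unit's actual data flow: it takes C(l) to be the low 28 bits of C (from the 28-bit figure above), assumes B has already been aligned to the product as a nonnegative integer, and assumes the upper bits of the first-pass sum are fed into the second pass as a carry. Signs, rounding and normalization are omitted, and `two_pass_fma` is a hypothetical name.

```python
import random

MANTISSA_BITS = 53  # IEEE double precision, including the implicit bit
LOW_BITS = 28       # assumed width of C(l), taken from the text's "low 28 bits"


def two_pass_fma(a, c, b_aligned, low_bits=LOW_BITS):
    """Compute a * c + b_aligned in two passes over the halves of c.

    a, c: integer mantissas. b_aligned: B already shifted to the
    product's binary point, as a nonnegative integer.
    """
    mask = (1 << low_bits) - 1
    c_high, c_low = c >> low_bits, c & mask
    b_high, b_low = b_aligned >> low_bits, b_aligned & mask

    # Pass 1: A * C(l) plus the part of aligned B in the low bits.
    s1 = a * c_low + b_low
    low_result = s1 & mask   # these low bits are already final
    carry = s1 >> low_bits   # assumed to be fed forward into pass 2

    # Pass 2: A * C(h) plus the rest of aligned B and the pass-1 carry.
    s2 = a * c_high + b_high + carry
    return (s2 << low_bits) | low_result


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(1000):
        # Normalized mantissas have the top (implicit) bit set.
        a = rng.getrandbits(MANTISSA_BITS) | (1 << (MANTISSA_BITS - 1))
        c = rng.getrandbits(MANTISSA_BITS) | (1 << (MANTISSA_BITS - 1))
        b = rng.getrandbits(2 * MANTISSA_BITS)
        assert two_pass_fma(a, c, b) == a * c + b
    print("two-pass result matches a * c + b")
```

In this model the multiplier only ever forms a 53-bit by 28-bit (or 53-bit by 25-bit) partial product, which illustrates why a two-pass design needs a smaller array than a single-pass 53-bit by 53-bit multiply.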

      To increase performance, we must be able to execute addition-type
instructions in a single cycle.  However, the severe space
restrictions force us to reuse as much hardware as possible,
including control logic.  If w...