Single Cycle/Writeback Cycle Floating Point Denormalization

IP.com Disclosure Number: IPCOM000112203D
Original Publication Date: 1994-Apr-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 4 page(s) / 171K

Publishing Venue

IBM

Related People

Elliott, TA: AUTHOR [+4]

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 36% of the total text.

      Some floating point units are based on a 3 stage pipeline.  In
the Multiply stage, the product 'A' * 'C' is computed and the properly
aligned 'B' operand is added in, with the result reduced to Sums and
Carries.  (Aligning the 'B' operand requires a right shift of the 'B'
mantissa until the 'B' exponent aligns with the 'A*C' exponent.)  In
the Add stage, those Sums and Carries are added to form an
intermediate result.  The Writeback stage performs both normalization
and rounding.  For massive normalization, multicycle feedback on the
Writeback stage is required.  (Normalization involves left shifting
the intermediate result and decrementing the exponent until a leading
'1' is reached.)  (Fig. 1 - Mantissa Data Flow.)
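
      The normalization loop described above can be sketched in
software as follows.  This is only an illustrative model, not the
hardware data flow: the function name normalize, the width parameter,
and the handling of a zero mantissa are assumptions that do not
appear in the original text.

```python
def normalize(mantissa, exponent, width):
    """Left shift a width-bit mantissa until its top bit is '1'.

    Hypothetical helper: assumes 0 <= mantissa < 2**width and that
    the exponent is a plain integer with no range checking.
    """
    if mantissa == 0:
        # Assumption: a zero mantissa has no leading '1', so it is
        # returned unchanged rather than looping forever.
        return 0, exponent
    top_bit = 1 << (width - 1)
    while not (mantissa & top_bit):
        mantissa <<= 1   # left shift the intermediate result
        exponent -= 1    # subtract one from the exponent per shift
    return mantissa, exponent
```

Each iteration models one bit position of shift; a result with many
leading zeros needs many iterations, which corresponds to the
"massive normalization" case in the text.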

      Underflow is most easily detected by exponent comparison after
normalization is complete.  If an underflow occurs and FPSCR(UE) = 0,
the normalized intermediate result must be denormalized.
Denormalization involves right shifting the mantissa and adding '1'
to the exponent until the exponent equals the minimum representable
exponent for that format (Fig. 2 - Exponent Data Flow).
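
      A software sketch of this denormalization step might look like
the following.  The function name denormalize and the e_min parameter
are hypothetical, and the sticky flag (recording whether any '1' bits
were shifted out, for use by later rounding) is an added assumption;
the text above describes only the shift and the exponent increment.
The caller is assumed to have already determined that underflow
occurred with FPSCR(UE) = 0.

```python
def denormalize(mantissa, exponent, e_min):
    """Right shift the mantissa until the exponent reaches e_min.

    Hypothetical helper: returns (mantissa, exponent, sticky).
    """
    sticky = 0
    while exponent < e_min:
        sticky |= mantissa & 1   # assumption: remember lost '1' bits
        mantissa >>= 1           # right shift the mantissa
        exponent += 1            # add '1' to the exponent
    return mantissa, exponent, sticky
```

For example, a mantissa of 0b10000001 with exponent -2 and a minimum
exponent of 1 needs three shifts, giving a mantissa of 0b00010000 and
a sticky flag of 1 because the low-order '1' was shifted out.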

      Floating Point Background (part 2): The alignment of 'B' to
'A*C' uses the following formula to calculate the shift amount:
SC = E(a) + E(c) - E(b) + 56 - bias.  The 56 is an offset that allows
a one-directional shifter (right shift only) to be used.  This yields
an intermediate result 161 bits wide.  For the intermediate exponent
result, the larger of {E(b), E(a) + E(c) + 56} is chosen.
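
      The shift amount formula can be checked with a short worked
example.  The function name shift_count is hypothetical, and the bias
value of 1023 (the IEEE double precision exponent bias) is an
assumption chosen for illustration; the text does not state which
format is in use.

```python
def shift_count(e_a, e_c, e_b, bias):
    """Alignment shift amount, SC = E(a) + E(c) - E(b) + 56 - bias.

    All exponents are taken to be biased values (an assumption).
    """
    return e_a + e_c - e_b + 56 - bias


BIAS = 1023  # assumption: IEEE double precision bias

# 'B' has the same exponent as the product 'A*C'.
same_exponent = shift_count(1023, 1023, 1023, BIAS)   # 56

# 'B' exponent is 56 larger than the product exponent.
b_much_larger = shift_count(1023, 1023, 1079, BIAS)   # 0
```

When the exponents match, the shift amount equals the 56 offset, and
it falls to zero only when E(b) exceeds the product exponent by 56.
This is consistent with the statement that the offset lets a right
shifter alone handle the alignment.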

      As Fig. 1 shows, the only right shifter in the floating point
unit is in the Multiply stage.  To perform denormalization, some
processors feed the writeback result back to the registers above the
Multiply stage.  By properly controlling the alignment shifter with
constants, denormalization can be performed.  The result can then be
piped down to the writeback cycle, where rounding can occur.  (Result
denormalization takes 3 clocks.)  This solution creates problems in
the areas of complexity, performance, and silicon usage.

1.  Complexity: By wrapping back in a multistage pipeline, it is
    possible for instructions initiated in-order to complete
    out-of-order.  There are two possible solutions to this problem:

    1.  Build an interlocking unit capable of out-of-order execution.
        Given the "sticky" nature of the FPSCR bits, this would be
        very complicated.

    2.  In cases where the result would require denormalization, we
        would ensure that the pipeline above writeback would be
        empty, thus not creating an out-of-order execution problem.
        In order to keep subsequent instructions from entering the
        pipeline, we need to hold them above the multiply stage.  To
        do this, we need to know at the multiply stage if an
        instruction will denormalize.  As stated earlier, underflow
        will not be determined un...