High Speed Prediction/Detection of the Leading Zeros in Floating Point Result Provides Method for Creation of Early Shift Control Signals

IP.com Disclosure Number: IPCOM000117721D
Original Publication Date: 1996-May-01
Included in the Prior Art Database: 2005-Mar-31
Document File: 4 page(s) / 152K

Publishing Venue

IBM

Related People

Bartling, SC: AUTHOR

Abstract

IEEE-compatible floating point units are required to produce a normalized result. In order to reduce pipeline depth, a Leading Zero Anticipator (LZA) is used in parallel with the floating point addition unit to predict the location of the leading one with an arbitrary level of precision. The location can be predicted to within one bit position of the actual leading one in the addition result. Because the LZA does not know whether a carry occurred, the prediction can be off by one bit position (the prediction is either correct, or the leading one is actually one bit to the right). A Leading Zero Detector (LZD) is therefore required, once the result of the floating point addition is available, to pinpoint the exact location of the leading one.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 36% of the total text.

      IEEE-compatible floating point units are required to produce a
normalized result.  In order to reduce pipeline depth, a Leading Zero
Anticipator (LZA) is used in parallel with the floating point
addition unit to predict the location of the leading one with an
arbitrary level of precision.  The location can be predicted to
within one bit position of the actual leading one in the addition
result.  Because the LZA does not know whether a carry occurred, the
prediction can be off by one bit position (the prediction is either
correct, or the leading one is actually one bit to the right).  A
Leading Zero Detector (LZD) is therefore required, once the result of
the floating point addition is available, to pinpoint the exact
location of the leading one.
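
      The off-by-one correction described above can be sketched in
software.  This is a minimal illustration, not the disclosed circuit:
the function names are hypothetical, the prediction is passed in as a
leading-zero count rather than derived from the adder operands, and
the result is assumed to be nonzero.

```python
def count_leading_zeros(value: int, width: int) -> int:
    """Exact leading-zero detector (LZD) for a width-bit unsigned value."""
    if value < 0 or value >> width:
        raise ValueError("value does not fit in the given width")
    return width - value.bit_length()


def correct_prediction(result: int, width: int, predicted_lz: int) -> int:
    """Resolve an LZA-style prediction that is exact or one too small.

    Assumption: predicted_lz is either the true leading-zero count or
    one less than it (the leading one sits one bit to the right).
    """
    if not 0 <= predicted_lz < width:
        raise ValueError("prediction out of range")
    predicted_bit = width - 1 - predicted_lz  # index of the predicted leading one
    if (result >> predicted_bit) & 1:
        return predicted_lz       # prediction was exact
    return predicted_lz + 1       # leading one is one position to the right


result = 0b00010110               # 8-bit example with three leading zeros
exact = count_leading_zeros(result, 8)
from_exact_guess = correct_prediction(result, 8, 3)
from_early_guess = correct_prediction(result, 8, 2)
```

      Only a single bit of the result has to be inspected to resolve
the ambiguity, which is why the final detection step can be kept
small.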

      LZAs are not usually constructed to be precise to within one
bit location.  The time available to evaluate the LZA algorithm is
limited by the operating frequency required for the floating point
unit, and the logic is limited by the size budget for the physical
implementation of the LZA.  Therefore, the output of the floating
point adder is typically divided evenly into several groups, and the
LZA is used to predict which group contains the leading one.  The LZD
is then used to pinpoint the exact location of the leading one within
the group that the LZA predicted.
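
      The division of labor between the group-level prediction and
the in-group detection can be sketched as follows.  This is an
illustrative model only: the helper names and the 16-bit/4-bit sizes
are assumptions, and the group is found here by inspecting the result
itself, whereas a real LZA would predict it from the adder operands.

```python
def split_into_groups(value: int, width: int, group_size: int):
    """Split a width-bit value into (bits, size) groups, most significant first.

    The last group is narrower when width is not a multiple of group_size.
    """
    groups = []
    remaining = width
    while remaining > 0:
        size = min(group_size, remaining)
        remaining -= size
        groups.append(((value >> remaining) & ((1 << size) - 1), size))
    return groups


def locate_leading_one(value: int, width: int, group_size: int):
    """Return (group_index, offset_in_group) of the leading one, or None if zero.

    group_index stands in for the LZA output; offset_in_group is what the
    LZD would resolve inside the selected group.
    """
    for index, (bits, size) in enumerate(split_into_groups(value, width, group_size)):
        if bits:
            return index, size - bits.bit_length()
    return None


# 16-bit value, 4-bit groups: six leading zeros in total.
position = locate_leading_one(0b0000001011000000, 16, 4)
```

      In this example the leading one falls in the second group
(index 1) at an offset of two bits, so the detector only needs to
examine four bits rather than all sixteen.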

      The size of the groupings always parallels the organization of
the shifter used to move the leading one into the most significant
bit location of the floating point result.

      For example, IBM* floating point units are required to produce
a fully precise result for a double precision multiply instruction.
This produces a datapath width of 106 bits at the output of the
floating point multiplier.  Thus, if pipeline depth is to be kept to
a minimum, the floating point adder and the LZA must have a bit width
of 106 bits.  Also, the maximum shift required to move the leading
one into the Most Significant Bit (MSB) is now equal to 106.  Since
it is not an efficient use of area (nor is it a high speed solution)
to build a single stage shifter whose largest shift mux requires 106
data inputs and 106 selects, the shifter is typically broken up into
two or three stages (three stages are the most common solution).

For the IBM example, the 106 bit shift will be broken up into 2
stages:
      Stage 1: shift by 0-16 bits
      Stage 2: shift by 0-16 bits

      This provides a maximum shift of 16x16=256, which is more than
sufficient to accomplish the task.
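
      A two-stage normalization shift of this kind can be modeled as
shown below.  The abbreviated text does not spell out how the two
stages encode their shift amounts, so this sketch assumes a coarse
stage that shifts by a multiple of 16 followed by a fine stage that
shifts by 0 to 15 bits; the function name and constants are
hypothetical.

```python
WIDTH = 106   # datapath width from the double precision multiply example
GROUP = 16    # assumed step size of the coarse stage


def normalize_two_stage(value: int, width: int = WIDTH, group: int = GROUP):
    """Shift the leading one of a nonzero value into the MSB in two stages.

    Returns (normalized_value, coarse_steps, fine_bits).
    Assumption: stage 1 shifts by coarse_steps * group bits and
    stage 2 shifts by fine_bits (0 .. group - 1) bits.
    """
    if value <= 0 or value >> width:
        raise ValueError("value must be nonzero and fit in the datapath")
    leading_zeros = width - value.bit_length()
    coarse_steps, fine_bits = divmod(leading_zeros, group)
    mask = (1 << width) - 1
    stage1 = (value << (coarse_steps * group)) & mask   # coarse shift
    stage2 = (stage1 << fine_bits) & mask               # fine shift
    return stage2, coarse_steps, fine_bits


normalized, coarse_steps, fine_bits = normalize_two_stage(1)
```

      Under this assumption each stage needs at most a 16-way select,
and the worst case for the 106 bit datapath (a leading one in the
least significant bit) uses six coarse steps and nine fine bits.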

      Typically, the LZA will be designed to predict which group of
16 bits will contain the leading one.  (Please note that 106 is not
evenly divisible by 16, so the least significant group will contain
only 10 bits).  This output can be used to drive...
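
      The grouping arithmetic in the note above can be checked with a
short calculation.  The helper name is hypothetical and the snippet
only reproduces the group widths; it says nothing about how the LZA
output is used downstream.

```python
def group_widths(total_bits: int, group_size: int):
    """Widths of the groups, most significant first, for a datapath of total_bits."""
    full_groups, leftover = divmod(total_bits, group_size)
    return [group_size] * full_groups + ([leftover] if leftover else [])


widths = group_widths(106, 16)   # six full groups plus one 10-bit group
```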