Low Precision Floating Point for Signal Processing

IP.com Disclosure Number: IPCOM000052271D
Original Publication Date: 1981-May-01
Included in the Prior Art Database: 2005-Feb-11
Document File: 2 page(s) / 13K

Publishing Venue

IBM

Related People

Todd, SJP: AUTHOR

Abstract

This invention relates to a machine-implementable arithmetic method based on a logarithmic representation of numbers. It uses conventional add/subtract to implement multiply/divide, and table look-up to implement add/subtract. The approach is an extension of that of Swartzlander and Gilbert, "Arithmetic for Tomography," IEEE Transactions on Computers 29, 341-353 (May 1980). The object of the method is to improve the useful signal-to-noise ratio and to reduce the table sizes needed. The method has application where the use of fixed point is complicated by dynamic range requirements, and where a fairly poor signal-to-noise ratio is acceptable.

A number, n, is stored as a (k+1)-bit word. One bit is used for the sign, ns. The remaining k bits form a non-negative exponent, ne, 0<=ne<2**k. This word represents the number n=+-base**(ne-scale). The values of "k", "base" and "scale" can be chosen to fit a particular application. The ratio of the largest to the smallest representable magnitude is approximately base**(2**k). Any number within that range can be represented to within a factor of sqrt(base). There is no natural representation of zero. One may assign exponent 0 to the number 0, but this requires special casing in the operations. Swartzlander and Gilbert use the same representation, but require base=2. This means that their system has a low signal-to-noise ratio.
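The representation can be illustrated with a minimal Python sketch. The values K=7, BASE=1.1 and SCALE=64 are assumptions chosen for illustration only, as are the function names encode and decode; the disclosure leaves all three parameters to the application. Rounding the exponent to the nearest integer and clamping it to the available range are also assumptions of this sketch.

```python
import math

# Illustrative parameters only; the disclosure leaves these to the application.
K = 7          # number of exponent bits (word is K+1 bits with the sign)
BASE = 1.1     # base of the logarithmic representation
SCALE = 64     # offset that centres the exponent range around 1.0


def encode(x):
    """Return (ns, ne) for a nonzero x: sign bit and stored exponent."""
    if x == 0:
        # Zero has no natural representation; special casing is not shown here.
        raise ValueError("zero needs special casing")
    ns = 1 if x < 0 else 0
    # Nearest integer exponent, so the error is at most a factor sqrt(BASE).
    ne = round(math.log(abs(x), BASE)) + SCALE
    # Clamp to the K-bit range 0 <= ne < 2**K (assumed saturation behaviour).
    ne = max(0, min(2**K - 1, ne))
    return ns, ne


def decode(ns, ne):
    """Return the value +-BASE**(ne - SCALE) represented by (ns, ne)."""
    magnitude = BASE ** (ne - SCALE)
    return -magnitude if ns else magnitude
```

With these assumed parameters, decode(*encode(x)) differs from x by at most a factor of sqrt(1.1), roughly 5%, for any x whose exponent falls inside the unclamped range.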

The method is demonstrated for arithmetic operations on two positive numbers (a, b) giving results (c1, c2, c3, c4); these are represented by the exponents ae, be and ce1, ce2, ce3, ce4:

a=base**(ae-scale)

b=base**(be-scale)

c1=base**(ce1-scale)

etc.

To operate on this representation, the system must be able to derive the ce's from ae and be.

Thus,

1) c1 = a*b = base**((ae+be-scale)-scale)

1a) ce1 = ae+be-scale

2) c2 = a/b = base**((ae-be+scale)-scale)

2a) ce2 = ae-be+scale
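Equations 1a and 2a can be sketched directly in Python. The function names mul_exp, div_exp and value, and the parameter values BASE=1.1 and SCALE=64, are assumptions for illustration. The sketch covers positive operands only, as in the text, and does not handle a result exponent that falls outside the range 0<=ce<2**k.

```python
# Illustrative parameters only; the disclosure leaves these to the application.
BASE = 1.1
SCALE = 64


def mul_exp(ae, be, scale=SCALE):
    """Exponent of a*b, per equation 1a: ce1 = ae + be - scale."""
    return ae + be - scale


def div_exp(ae, be, scale=SCALE):
    """Exponent of a/b, per equation 2a: ce2 = ae - be + scale."""
    return ae - be + scale


def value(ne, base=BASE, scale=SCALE):
    """Positive number represented by the stored exponent ne."""
    return base ** (ne - scale)
```

For example, with ae=70 and be=60, mul_exp gives 66 and div_exp gives 74, and value(66) agrees with value(70)*value(60) up to floating-point rounding, since no table or rounding step is involved in multiply or divide.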

3) c3 = a+b = (base**ae + base**be) * base**(-scale)

=base**ae * (1+base**(be-ae)) *base*...
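The factoring in step 3 shows why addition can be done by table look-up: the term 1+base**(be-ae) depends only on the difference of the two exponents. The following Python sketch uses the usual logarithmic-number-system construction, which is an assumption here and not necessarily the table layout of the disclosure: the larger exponent is taken as the reference, and a table indexed by the non-negative difference d supplies round(log_base(1+base**(-d))). The names build_add_table, ADD_TABLE and add_exp, and the values BASE=1.1 and SCALE=64, are hypothetical.

```python
import math

# Illustrative parameters only; the disclosure leaves these to the application.
BASE = 1.1
SCALE = 64


def build_add_table(base=BASE):
    """Table T[d] = round(log_base(1 + base**(-d))) for d = 0, 1, 2, ...

    Entries decrease with d; the table stops at the first entry that
    rounds to 0, because every later entry is 0 as well.
    """
    table = []
    d = 0
    while True:
        t = round(math.log(1.0 + base ** (-d), base))
        if t == 0:
            return table
        table.append(t)
        d += 1


ADD_TABLE = build_add_table()


def add_exp(ae, be):
    """Exponent of a+b for positive a, b (assumed look-up scheme)."""
    hi, lo = max(ae, be), min(ae, be)
    d = hi - lo
    # Beyond the table, the smaller operand no longer changes the result.
    correction = ADD_TABLE[d] if d < len(ADD_TABLE) else 0
    return hi + correction
```

In this sketch the table is short because its length depends only on the base, not on k. The result stays within a factor of sqrt(base) of the exact sum of the two represented values, since only the table entry is rounded.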