Browse Prior Art Database

Compression of Numeric Data

IP.com Disclosure Number: IPCOM000120410D
Original Publication Date: 1991-Apr-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 2 page(s) / 74K

Publishing Venue

IBM

Related People

Larner, A: AUTHOR

Abstract

This article describes a method of compressing numeric data, allowing a compromise between size and accuracy. The method is particularly appropriate for the storage of very large quantities (hundreds of gigabytes) of numeric data, e.g., the product quantities derived from point-of-sale input.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Compression of Numeric Data

      This article describes a method of compressing numeric
data, allowing a compromise between size and accuracy.  The method is
particularly appropriate for the storage of very large quantities
(hundreds of gigabytes) of numeric data, e.g., the product quantities
derived from point-of-sale input.

      According to the method:  given a range of values, say, R
values, constrained by size, e.g., the 255 values that can be held in
an eight- bit byte (allowing for a Null, say, "not yet recorded"),
and an accuracy, a, e.g., 0.025 (2.5%), to which data is to be kept;
it is possible to calculate an optimal (maximum range) mapping:
      p -> q <-> r
      such that any number, p, in a certain range, say 0 to P, is
mapped to a number, q, in the same range, under the restriction that:
      -a <= (p-q)/q <= a.

      The value P is a function of R and a.  Using the sample values
of R=255 and a=0.025, P comes out at just over 2,000,000.  Thus, a
numeric field that would take 20 bits (probably, in practice, 3 or
even 4 eight-bit bytes) can be compressed into 8 bits (i.e., 5 to 2,
or 3 to 1, or even 4 to 1 compression).  In general, P significantly
exceeds the Rth power of (1+2a).  (The 255th power of 1.05 is about
250,000.)

      The method is implemented by means of three algorithms:

      The Mapping Algorithm, which is used to build conversion
tables, is straightforward.  Two tables are needed, one for the
mapping p->r, the Compression Table, and one for the mapping r->q,
the Expansion Table. Each table has R entries, indexed by r.  The
entry in each for p=...