Compression and expansion of integers using piece-wise linear conversion.
Publication Date: 2010-Sep-02
The IP.com Prior Art Database
Disclosed is a method of compressing integers, such as the sizes of objects in a Java system, into a smaller number of bits while maintaining precision for small numbers, reducing precision for less frequently encountered larger numbers, and maintaining the sorting order of the compressed values.
Compression and expansion of integers using piece -wise linear conversion.
Storing the sizes of objects is a common task in computer applications and virtual machines. Large numbers of objects (>100,000,000) are quite common in modern systems. The object sizes to be represented can vary from 0 or 1 or a handful of bytes to 8GB or more. Storing the size of an object as a 64-bit integer takes 8 bytes, so 800MB would be consumed in object sizes alone for 100,000,000 objects. There is a need for a way of storing the sizes of objects in less space.
Storing the size in 32-bits takes 4 bytes. This would be a useful reduction in size, but doesn't have the range required for modern Java* systems, where an array double d = new double[Integer.MAX
occupies around 16GB of memory, as a double occupies 8 bytes and Integer.MAX
VALUE is 0x7FFFFFFF.
It is possible to encode larger sizes in 32-bits. For example, IEEE-754 specifies how floating point numbers can be encoded in 32-bits. This has the disadvantage that only integers from 0 to 16777215 can be represented exactly, and larger sizes are only represented approximately.
Existing code, such as for the Eclipse Memory Analyzer
stores sizes in files on disk as 32-bit integers. It would be advantageous to have a new scheme which maintains compatibility for existing sizes of positive 32-bit integers, while expanding the range so that the size of any Java object can be represented with acceptable accuracy.
A possible simple scheme is to treat the integer as an unsigned number. This merely expands the range to 4GB, which is insufficient.
Another scheme would be to scale numbers great than 0x80000000 by -8, converting them to negative integers. This gives dual encoding of certain numbers, wasting code points.
In telecoms the A-law and µ-Law companding encodes numbers with greater range and less precision using a chord (piece-wise linear) encoding. http://en.wikipedia.org/wiki/%CE%9C-law
ITU-T Recommendation G.711 http://www.itu.int/rec/dologin
This is only specified for 8-bit
Here is disclosed is a method of storing a size in a int (32 bits) with some loss of precision for large numbers.
Integers in the range 0x0 to 0x7FFFFFFF represent sizes from 0x0 to 0x7FFFFFFF, exactly as with existing implementations with simple 32-bit sizes.
Larger numbers are represented with less precision. This is done by changing the step size in the expanded value when moving from one compressed value to the next in the conversion of compressed to original value at certain break points. Once a certain break point is reached the step size is increased.
Compression is just the reverse process of expansion.
For each range, another 3 bits are used on the left of the number to show the scale, 3 fewer bits are used on the right to represent the numbers, and the step size goes up by another 6 bits, so that the range for each...