Browse Prior Art Database

Universal QIC-122 Codec with an Output Cache

IP.com Disclosure Number: IPCOM000108572D
Original Publication Date: 1992-Jun-01
Included in the Prior Art Database: 2005-Mar-22
Document File: 3 page(s) / 158K

Publishing Venue

IBM

Related People

Medan, Y: AUTHOR

Abstract

Disclosed is a Universal QIC-122 Codec with an output cache. The cache contains references to previously encoded strings. These references are the encoder output and, hence, the term 'output cache'. Addition of an output cache mechanism to a Universal QIC-122 Codec enhances its compression efficiency by encoding frequently encountered strings more compactly. In addition, better compression of short files is achieved by initializing the cache with references to a default dictionary.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 41% of the total text.

Universal QIC-122 Codec with an Output Cache

       Disclosed is a Universal QIC-122 Codec with an output
cache.  The cache contains references to previously encoded strings.
These references are the encoder output and, hence, the term 'output
cache'.  Addition of an output cache mechanism to a Universal QIC-122
Codec enhances its compression efficiency by encoding frequently
encountered strings more compactly. In addition, better compression
of short files is achieved by initializing the cache with references
to a default dictionary.

      This is a hybrid scheme combining LZ-1 (Lempel-Ziv)-1 Type and
LZ-2 Type algorithms into a single, unified framework to provide a
significant coding gain over each one of the individual techniques.
The size of the cache can be varied according to the specific type of
data.  Different strategies may be used to refresh the content of the
proposed output cache.  Background information on QIC-122 is given in
the Appendix.

      For certain types and size of information objects, LZ-2 Type
compression schemes outperform LZ-1 Type and vice versa.  LZ-1 Type
algorithms are extremely useful for short information sources and/or
sources that have a rather short context.  LZ-2 Type algorithms on
the other hand perform better for large data with a stationary
distribution of the language 'words' or 'phrases'.  In this latter
case, the encoding of a 'phrase' by a ('offset','runlength') pair is
much less efficient than a short index to a dictionary.  But the
amount of data needed to build up a useful dictionary is quite large
and, therefore, the breakeven point between LZ-1 and LZ-2 is reached
only for moderate to large file sizes. This precludes the use of LZ-2
Type schemes for random access files since for each page to be
retrieved, the whole file needs to be decoded, whereas with LZ-1
individual pages can be encoded and decoded independently with
reasonable compression gains.

      To overcome the deficiencies of both approaches it is proposed
to use a novel hybrid scheme that combines both encoding techniques
under a unified framework - a Universal QIC-122 Codec with an output
cache.  In this scheme, the RAW_BYTE field of the QIC-122 standard is
used as an index to a small dictionary that collects references to
encoded strings ('phrases').  Thus, each 'phrase' can be encoded
using either an entry from the dictionary or, in case that such an
entry does not exist, an indirect reference to the history of the
input source using the LZ-1 Type ('offset','runlength') pair.

      The hybrid approach cannot perform any worse than each of the
individual approaches.  On the contrary, whenever the dictionary
contains an entry that matches a give 'phrase', it is being encoded
more efficiently than LZ-1 Type encoding format.  If, on the other
hand, the dictionary does not contain a reference to the 'phrase',
LZ-1 Type encoding is much more efficient than the encoding procedure
that is used by L...