Text File Compression for Computer Programs

IP.com Disclosure Number: IPCOM000112705D
Original Publication Date: 1994-Jun-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 49K

Publishing Venue

IBM

Related People

Swingle, P: AUTHOR

Abstract

Described is a software implementation to provide a text file compression method for computer programs using a relatively small amount of computation. The technique uses a series of transforms to compress a text file.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 59% of the total text.

      The implementation uses a fixed list of common words and
encodes each of them as a single character.  It combines this word
encoding with other commonly used techniques to provide an effective
compression method that requires only a small amount of computation.
Decompression runs the same steps in reverse to reconstruct the
original text.
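
      The word-encoding step and its reversal might look roughly like
the following Python sketch.  The word list, the choice of code
characters (U+0080 upward, which are free when the input is 7-bit
ASCII), and the function names are all assumptions made for
illustration; the disclosure does not give its actual list or code
assignment here.

```python
import re

# Hypothetical word list; the disclosure's actual fixed list is not shown.
COMMON_WORDS = ["the", "of", "and", "to", "in", "is", "that", "for"]

# Assumed code assignment: one character from U+0080 upward per word.
# These code points never occur in 7-bit ASCII input.
WORD_TO_CODE = {w: chr(0x80 + i) for i, w in enumerate(COMMON_WORDS)}
CODE_TO_WORD = {c: w for w, c in WORD_TO_CODE.items()}

# Split text into alternating runs of letters and non-letters so that
# only whole words are replaced ("other" must not match "the").
TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def encode_words(text: str) -> str:
    """Replace each listed word with its single-character code."""
    if not text.isascii():
        raise ValueError("this sketch assumes 7-bit ASCII input")
    return "".join(WORD_TO_CODE.get(tok, tok) for tok in TOKEN.findall(text))


def decode_words(encoded: str) -> str:
    """Reverse the transform: expand each code character to its word."""
    return "".join(CODE_TO_WORD.get(ch, ch) for ch in encoded)
```

      Matching is case-sensitive in this sketch, so "The" is left
unencoded; a real word list would need some policy for capitalized
forms.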

      Prior-art compression methods are generally designed to operate
on just about any kind of file.  That generality can cause a certain
amount of inefficiency.

      The technique described herein compresses a text file through
a series of transforms.  First, the method replaces any run of more
than three repetitions of the same character with an escape code, a
count, and the repeated character itself.  The basic flow is as
follows:

      normal text --> runs of the character --> common words -->
        Huffman coding --> compressed text
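
      The first transform in this flow, run compression, can be
sketched in Python as follows.  The escape value 0x1B, the one-byte
count (runs capped at 255), and the rule that a literal escape byte
is always written as a run are assumptions for illustration; the
disclosure does not specify them here.

```python
ESC = 0x1B     # hypothetical escape code; not named in the disclosure
MIN_RUN = 4    # "more than three" repeated characters
MAX_RUN = 255  # assumed: the count is stored in a single byte


def rle_encode(data: bytes) -> bytes:
    """Replace long runs with the triple (ESC, count, character)."""
    out = bytearray()
    i = 0
    while i < len(data):
        ch = data[i]
        j = i
        # Measure the run starting at i, capped at MAX_RUN.
        while j < len(data) and data[j] == ch and j - i < MAX_RUN:
            j += 1
        run = j - i
        # A literal ESC byte is always written as a run so that the
        # decoder never mistakes it for the start of a triple.
        if run >= MIN_RUN or ch == ESC:
            out += bytes([ESC, run, ch])
        else:
            out += data[i:j]
        i = j
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Expand every (ESC, count, character) triple back into a run."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC:
            count, ch = data[i + 1], data[i + 2]
            out += bytes([ch]) * count
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

      Runs of three or fewer are copied through unchanged, since the
three-byte triple would not be shorter than the run it replaces.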

      The next step is a scan for common words.  Since about fifty
words account for approximately fifty percent of the words in normal
documents, encoding common words can attain substantially more
compression than encoding single characters alone.
This is dependent on which language the text is...
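
      The coverage figure quoted above can be checked against any
sample of text in a given language with a short Python sketch such as
the one below.  The function name and the simple definition of a word
(a run of the letters a to z after lowercasing) are assumptions for
illustration, not part of the disclosure.

```python
import re
from collections import Counter


def top_word_coverage(text: str, n: int = 50) -> float:
    """Fraction of all word occurrences covered by the n most common words."""
    # Assumed tokenization: lowercase, then take runs of a-z as words.
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(words)
```

      Running this over a representative corpus with n=50 gives the
share of word occurrences that a fifty-word list would capture for
that language.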