Browse Prior Art Database

Coding System Involving Frequencies of Letters and Letter Pairs

IP.com Disclosure Number: IPCOM000096088D
Original Publication Date: 1964-Dec-01
Included in the Prior Art Database: 2005-Mar-07
Document File: 4 page(s) / 63K

Publishing Venue

IBM

Related People

Chien, RT: AUTHOR [+3]

Abstract

In a redundant coding system, it is advantageous to take into account frequencies of occurrence of characters to achieve an error rate substantially lower than that obtainable when all characters are assumed to occur equally.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 48% of the total text.

Page 1 of 4

Coding System Involving Frequencies of Letters and Letter Pairs

In a redundant coding system, it is advantageous to take into account frequencies of occurrence of characters to achieve an error rate substantially lower than that obtainable when all characters are assumed to occur equally.

The first arrangement A is an easily implemented variation of a basic group code error correction system. Therefore, frequency-dependent coding is not significantly more complex than simple error-correcting coding. It is advisable to first consider the code and note that the letters of the alphabet, plus a symbol for SPACE, are to be represented by a seven-bit code. In the following Table I, it is seen how the actual 7-tuples are assigned to the twenty-seven characters:

(Image Omitted)

In Table I, the fourteen characters most likely to occur are assigned to fourteen of the sixteen code combinations in a seven-bit single-error correcting Hamming code. The characters are numbered higher in a decreasing order of frequency. The thirteen least likely characters are assigned to the two remaining code combinations, 0000000 and 1111111, and to eleven other of the 7-tuples which correspond to single-errors of these two code combinations, i.e., a single 1 in the first combination and a single 0 in the second combination. The code words of the Hamming protected combinations have either three or four 1 bits.

In A, the decoder delivers the received sequence to a Hamming decoder, a diode matrix decoder, and a weight-sensitive gate. The weight of a code word is determined by the number of 1 bits it presents. The output of the Hamming decoder feeds a diode matrix decoder whose output is enabled if the weight of the received sequence is greater than 1.5 and less than 5.5. Similarly, if the weight of the received code word is 0, 1, 6 or 7, the first-mentioned diode matrix decoder's output is enabled.

Based on written English, this system yields an average error rate of .73p where p is the probability of bit error in a memoryless binary symmetric channel The best level of performance that can be achieved for written English, if frequencies are not taken into account and a seven-bit code is used, is more than 3p. Hence, here there is more than a four-fold improvement in error rate.

Drawings B, C and D show another but related digram statistics coding system. This has an error correcting technique based on the statistical probability of the occurrence of certain letters in pairs. The principle of operation is that, when an error in a code word for a letter is detected, that code word is altered to represent the letter which statistically carries the greatest likelihood of following the preceding correct letter.

Although complete statistical data are available for various uses of the English language to provide such information, for illustrative purposes, condensed data together with required codes are shown in Tables II and III:

(Image Omitted)

1

Page 2 of 4

(Image O...