Spelling Verification Throughput Enhancement

IP.com Disclosure Number: IPCOM000039596D
Original Publication Date: 1987-Jul-01
Included in the Prior Art Database: 2005-Feb-01
Document File: 3 page(s) / 48K

Publishing Venue

IBM

Related People

Carlgren, RG: AUTHOR [+2]

Abstract

A method of increasing Spelling Verification throughput in a text processing system is described. In the described method, a stem delta field is added to each skip field in order to keep stem unfolding synchronized to stem number. This facilitates subsequent auxiliary function lookup, e.g., hyphenation or part-of-speech. In addition, both skip and stem delta fields are run-length-encoded and then optimally packed onto the data records of the dictionary. These enhancements allow a uniform decoding approach for both the main and high-frequency dictionaries and provide a significant speed improvement. The dictionary is constructed in a "build" or "encoding" phase. The skip fields are constructed at this time and combined with the stems on logical data records.

This is the abbreviated version, containing approximately 53% of the total text.

A method of increasing Spelling Verification throughput in a text processing system is described. In the described method, a stem delta field is added to each skip field in order to keep stem unfolding synchronized to stem number. This facilitates subsequent auxiliary function lookup, e.g., hyphenation or part-of-speech. In addition, both skip and stem delta fields are run-length-encoded and then optimally packed onto the data records of the dictionary. These enhancements allow a uniform decoding approach for both the main and high-frequency dictionaries and provide a significant speed improvement.
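This excerpt does not give the actual field layout, so the following is only an illustrative sketch of one way a skip count and a stem delta could be stored as variable-length, nibble-aligned fields. The length-prefix scheme (one count nibble followed by that many value nibbles) and every function name here are assumptions made for illustration; they are not the disclosure's encoding.

```python
def to_nibbles(value):
    """Split a non-negative integer into base-16 digits, most significant first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    while True:
        digits.append(value & 0xF)
        value >>= 4
        if value == 0:
            break
    return digits[::-1]


def encode_field(value):
    """Hypothetical field format: one length nibble, then that many value nibbles."""
    digits = to_nibbles(value)
    if len(digits) > 15:
        raise ValueError("value too large for a one-nibble length prefix")
    return [len(digits)] + digits


def pack_nibbles(nibbles):
    """Pack nibbles two per byte; pad the last byte with a zero nibble if needed."""
    padded = list(nibbles) + [0] * (len(nibbles) % 2)
    return bytes((padded[i] << 4) | padded[i + 1] for i in range(0, len(padded), 2))


def unpack_nibbles(data):
    """Unfold bytes back into a flat list of nibbles."""
    out = []
    for byte in data:
        out.append(byte >> 4)
        out.append(byte & 0xF)
    return out


def decode_field(nibbles, pos):
    """Read one field starting at nibble index pos; return (value, next_pos)."""
    length = nibbles[pos]
    value = 0
    for digit in nibbles[pos + 1 : pos + 1 + length]:
        value = (value << 4) | digit
    return value, pos + 1 + length


def encode_skip_entry(skip, stem_delta):
    """Store a skip count and its stem delta side by side."""
    return pack_nibbles(encode_field(skip) + encode_field(stem_delta))


def decode_skip_entry(data):
    """Recover (skip, stem_delta) from a packed entry."""
    nibbles = unpack_nibbles(data)
    skip, pos = decode_field(nibbles, 0)
    stem_delta, _ = decode_field(nibbles, pos)
    return skip, stem_delta
```

Under this assumed format, `encode_skip_entry(300, 7)` occupies three bytes (hex 31 2C 17), and `decode_skip_entry` returns `(300, 7)` from them. Small values take fewer nibbles than large ones, which is the general motivation for packing such fields at nibble rather than byte granularity.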

(Image Omitted)

The dictionary is constructed in a "build" or "encoding" phase. The skip fields are constructed at this time and combined with the stems on logical data records. During operation, or "decoding," these records are unfolded to look for a match between the input word and a dictionary word. The variable-field dictionary structure requires unfolding each nibble (1/2 byte) of a data record, which constitutes a major part of the time spent in verification. This method eliminates 80% to 90% of the unfold time; the amount of unfolding saved is a function of the number of words in the dictionary.

During the dictionary build process, "skip counts" are computed. These counts delineate regions of each record that have "identical" first 3-character content, as shown in Fig. 2. "Identical" is qualified because the structure stores stem words in a sort order with certain character features suppressed: words with upper-case characters are sorted as if all the characters were lower case, and words with accented characters are sorted as if all the characters were unaccented. For example, A ==> a and a-grave ==> a for sort sequence purposes. Each count points to the starting nibble of the first word of a group of words all having the "same" first 3 characters.

The auxiliary functions require the stem number of a verified word in order to correctly access "shadow tables." However, skipping stems results in losing track of the current stem number. Thus, for auxiliary functions, verification cannot use...
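
The interaction between skip counts and stem numbers can be illustrated with a minimal sketch. It assumes a simplified record in which each stem is stored as plain characters followed by a terminator, so the skip is counted in characters rather than nibbles; the names `fold`, `build`, and `verify`, the terminator, and the exact-match comparison are all hypothetical and are not taken from the disclosure. Each group carries a skip (how much storage to jump over) and a stem delta (how many stems that jump covers), so the running stem number stays correct even when whole groups are skipped without being unfolded.

```python
import unicodedata

TERMINATOR = "\0"  # hypothetical end-of-stem marker for this sketch


def fold(word):
    """Sort key with case and accents suppressed (A -> a, a-grave -> a)."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()


def build(words):
    """Build phase: sort stems by folded key and group by folded 3-char prefix.

    Returns (record, groups). Each group is [prefix, skip, stem_delta]:
    skip is the storage length of the group, stem_delta its number of stems.
    """
    stems = sorted(words, key=fold)
    record = "".join(stem + TERMINATOR for stem in stems)
    groups = []
    for stem in stems:
        prefix = fold(stem)[:3]
        if not groups or groups[-1][0] != prefix:
            groups.append([prefix, 0, 0])
        groups[-1][1] += len(stem) + 1  # storage covered by this group
        groups[-1][2] += 1              # stems covered by this group
    return record, groups


def verify(word, record, groups):
    """Decode phase: return the stem number of word, or None if absent."""
    target = fold(word)[:3]
    pos = 0
    stem_number = 0
    for prefix, skip, stem_delta in groups:
        if prefix != target:
            # Skip the whole group without unfolding it, but keep the
            # stem number synchronized by adding the stem delta.
            pos += skip
            stem_number += stem_delta
            continue
        end = pos + skip
        while pos < end:  # unfold only the matching group
            stop = record.index(TERMINATOR, pos)
            if record[pos:stop] == word:
                return stem_number
            pos = stop + 1
            stem_number += 1
        return None
    return None
```

For the stems apple, Apply, après, banana, band, and cat, the build step produces four groups (app, apr, ban, cat). Verifying "band" skips the first two groups, adds their stem deltas (2 and 1), unfolds only the "ban" group, and returns stem number 4, which could then index an auxiliary table. Without the stem delta, the skip alone would locate the word but not its stem number.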