Treating Low Frequency Dictionary Words As Misspellings to Increase Correction Accuracy

IP.com Disclosure Number: IPCOM000034832D
Original Publication Date: 1989-Apr-01
Included in the Prior Art Database: 2005-Jan-27
Document File: 1 page(s) / 12K

Publishing Venue

IBM

Related People

Damerau, FJ: AUTHOR (and 2 others)

Abstract

Disclosed is an improvement to spelling error detection and correction programs in which real but rare words occurring in the spelling dictionary are treated in a manner similar to misspelled words. This permits the use of very large dictionaries without losing the ability to detect probable errors in which a misspelling coincides with a legitimate but rare word. Spelling correctors currently in use commonly look up input words in a dictionary and flag as misspelled any word not found in the dictionary list. Some of these dictionaries are very large, containing as many as 50,000 words, which means that some of the words they contain are quite rare in actual usage. Intuitively, as the size of a dictionary increases, so does the likelihood that an incorrectly spelled word will match the spelling of a correct word.
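To make the growth argument concrete: a lookup-only checker accepts any misspelling that happens to be spelled like a dictionary word. The following minimal sketch enumerates the single-edit variants of a word (deletion, adjacent transposition, substitution, insertion; this error model is an assumption, not something the disclosure specifies) and reports which of them a given dictionary would silently accept. The function names and the two toy dictionaries are hypothetical.

```python
import string


def single_edits(word: str) -> set[str]:
    """All strings one deletion, adjacent transposition, substitution, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return (deletes | transposes | replaces | inserts) - {word}


def masked_misspellings(word: str, dictionary: set[str]) -> set[str]:
    """Single-edit misspellings of `word` that a lookup-only checker would accept."""
    return single_edits(word) & dictionary


# Hypothetical toy dictionaries: the larger one adds a few real but rare words.
small = {"the", "and", "of", "from", "form"}
large = small | {"fro", "frog", "prom"}
print(sorted(masked_misspellings("from", small)))  # ['form']
print(sorted(masked_misspellings("from", large)))  # ['form', 'fro', 'frog', 'prom']
```

Every word added to the dictionary can only enlarge the set of accepted misspellings, never shrink it, which is the effect the abstract describes.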

At least one study [1] has argued that this growing risk of a misspelling matching a real word implies that spelling dictionaries should be small rather than large. An empirical study of spelling errors [2] argues that this inference is not correct. Nevertheless, such confusions of misspellings with real words do occur. As shown in [2], when the dictionary was enlarged from 50,000 to 60,000 words, a spelling corrector would have missed approximately 20 errors in the text sample. The proposed solution is to change from a two-way classification to a three-way classification: correct, incorrect,...
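The extracted text breaks off before it names the third class or describes how it is handled. Following the title, the sketch below assumes that the third class holds dictionary words whose usage frequency falls below a threshold, and that such words are flagged and offered corrections, just like words absent from the dictionary. The label `RARE`, the threshold value, the frequency counts, and the one-edit candidate generator are all illustrative assumptions, not details taken from the disclosure.

```python
import string
from enum import Enum


class Verdict(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"  # not found in the dictionary
    RARE = "rare"            # hypothetical name: found, but below the frequency threshold


def single_edits(word: str) -> set[str]:
    """Strings one deletion, adjacent transposition, substitution, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return (deletes | transposes | replaces | inserts) - {word}


def classify(word: str, freq: dict[str, int], threshold: int) -> Verdict:
    """Three-way verdict; `freq` maps each dictionary word to a usage count."""
    w = word.lower()
    if w not in freq:
        return Verdict.INCORRECT
    if freq[w] < threshold:
        return Verdict.RARE  # a real word, but treated like a probable misspelling
    return Verdict.CORRECT


def suggest(word: str, freq: dict[str, int], threshold: int) -> list[str]:
    """Common dictionary words one edit away, most frequent first; empty for CORRECT words."""
    if classify(word, freq, threshold) is Verdict.CORRECT:
        return []
    candidates = [c for c in single_edits(word.lower()) if freq.get(c, 0) >= threshold]
    return sorted(candidates, key=lambda c: (-freq[c], c))


# Hypothetical usage counts; "fro" is a real but rare word.
freq = {"the": 60000, "for": 9000, "from": 5000, "form": 800, "fro": 2}
print(classify("fro", freq, 10))   # Verdict.RARE
print(suggest("fro", freq, 10))    # ['for', 'from']
print(classify("frmo", freq, 10))  # Verdict.INCORRECT
```

With a threshold of zero the sketch reduces to the conventional two-way checker, so the threshold is the single knob that trades dictionary coverage against the risk of accepting a misspelling that happens to be a rare word.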