Enhanced Alpha Content Prescan for Automatic Spelling Error Correction and Related Tasks

IP.com Disclosure Number: IPCOM000015687D
Original Publication Date: 2002-Aug-16
Included in the Prior Art Database: 2003-Jun-20
Document File: 2 page(s) / 46K

Publishing Venue

IBM

Abstract

Disclosed is a method that improves the speed of looking up words in a lexicon in the presence of errors. This method is an enhancement of US patent 4355371, which describes a method to speed up automatic spelling correction. That technique calculates the difference in character presence (the "alpha content") between the checked word and the lexicon candidate. If the difference is low enough, a detailed match of the two words is made. Alpha content is tested by comparing bitmaps of the character presence of the two words, an operation that is much faster than a detailed word comparison.

The disclosed invention further improves the speed of the spelling correction by performing the following actions: (1) assigning a normalized bitmap-based score to the candidate words not rejected; (2) sorting the candidates by that score; (3) using a modified detailed matching algorithm that aborts a match once it is clear that the match score cannot be higher than a score already achieved for another candidate; and (4) stopping the search when the next candidate's bitmap score is significantly less than the best detailed match score. These additional steps reduce both the number of detailed string matches that must be made and the average duration of each match. The method is used successfully for the automatic recognition of handwritten words, where there are many OCR errors.

This is the abbreviated version, containing approximately 51% of the total text.

The Solution

    The method adheres to the following process: For each string in the lexicon, compute a bitmap that serves as that string's signature. US4355371 uses an 8-bit bitmap, but with current memory availability and cost, 32-bit bitmaps are preferable. Other sizes may be used as well. To compute the bitmap, an application-dependent mapping assigns each character of the character set to one of the bits. A bit is set if and only if the string contains at least one character that maps to it; all other bits remain clear. This step is performed only once per lexicon.
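The signature computation described above can be illustrated with a minimal Python sketch. The character-to-bit mapping shown here (letters a-z on bits 0-25, digits sharing bit 26, everything else on bit 27) is only an assumed example of an "application-dependent mapping", and the names `signature`, `alpha_difference`, and `build_signatures` are hypothetical. Counting the differing bits with XOR is likewise one plausible way to measure the difference in alpha content, not necessarily the one used in the disclosure.

```python
def signature(word):
    """Return a 32-bit bitmap recording which character classes occur in word.

    Assumed mapping (illustrative only): a-z -> bits 0..25,
    any digit -> bit 26, any other character -> bit 27.
    """
    bits = 0
    for ch in word.lower():
        if "a" <= ch <= "z":
            bits |= 1 << (ord(ch) - ord("a"))
        elif ch.isdigit():
            bits |= 1 << 26
        else:
            bits |= 1 << 27
    return bits


def alpha_difference(sig_a, sig_b):
    """Number of bit positions present in exactly one of the two signatures."""
    return bin(sig_a ^ sig_b).count("1")


def build_signatures(lexicon):
    """Precompute the signature of every lexicon entry (done once per lexicon)."""
    return {entry: signature(entry) for entry in lexicon}
```

Because a bit records only presence, repeated characters and character order do not affect the signature: "hello" and "hole" map to the same bitmap, which is why a detailed match is still needed for candidates that pass the prescan.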
1. Compute the bitmap for the string to be correc...
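
The four enhancements named in the abstract (a normalized bitmap score, sorting by that score, a detailed match that aborts early, and an early stop of the search) might fit together as in the following Python sketch. Since the detailed steps are not available in this abbreviated text, everything specific here is an assumption: the scoring formulas, the thresholds `max_diff` and `margin`, the use of Levenshtein distance as the detailed match, and all function names.

```python
def signature(word):
    """Assumed mapping: a-z -> bits 0..25, anything else -> bit 26."""
    bits = 0
    for ch in word.lower():
        if "a" <= ch <= "z":
            bits |= 1 << (ord(ch) - ord("a"))
        else:
            bits |= 1 << 26
    return bits


def popcount(x):
    return bin(x).count("1")


def bitmap_score(sig_a, sig_b):
    """Assumed normalization: 1.0 for identical bitmaps, 0.0 for disjoint ones."""
    union = popcount(sig_a | sig_b)
    return 1.0 if union == 0 else 1.0 - popcount(sig_a ^ sig_b) / union


def bounded_edit_distance(a, b, limit):
    """Levenshtein distance, or None as soon as it must exceed limit."""
    if abs(len(a) - len(b)) > limit:
        return None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        if min(cur) > limit:  # the final distance can never drop below a row minimum
            return None
        prev = cur
    return prev[-1] if prev[-1] <= limit else None


def correct(word, lexicon_sigs, max_diff=3, margin=0.3):
    """Return (best_candidate, detailed_score); thresholds are illustrative."""
    word_sig = signature(word)
    # Prescan: reject candidates whose alpha content differs too much,
    # and score the survivors.
    candidates = [
        (bitmap_score(word_sig, sig), cand)
        for cand, sig in lexicon_sigs.items()
        if popcount(word_sig ^ sig) <= max_diff
    ]
    candidates.sort(reverse=True)  # most promising bitmap score first

    best_word, best_score = None, 0.0
    for b_score, cand in candidates:
        # Early stop: the remaining candidates look much worse than the best match.
        if best_word is not None and b_score < best_score - margin:
            break
        longest = max(len(word), len(cand), 1)
        # Largest distance that could still match or beat the best score so far.
        limit = int((1.0 - best_score) * longest)
        dist = bounded_edit_distance(word, cand, limit)
        if dist is None:
            continue  # detailed match aborted: it cannot beat the best score
        score = 1.0 - dist / longest
        if score > best_score:
            best_word, best_score = cand, score
    return best_word, best_score
```

For example, with `lexicon_sigs = {w: signature(w) for w in ["hello", "help", "world", "yellow", "hollow"]}`, calling `correct("helo", lexicon_sigs)` selects "hello": "world" is rejected by the prescan, and the detailed matches for the other candidates are aborted because they cannot beat the score already achieved.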