Enhanced Alpha Content Prescan for Automatic Spelling Error Correction and Related Tasks

IP.com Disclosure Number: IPCOM000015687D
Original Publication Date: 2002-Aug-16
Included in the Prior Art Database: 2003-Jun-20

Publishing Venue

IBM

Abstract

Disclosed is a method that speeds up looking up words in a lexicon when the words may contain errors. The method enhances US patent 4355371, which describes a technique to speed up automatic spelling correction. That technique calculates the difference in character presence (the "alpha content") between the checked word and a lexicon candidate. If the difference is low enough, a detailed match of the two words is made. Alpha content is tested by comparing bitmaps of the character presence of the two words, an operation that is much faster than a detailed word comparison. The disclosed invention further improves the speed of spelling correction by performing the following actions:

1. Assigning a normalized bitmap-based score to the candidate words not rejected.
2. Sorting the candidates by that score.
3. Using a modified detailed matching algorithm that aborts a match once it is clear that the match score cannot be higher than a score already achieved for another candidate.
4. Stopping the search when the next candidate's bitmap score is significantly less than the best detailed match score.

These additional steps reduce both the number of detailed string matches that must be made and the average duration of each match. The method is used successfully for the automatic recognition of handwritten words, where there are many OCR errors.

The Solution
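The prescan and the additional steps described in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation. The function names, the 26-letter bitmap, the Jaccard-style normalization, the use of edit distance as the "detailed match", and the `max_diff` and `margin` values are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
from typing import Iterable, Optional, Tuple


def alpha_bitmap(word: str) -> int:
    """Bit i is set when the letter chr(ord('a') + i) occurs in the word."""
    bits = 0
    for ch in word.lower():
        if "a" <= ch <= "z":
            bits |= 1 << (ord(ch) - ord("a"))
    return bits


def bitmap_score(a: int, b: int) -> float:
    """Normalized alpha-content similarity in [0, 1] (assumed formula)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0
    return 1.0 - bin(a ^ b).count("1") / union


def detailed_score(a: str, b: str, floor: float) -> Optional[float]:
    """Edit-distance similarity; None once the score cannot exceed floor."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        # The row minimum is a lower bound on the final distance, so the
        # match can be abandoned as soon as it cannot beat the floor.
        if 1.0 - min(cur) / longest <= floor:
            return None
        prev = cur
    score = 1.0 - prev[-1] / longest
    return score if score > floor else None


def lookup(word: str, lexicon: Iterable[str], max_diff: int = 3,
           margin: float = 0.2) -> Tuple[Optional[str], float]:
    """Return the best candidate and its detailed score (hypothetical API)."""
    wbits = alpha_bitmap(word)
    survivors = []
    for cand in lexicon:
        # A real lexicon would store precomputed bitmaps (assumption).
        cbits = alpha_bitmap(cand)
        # Prescan: reject candidates whose alpha content differs too much.
        if bin(wbits ^ cbits).count("1") <= max_diff:
            survivors.append((bitmap_score(wbits, cbits), cand))
    # Try the most promising candidates first.
    survivors.sort(key=lambda pair: pair[0], reverse=True)
    best_word, best = None, 0.0
    for bscore, cand in survivors:
        # Stop when the bitmap score falls well below the best detailed score.
        if bscore < best - margin:
            break
        score = detailed_score(word, cand, best)
        if score is not None:
            best_word, best = cand, score
    return best_word, best
```

Under these assumptions, `lookup("lexcon", ["lexicon", "lesson", "lemon", "banana", "exon"])` selects "lexicon": "banana" is rejected by the prescan, the detailed match against "exon" aborts after its first row, and the remaining candidates are never matched because their bitmap scores fall below the cutoff.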