Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Sliding Window Cross Hatch Match Algorithm for Spelling Error Correction

IP.com Disclosure Number: IPCOM000053040D
Original Publication Date: 1981-Aug-01
Included in the Prior Art Database: 2005-Feb-12
Document File: 2 page(s) / 14K

Publishing Venue

IBM

Related People

Convis, DB: AUTHOR [+4]

Abstract

Disclosed is an algorithm for spelling error correction which utilizes a sliding window match to compare an input error word to a dictionary of correctly spelled words. The technique provides for a forward pass through the word and a backward pass which enables the system to recover where the input word contains added or dropped characters, an improvement over prior-art character-by-character comparison techniques.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

Page 1 of 2

Sliding Window Cross Hatch Match Algorithm for Spelling Error Correction

Disclosed is an algorithm for spelling error correction which utilizes a sliding window match to compare an input error word to a dictionary of correctly spelled words. The technique provides for a forward pass through the word and a backward pass which enables the system to recover where the input word contains added or dropped characters, an improvement over prior-art character- by-character comparison techniques.

The sliding window is positioned by a pointer to the current dictionary word character and a pointer to the current error word character. A two-character window is used which includes the current and next character in the dictionary and error words. This two-character window is selected to provide a quality of match which requires approximate position and sequence of the matched characters. The matching sequence is in the order of most likely error as follows:
1) No error / current characters match. 2) Substitution error / next characters match. 3) Dropped character / current error word character matches next dictionary word character. 4) Added character / next error word character matches current dictionary word character. The window may not stay square throughout the matching process on a pair of words. When matches 3 or 4 are taken, a skewing of the window occurs. This skew compensates for local or overall length adjusting errors, thereby realigning displaced characters.

The window adjusts for a one-character drop or add each match cycle. However, dropped or added syllables can cause sufficient misalignment such that no more matches are found. When this occurs, the words are processed from the last character of the word towards the first character until the window encounters the last match found when processing forward. Any new match(es) found in this partial backward pass are added to those found going forward for the total matched characters.

The length adjustment aspect of the window can cause misalignment when other types of errors look like dropped or added characters. For example, an OCR input device may break a single character into two characters or combine two characters into one. This misalignment is often realigned but can cause m...