
Spelling Correction with Keyboard, User, and Language Models

IP.com Disclosure Number: IPCOM000104425D
Original Publication Date: 1993-Apr-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 6 page(s) / 324K

Publishing Venue

IBM

Related People

Brown, PF: AUTHOR (and 6 others)

Abstract

Disclosed is a system for the automatic correction of spelling errors in typed text. This system has the following desirable characteristics: It does not assume that words in its input text are correctly spelled merely because they happen to be in its language's lexicon. For example, it might correct "I went too the store" to "I went to the store", even though "too" is an English word. It accepts the original text and produces a single candidate for the intended text: the candidate judged best according to models of keyboard layout and usage, human fallibility, and the language in which the text is written.
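The excerpt does not spell out how the models are combined to pick the "best" candidate. One common reading, which is an assumption here, is the noisy-channel rule: choose the intended word sequence W that maximizes P(W) * P(typed text | W). In this reading, P(W) comes from the language model and P(typed text | W) comes from the keyboard and user models. The minimal Python sketch below uses invented toy probabilities and hypothetical names (`BIGRAM_LOGPROB`, `channel_logprob`, `noisy_channel_score`). It shows how such a rule can prefer "to" over the lexicon word "too".

```python
import math

# Hypothetical toy bigram language model: log P(word | previous word).
# These numbers are invented for illustration; a real model would be
# trained on real language data.
BIGRAM_LOGPROB = {
    ("went", "to"): math.log(0.20),
    ("went", "too"): math.log(0.001),
    ("to", "the"): math.log(0.30),
    ("too", "the"): math.log(0.0005),
}
UNSEEN_LOGPROB = math.log(1e-6)  # hypothetical score for unseen bigrams


def lm_logprob(words):
    """log P(W) under the toy bigram model (sentence-start term omitted)."""
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
               for pair in zip(words, words[1:]))


def channel_logprob(typed, intended):
    """Hypothetical log P(typed word | intended word)."""
    if typed == intended:
        return math.log(0.95)          # most words are typed correctly
    if intended and typed == intended + intended[-1]:
        return math.log(0.01)          # last letter doubled, e.g. "to" -> "too"
    return math.log(1e-9)              # any other error: very unlikely here


def noisy_channel_score(typed_words, intended_words):
    """log P(W) + log P(typed | W), assuming a one-to-one word alignment."""
    channel = sum(channel_logprob(t, w)
                  for t, w in zip(typed_words, intended_words))
    return lm_logprob(intended_words) + channel


typed = ["i", "went", "too", "the", "store"]
hypotheses = [
    ["i", "went", "too", "the", "store"],   # keep the input as typed
    ["i", "went", "to", "the", "store"],    # real-word correction
]
best = max(hypotheses, key=lambda w: noisy_channel_score(typed, w))
print(" ".join(best))  # i went to the store
```

The input hypothesis scores well under the channel model because it requires no typing error. It still loses because the language model strongly disfavors "went too the".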

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 16% of the total text.


      The original and proposed texts may differ by arbitrary
insertion, substitution, deletion, or transposition of letters.
Thus, the system does not assume that the initial segments of words
are less likely to be mistyped.  Furthermore, the underlying method
may be extended to handle the insertion or deletion of spaces or
punctuation as well, allowing the correction of "I wentto the store"
to "I went to the store", for example.  The system can also maintain
ranked lists of likely alternatives at each point in the input text
to aid revision of its output, should revision be necessary.  All of
the models used to determine the best candidate are trainable from
real language data or from actual samples of typed text with
misspellings, as appropriate.
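The four edit operations listed above can be illustrated with a weighted optimal-string-alignment distance. This is Damerau-Levenshtein distance restricted to adjacent transpositions, used here as a stand-in channel cost. It is a hedged sketch, not the disclosed method. The cost values and the helper names `channel_cost` and `ranked_candidates` are hypothetical, and a trained system would derive its costs from learned probabilities. Because the dynamic program scores every position alike, it does not treat word-initial letters as more reliable. It also returns a ranked list of alternatives rather than a single answer.

```python
import heapq

# Hypothetical per-operation costs (think: negative log-probabilities).
# A trained system would learn these from samples of misspelled text.
COSTS = {"insert": 2.0, "delete": 2.0, "substitute": 2.5, "transpose": 1.5}


def channel_cost(typed: str, intended: str) -> float:
    """Weighted optimal-string-alignment cost of typing `intended` as `typed`.

    Insertions, deletions (omitted letters), substitutions, and adjacent
    transpositions are allowed at any position, all scored alike.
    """
    n, m = len(intended), len(typed)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + COSTS["delete"]   # intended letter omitted
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + COSTS["insert"]   # extra letter typed
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = intended[i - 1] == typed[j - 1]
            d[i][j] = min(
                d[i - 1][j] + COSTS["delete"],
                d[i][j - 1] + COSTS["insert"],
                d[i - 1][j - 1] + (0.0 if same else COSTS["substitute"]),
            )
            # Adjacent letters typed in swapped order, e.g. "the" -> "teh".
            if (i > 1 and j > 1 and intended[i - 1] == typed[j - 2]
                    and intended[i - 2] == typed[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + COSTS["transpose"])
    return d[n][m]


def ranked_candidates(typed, lexicon, k=3):
    """Return the k lexicon entries with the lowest channel cost, best first."""
    return heapq.nsmallest(k, lexicon, key=lambda w: channel_cost(typed, w))


lexicon = ["the", "ten", "tea", "them", "toe"]
print(ranked_candidates("teh", lexicon))   # ['the', 'ten', 'tea']
print(channel_cost("wentto", "went to"))   # 2.0: one omitted space
```

Treating the space as an ordinary character is one simple way to cover the spacing extension mentioned above. In this sketch, "wentto" is one omitted space away from "went to".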

      More specifically, the goal of this spelling correction system
is to output the intended sequence of words when given as input a
sequence of characters or, preferably, keystrokes.  A keystroke is a
character annotated with information about how it was typed: when it
was typed, what combination of keys on the keyboard was used, how
long the key was held down, and so on.  Retaining as much
information as possible about the typing process improves correction
performance, provided this extra information is accurately modeled.
For example, keystrokes widely separated in time may indicate a
deliberate typist who checks that each key is the intended one before
pressing it.  Such a typist may well make spelling errors, but these
errors are unlikely to be caused by mistakenly hitting one key for
another.  That is, the keyboard model would assign a low probability
to substitution when a keystroke is distant in time from its
neighbors.  On the other hand, keystrokes extremely close in time are
more likely than others to be transposed, and keystrokes are more
likely to be dropped when a person is typing quickly.  All of these
observations can be incorporated into the keyboard model.  As a
practical example, these factors will help the correction system
when it is faced with the input spelling "wate" and must decide
whether the typist omitted an "r" or jus...
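The timing observations above can be sketched as a keyboard model that adjusts per-keystroke error probabilities. This is a minimal Python illustration, not the disclosed model. The `Keystroke` fields, the base probabilities, the thresholds (`SLOW_GAP_MS`, `FAST_GAP_MS`, `FAST_TYPING_MS`), and the multipliers are all hypothetical. The disclosure only says that such parameters are trainable from samples of typed text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keystroke:
    """A character annotated with how it was typed (hypothetical fields)."""
    char: str
    time_ms: float                      # when the key was pressed
    hold_ms: float                      # how long the key was held down
    modifiers: frozenset = frozenset()  # other keys held, e.g. {"shift"}


# Hypothetical base probabilities: a wrong key hit, a swap with a neighbor,
# or an intended keystroke not registered ("drop") near this position.
BASE_PROBS = {"substitute": 0.02, "transpose": 0.01, "drop": 0.01}
SLOW_GAP_MS = 600.0     # gaps this long suggest a deliberate typist
FAST_GAP_MS = 60.0      # gaps this short make transpositions likelier
FAST_TYPING_MS = 120.0  # mean gap at or below this counts as fast typing


def neighbor_gap_ms(strokes, i):
    """Smallest time gap between keystroke i and an adjacent keystroke."""
    gaps = []
    if i > 0:
        gaps.append(strokes[i].time_ms - strokes[i - 1].time_ms)
    if i + 1 < len(strokes):
        gaps.append(strokes[i + 1].time_ms - strokes[i].time_ms)
    return min(gaps) if gaps else float("inf")


def mean_gap_ms(strokes):
    """Average time between consecutive keystrokes (a typing-speed proxy)."""
    if len(strokes) < 2:
        return float("inf")
    return (strokes[-1].time_ms - strokes[0].time_ms) / (len(strokes) - 1)


def error_probs(strokes, i):
    """Probability of each error type at keystroke i, given its timing."""
    probs = dict(BASE_PROBS)
    gap = neighbor_gap_ms(strokes, i)
    if gap >= SLOW_GAP_MS:
        probs["substitute"] *= 0.1   # wrong-key errors unlikely when slow
    if gap <= FAST_GAP_MS:
        probs["transpose"] *= 5.0    # near-simultaneous keys swap more often
    if mean_gap_ms(strokes) <= FAST_TYPING_MS:
        probs["drop"] *= 3.0         # fast typists drop keystrokes more often
    return probs


fast = [Keystroke(c, 50.0 * k, 40.0) for k, c in enumerate("teh")]
slow = [Keystroke(c, 800.0 * k, 90.0) for k, c in enumerate("teh")]
print(error_probs(fast, 1))
print(error_probs(slow, 1))
```

In a full system, the negative logarithms of these probabilities could serve as position-specific costs for the substitution, transposition, and deletion operations. A keystroke's timing would then directly shape which corrections are considered cheap.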