
Robust Database Searching with Parallel Computer

IP.com Disclosure Number: IPCOM000105694D
Original Publication Date: 1993-Sep-01
Included in the Prior Art Database: 2005-Mar-20
Document File: 4 page(s) / 111K

Publishing Venue

IBM

Related People

Camp Jr, WO: AUTHOR

Abstract

Disclosed is a method for searching a limited lexicon for the best match to a candidate word when the candidate word has missing letters, extra letters, and wrong letters. The procedure is applicable to any form of lexicon with a finite set of symbols; it is not constrained to words and the alphabet. This invention joins the methods of dynamic programming, neural networks, the maze-solving algorithm, and cellular automata with parallel-processing computers to solve this problem.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 47% of the total text.

Robust Database Searching with Parallel Computer

      One starts with a candidate word and a list of words (the
lexicon) against which the candidate word is to be compared for the
best match.  For example:

      CANDIDATE WORD:   RMER        LEXICON:          APPLE
                                                      ARCHER
                                                      BORDER
                                                       ...

      One then builds a comparison matrix between each lexicon word
and the candidate word.

            R  M  E  R        R  M  E  R        R  M  E  R
      A                 A                 B
      P                 R                 O
      P                 C                 R                 ...
      L                 H                 D
      E                 E                 E
                        R                 R

      These matrices are filled with values from a static confusion
matrix, which defines the probability of confusing one letter with
another.

            A    B    C    D    E    F    ...
      A     1    .2   .1   .1   .2   .3   ...
      B     .2   1    .3   .6   .5   .3   ...
      C     .1   .2   1    .2   .5   .3   ...
      ...
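As a sketch of the fill step (the function name, the dictionary representation of the confusion matrix, and the sample values are illustrative assumptions, not taken from the disclosure), each cell of a comparison matrix is simply a lookup into the static confusion matrix:

```python
# Sketch: fill the comparison matrix for one lexicon word against the
# candidate word. M[i][j] = probability of confusing lexicon_word[i]
# with candidate[j]. The names and values here are illustrative only.

def build_comparison_matrix(candidate, lexicon_word, confusion):
    """Rows follow the lexicon word, columns follow the candidate."""
    return [[confusion.get((lex, cand), 0.0) for cand in candidate]
            for lex in lexicon_word]

# Illustrative confusion values (not the disclosure's):
confusion = {("A", "R"): 0.8, ("A", "M"): 0.3,
             ("A", "E"): 0.1, ("P", "R"): 0.5}
m = build_comparison_matrix("RMER", "APPLE", confusion)
# m has one row per letter of "APPLE" and one column per letter of "RMER"
```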

      A key aspect of this technique is that we can construct this
matrix quickly and easily by using a neural network trained on the
images of the alphabet and measuring the Hamming distance between the
hidden nodes for the classes represented by each character.
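The abbreviated text does not spell out how Hamming distances become confusion values. The sketch below assumes each character class is summarized by a binary hidden-node activation pattern and that distance is mapped linearly to a confusion value (distance 0 giving confusion 1); both the mapping and the toy patterns are assumptions:

```python
# Sketch, with assumed details: derive a confusion matrix from the
# Hamming distance between binary hidden-node activation patterns.
# The linear distance-to-probability mapping and the 4-node toy
# patterns are illustrative assumptions, not from the disclosure.

def hamming(a, b):
    """Number of positions where two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def confusion_from_hidden(hidden):
    """hidden: dict mapping character -> tuple of 0/1 activations."""
    n = len(next(iter(hidden.values())))
    return {(a, b): 1.0 - hamming(va, vb) / n
            for a, va in hidden.items()
            for b, vb in hidden.items()}

# Toy 4-node activation patterns (illustrative only):
hidden = {"A": (1, 0, 1, 1), "B": (1, 1, 1, 0), "C": (0, 1, 0, 0)}
conf = confusion_from_hidden(hidden)
# identical patterns give 1.0; fully different patterns give 0.0
```

One design consequence worth noting: because Hamming distance is symmetric, a matrix built this way is symmetric, whereas a confusion matrix measured from recognition errors need not be.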

The filled-in matrices from above might look like:

            R   M   E   R         R   M   E   R         R   M   E   R
      A    .8  .3  .1  .8     A  .8  .3  .1  .8     B  .5   0  .3  .5
      P    .5  .1  .3  .5     R   1   0  .1   1     O   0   0  .1  .1
      P    .5  .1  .3  .5     C  .1   0  .3  .1     R   1   0  .1   1  ...
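The abbreviated text ends here. Per the abstract, the remaining steps apply dynamic programming (with a maze-solving/cellular-automaton formulation on a parallel machine) to these filled-in matrices. As a rough, purely illustrative sketch of the dynamic-programming idea only, one can score the best monotone path through a matrix, charging an assumed penalty for each missing or extra letter; the recurrence and the gap penalty below are my assumptions, not the disclosure's:

```python
# Illustrative sketch of a dynamic-programming score over a filled-in
# comparison matrix (edit-distance-style recurrence). The GAP penalty
# and the recurrence details are assumptions, not from the disclosure.

GAP = -0.5  # assumed penalty per inserted or deleted letter

def best_match_score(matrix):
    """Best score aligning all matrix rows with all matrix columns."""
    rows, cols = len(matrix), len(matrix[0])
    # dp[i][j]: best score for the first i rows vs. the first j columns
    dp = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows + 1):
        for j in range(cols + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0:
                cands.append(dp[i - 1][j] + GAP)   # skip a lexicon letter
            if j > 0:
                cands.append(dp[i][j - 1] + GAP)   # skip a candidate letter
            if i > 0 and j > 0:
                cands.append(dp[i - 1][j - 1] + matrix[i - 1][j - 1])
            dp[i][j] = max(cands)
    return dp[rows][cols]
```

Scoring each lexicon word's matrix this way and taking the maximum would select the best match; the disclosure's actual parallel formulation is in the omitted portion of the text.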