Construction and Analysis of a Starter Prototype Set for Automatic Handwriting Recognition

IP.com Disclosure Number: IPCOM000106869D
Original Publication Date: 1993-Dec-01
Included in the Prior Art Database: 2005-Mar-21
Document File: 2 page(s) / 102K

Publishing Venue

IBM

Related People

Bellegarda, EJ: AUTHOR [+4]

Abstract

Handwriting recognition algorithms based on elastic matching require the generation of a prototype set containing adequate templates. For writer-dependent processing, this set is generated during training. For writer-independent recognition, this set must be generated a priori from a variety of handwriting samples. This disclosure presents a systematic method for collecting and analyzing wide-coverage writing styles in order to establish a starter prototype set from samples of 82 Roman-alphabet characters. This starter set is also useful for writer-dependent applications since it can be used to significantly speed up training.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      A salient feature of handwriting recognition algorithms based
on elastic matching is the use of character prototypes as templates
against which sample characters are compared during decoding.  For
writer-dependent processing, the set of character prototypes is
generated from each particular writer during a training procedure so
as to tailor the system to the user [1,2].  For writer-independent
recognition algorithms, however, the set of character prototypes must
be derived a priori from data collected from a large variety of
writers.  The quality of the resulting character prototype set is
crucial to the performance of elastic matching algorithms.  Such a set
can also be useful for writer-dependent algorithms, as it can
drastically reduce the training period and thereby allow the user to
perform recognition more readily.  In the past, starter sets have been
created by (i) collecting different writing styles and converting them
to prototypes and/or (ii) manually deciding which samples are
representative of human handwriting [3].

      In this disclosure, a systematic method for the collection and
analysis of a large variety of writing styles is presented.

      For each character, several classes of different writing styles
are generated using the following algorithm:

1.  Extract the character of interest from all the available data
    collected from N writers.

2.  Count the total number of occurrences C of the character of
    interest.

3.  Cluster the data gathered for this particular character using the
    Paper-Like Interface prototype manager [3,4].  The clustering
    algorithm is based on Euclidean distance.  This results in M
    distinct classes, each containing C_i occurrences of the
    character.

4.  Determine the percentage of original characters which contributed
    to a particular class; i.e., the ratio C_i/C for i = 1, ...,
    M.

5.  Discard all cl...
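
      The counting and ratio computation in steps 2 through 4 can be
illustrated with a minimal Python sketch.  It is not the Paper-Like
Interface prototype manager, whose clustering details are not given
here; a simple greedy Euclidean-distance clustering is swapped in as
an assumption.  The function names, the distance threshold, and the
2-D feature vectors are all hypothetical, and the samples are assumed
to have already been extracted for one character (step 1).  Step 5 is
not covered.

```python
import math


def cluster_by_distance(samples, threshold):
    """Greedy clustering on Euclidean distance (an assumed stand-in for
    the prototype manager): a sample joins the first class whose first
    member lies within `threshold`, otherwise it starts a new class."""
    classes = []
    for sample in samples:
        for members in classes:
            if math.dist(sample, members[0]) <= threshold:
                members.append(sample)
                break
        else:
            classes.append([sample])
    return classes


def class_ratios(samples, threshold):
    """Return (counts, ratios), where counts[i] is C_i and ratios[i] is C_i / C."""
    total = len(samples)                              # step 2: C
    classes = cluster_by_distance(samples, threshold) # step 3: M classes
    counts = [len(members) for members in classes]    # C_i for i = 1, ..., M
    ratios = [count / total for count in counts]      # step 4: C_i / C
    return counts, ratios


# Hypothetical feature vectors for one character written by several writers.
samples = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.1)]
counts, ratios = class_ratios(samples, threshold=1.0)
print(counts, ratios)
```

      With these made-up samples, two classes emerge (M = 2), and the
ratios indicate what fraction of the C occurrences each writing style
accounts for.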