Improved Utilization of Multiple Codebooks in Automatic Handwriting Recognition
Original Publication Date: 1992-Jun-01
Included in the Prior Art Database: 2005-Mar-22
Publishing Venue
IBM
Related People
Bellegarda, EJ: AUTHOR [+4]
Abstract
This article studies the importance and relative contributions of selected feature parameters in shaping the recognition rate of an automatic handwriting recognizer. It has been determined that the second-order information (curvature) does not contribute as much as the zeroth-order (position) and first-order (angle) information. In addition, multiple codebooks operating on low-dimensional spaces achieve better results than a single codebook operating on a high-dimensional space. This results in a significant saving in computation time.
This article studies the importance and relative contributions of selected feature parameters in shaping the recognition rate of an automatic handwriting recognizer. It has been determined that the second-order information (curvature) does not contribute as much as the zeroth-order (position) and first-order (angle) information. In addition, multiple codebooks operating on low-dimensional spaces achieve better results than a single codebook operating on a high-dimensional space. This results in a significant saving in computation time.
The problem of automatic recognition of handwritten text produced in discrete, run-on, cursive, or unconstrained mode is addressed. The choice of feature parameters is one of the most important steps in the design of a handwriting recognizer. It requires a trade-off between completeness in characterizing handwriting and the number of feature parameters that can be handled efficiently.
There have been previous attempts to select a set of feature parameters that characterizes handwriting produced in any handwriting mode. In that approach, a feature vector of size 6 was created at each equispaced point in the pen trajectory, encompassing (i) the horizontal and vertical incremental changes; (ii) the sine and cosine of the angle of the tangent to the pen trajectory at that point; and (iii) the incremental changes in sine and cosine. Subsequently, (2H+1) frames (or feature vectors) were concatenated (spliced) to form one large vector of dimension 6(2H+1). The corresponding eigenvalues were computed from the total covariance matrix, and a rotation/projection was then performed to eliminate redundancy in the data.
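The frame construction and splicing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the recognizer's actual code: the function names frame_features and splice are hypothetical, the tangent angle is estimated from the segment between consecutive points (the text does not say how it is estimated), the ordering of components within a frame is chosen for illustration, and frames too close to either end of the trajectory to have H neighbours on both sides are simply dropped. The eigenvalue computation and rotation/projection step is not shown.

```python
import math

def frame_features(points):
    """Build one 6-D frame per point from an equispaced pen trajectory.

    points: list of (x, y) tuples, assumed already resampled to be equispaced.
    Assumed frame layout: [dx, dy, sin, cos, d_sin, d_cos].
    """
    # (i) horizontal and vertical incremental changes between consecutive points
    deltas = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    # Tangent angle, estimated from each segment (an assumption of this sketch)
    angles = [math.atan2(dy, dx) for dx, dy in deltas]

    frames = []
    for k in range(1, len(deltas)):
        dx, dy = deltas[k]
        # (ii) sine and cosine of the tangent angle
        s, c = math.sin(angles[k]), math.cos(angles[k])
        s_prev, c_prev = math.sin(angles[k - 1]), math.cos(angles[k - 1])
        # (iii) incremental changes in sine and cosine
        frames.append([dx, dy, s, c, s - s_prev, c - c_prev])
    return frames

def splice(frames, H):
    """Concatenate each frame with its H neighbours on either side.

    Returns vectors of dimension 6 * (2 * H + 1); frames without H
    neighbours on both sides are skipped (an assumption of this sketch).
    """
    spliced = []
    for t in range(H, len(frames) - H):
        window = frames[t - H : t + H + 1]
        spliced.append([value for frame in window for value in frame])
    return spliced

# Toy trajectory: a straight horizontal stroke (illustrative data only).
trajectory = [(float(i), 0.0) for i in range(10)]
frames = frame_features(trajectory)
spliced = splice(frames, H=2)
```

With H=2 each spliced vector has 6(2·2+1) = 30 components; with the value H=20 quoted for the experiments, the same formula gives 6(2·20+1) = 246.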
This approach resulted in a codebook derived from 6-D feature vectors, which we shall loosely call the 6-D codebook. The goal is to separate this 6-D codebook into multiple codebooks in order to test a hypothesis regarding the usefulness of the curvature in the feature vector. The procedure consists of (i) studying a 4-D feature vector made solely of position and angle information; (ii) splitting the original 6-D feature vector into three 2-D vectors (position, angle, curvature), each assigned to a different codebook; and (iii) varying the weights of the three codebooks in (ii) to determine the influence of one codebook over the others.
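Steps (ii) and (iii) can be illustrated with a small sketch. Several things here are assumptions rather than details taken from the article: the frame layout [dx, dy, sin, cos, d_sin, d_cos], the use of nearest-entry (Euclidean) vector quantization, and in particular the way the weights enter the score. The text does not say how the codebook weights are applied; a weighted sum of per-codebook log-probabilities is used below as one common choice. All names (split_frame, quantize, weighted_log_score) and all numeric values are hypothetical toy data.

```python
import math

# Assumed frame layout: [dx, dy, sin, cos, d_sin, d_cos].
SLICES = {
    "position": slice(0, 2),   # zeroth-order information
    "angle": slice(2, 4),      # first-order information
    "curvature": slice(4, 6),  # second-order information
}

def split_frame(frame):
    """Split one 6-D frame into three 2-D sub-vectors, one per codebook."""
    return {name: frame[s] for name, s in SLICES.items()}

def quantize(vec, codebook):
    """Return the index of the codebook entry nearest to vec (Euclidean)."""
    def sq_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(vec, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))

def weighted_log_score(frame, codebooks, label_probs, weights):
    """Weighted sum of per-codebook log-probabilities for one frame.

    label_probs[name][i] is the probability that a hypothetical character
    model assigns to entry i of codebook `name`. The weighting scheme is
    an assumption of this sketch, not taken from the article.
    """
    score = 0.0
    for name, vec in split_frame(frame).items():
        index = quantize(vec, codebooks[name])
        score += weights[name] * math.log(label_probs[name][index])
    return score

# Toy codebooks and probabilities (illustrative values only).
codebooks = {
    "position": [[0.0, 0.0], [1.0, 0.0]],
    "angle": [[0.0, 1.0], [1.0, 0.0]],
    "curvature": [[0.0, 0.0], [0.5, 0.5]],
}
label_probs = {
    "position": [0.2, 0.8],
    "angle": [0.5, 0.5],
    "curvature": [0.1, 0.9],
}
frame = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]

equal = weighted_log_score(
    frame, codebooks, label_probs,
    {"position": 1.0, "angle": 1.0, "curvature": 1.0})
no_curvature = weighted_log_score(
    frame, codebooks, label_probs,
    {"position": 1.0, "angle": 1.0, "curvature": 0.0})
```

Setting the curvature weight to zero removes that codebook's contribution entirely, which is one way to probe how much the second-order information matters relative to position and angle.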
The experimental results were obtained with H=20. The experiments were run on an 81-character vocabular...