Segmentation Procedure for Handwritten Symbols and Words

IP.com Disclosure Number: IPCOM000050949D
Original Publication Date: 1982-Dec-01
Included in the Prior Art Database: 2005-Feb-10
Document File: 3 page(s) / 51K

Publishing Venue

IBM

Related People

Kurtzberg, JM: AUTHOR [+2]

Abstract

In the recognition of handwritten symbols, it is necessary to have an efficient procedure for the segmentation of multistroke symbols into isolated symbols. Recognition, display or various data handling methods can then operate on the resulting isolated symbols. Similarly, for cursive writing it is necessary to segment the writing into words. An efficient and accurate method for performing the segmentation task is presented herein.

This is the abbreviated version, containing approximately 50% of the total text.

It is assumed that the transducing tablet yields (x,y) coordinate and pen-down data as a function of time. A stroke is taken to be a sequence of coordinate points from one pen-down to the next. Data are transmitted only while the pen is in contact with the tablet. A segment is composed of one or more strokes.
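The stroke definition above can be sketched in code. This is a minimal illustration, not the disclosure's own data format: the `Sample` record, its field names, and the convention that `pen_down` is True only on the first sample after the pen touches the tablet are all assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Sample(NamedTuple):
    """One tablet reading (hypothetical layout, assumed for this sketch)."""
    t: float        # time of the reading
    x: float
    y: float
    pen_down: bool  # assumed True only on the first reading after pen contact


def split_into_strokes(samples: List[Sample]) -> List[List[Point]]:
    """Group readings into strokes: each pen-down event starts a new stroke."""
    strokes: List[List[Point]] = []
    for s in sorted(samples, key=lambda s: s.t):
        if s.pen_down or not strokes:
            strokes.append([])          # a new stroke begins here
        strokes[-1].append((s.x, s.y))  # later readings extend the current stroke
    return strokes


samples = [
    Sample(0.0, 0.0, 0.0, True),
    Sample(0.1, 1.0, 1.0, False),
    Sample(0.5, 5.0, 5.0, True),
]
strokes = split_into_strokes(samples)  # two strokes: two points, then one point
```

Because no data arrive while the pen is lifted, the pen-down flag is the only stroke boundary this sketch relies on.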

Existing techniques perform segmentation based on the separation of the x-axis projections of stroke data. Such methods require the writer to take great care in separating symbols. Often, as illustrated in Fig. 1, the x-axis projections of different symbols overlap, so this method is not adequate. To avoid the segmentation problem, it has been advocated (and even practiced) that the writer use separate, distinct, predefined boxes.
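The weakness of the projection approach can be shown with a short sketch. The function below is an assumed, simplified form of x-axis projection segmentation, not a specific published implementation: strokes are merged whenever their x-extents overlap or lie within a hypothetical `gap` tolerance. The coordinates are invented for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def x_extent(stroke: Stroke) -> Tuple[float, float]:
    """Leftmost and rightmost x coordinate of a stroke."""
    xs = [x for x, _ in stroke]
    return min(xs), max(xs)


def segment_by_x_projection(strokes: List[Stroke], gap: float = 0.0) -> List[List[int]]:
    """Group stroke indices whose x-projections overlap or are within `gap`."""
    order = sorted(range(len(strokes)), key=lambda i: x_extent(strokes[i])[0])
    segments: List[List[int]] = []
    right_edge = float("-inf")
    for i in order:
        lo, hi = x_extent(strokes[i])
        if segments and lo - right_edge <= gap:
            segments[-1].append(i)            # projection touches the current group
            right_edge = max(right_edge, hi)
        else:
            segments.append([i])              # clear gap on the x-axis: new group
            right_edge = hi
    return segments


# Two slanted strokes that never come closer than 3 units in the plane,
# but whose x-projections (0..4 and 3..7) overlap, plus one distant stroke.
slanted_a = [(0.0, 0.0), (2.0, 5.0), (4.0, 10.0)]
slanted_b = [(3.0, 0.0), (5.0, 5.0), (7.0, 10.0)]
distant = [(10.0, 0.0), (10.0, 5.0)]
groups = segment_by_x_projection([slanted_a, slanted_b, distant])
```

Here the two slanted strokes are merged into one group even though they are well separated on the page, which is the failure mode described above.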

Also, contrary to what most segmentation techniques assume, a character is not necessarily composed of successive strokes. For example, there may be "delayed" crossings of t's, as shown in Fig. 1, or other delayed strokes.

The present procedure efficiently solves both of these problems. It overcomes the inadequacy of the x-axis projection technique by using a two-dimensional separation test, and it handles delayed strokes by allowing a stroke to be attached to any partially formed segment, not just the most recent one.
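A two-dimensional separation test might look like the following sketch. The exact criteria of the procedure are not given in this abridged text, so the minimum point-to-point Euclidean distance and the `threshold` parameter used here are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def min_distance(a: Stroke, b: Stroke) -> float:
    """Smallest Euclidean distance between any point of a and any point of b."""
    return min(math.dist(p, q) for p in a for q in b)


def is_attached(stroke: Stroke, segment: List[Stroke], threshold: float) -> bool:
    """Assumed test: a stroke attaches if it comes within `threshold`
    of at least one stroke already in the segment."""
    return any(min_distance(stroke, other) <= threshold for other in segment)


# Two slanted strokes whose x-projections overlap (0..4 and 3..7)
# but which stay 3 units apart in the plane (invented coordinates).
left = [(0.0, 0.0), (2.0, 5.0), (4.0, 10.0)]
right = [(3.0, 0.0), (5.0, 5.0), (7.0, 10.0)]

separated = not is_attached(right, [left], threshold=1.0)
```

Unlike a test on x-projections alone, this test keeps the two strokes apart because it measures their separation in both dimensions.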

The procedure is first given in broad outline and then described in detail with reference to the flowcharts in Figs. 2, 3 and 4. The procedure groups the strokes of a line of input data into segments. The table shows the desired segmentation output for the line shown in Fig. 1. For each segment, the number of strokes and the strokes themselves are listed. Note that the array "number of strokes" is one-dimensional, a function of the segment number, whereas the array "strokes" is two-dimensional, a function of the segment number and the stroke number within the segment.
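The two output arrays can be pictured with a small sketch. The segmentation used here is invented, since the table for Fig. 1 is not reproduced in this text. The 0-based numbering and the padding of short rows with None (to keep the second array rectangular) are assumptions of the example.

```python
from typing import List, Optional, Tuple


def to_arrays(
    segments: List[List[int]],
) -> Tuple[List[int], List[List[Optional[int]]]]:
    """Build the two arrays from a list of segments (lists of stroke indices).

    number_of_strokes[s] -> count of strokes in segment s (one-dimensional)
    strokes[s][k]        -> k-th stroke of segment s      (two-dimensional)
    """
    number_of_strokes = [len(seg) for seg in segments]
    width = max(number_of_strokes, default=0)
    # Pad short rows with None so every row has the same length (assumption).
    strokes = [list(seg) + [None] * (width - len(seg)) for seg in segments]
    return number_of_strokes, strokes


# Hypothetical segmentation: segment 0 holds strokes 0 and 2, segment 1 holds stroke 1.
number_of_strokes, strokes = to_arrays([[0, 2], [1]])
```

The count array tells a consumer how many entries of each row of the stroke array are valid, so the padding values are never read.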

The technique operates in two phases. In the first, each stroke is tested for possible attachment to existing segments by means of distance criteria based upon nearness of a stroke to a segment. If the criteria are not met, a new segment is initiated.
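The first phase can be sketched as a loop over the strokes in the order written. This is an illustrative reading of the description, not the flowcharted procedure: the point-to-point distance criterion, the `threshold` value, and the choice to try the most recent segment first are assumptions, and the coordinates are invented.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]
Segment = List[int]  # indices of the strokes that belong to one segment


def near(a: Stroke, b: Stroke, threshold: float) -> bool:
    """Assumed distance criterion: some pair of points is within threshold."""
    return any(math.dist(p, q) <= threshold for p in a for q in b)


def first_phase(strokes: List[Stroke], threshold: float) -> List[Segment]:
    """Attach each stroke to an existing segment, or start a new segment."""
    segments: List[Segment] = []
    for i, stroke in enumerate(strokes):      # strokes in the order written
        for seg in reversed(segments):        # every segment is a candidate
            if any(near(stroke, strokes[j], threshold) for j in seg):
                seg.append(i)                 # criteria met: attach
                break
        else:
            segments.append([i])              # criteria not met: new segment
    return segments


# A "t" stem, then a separate symbol, then the delayed crossing of the "t".
stem = [(1.0, 0.0), (1.0, 2.0), (1.0, 4.0)]
other = [(5.0, 0.0), (6.0, 1.0), (5.0, 2.0)]
crossbar = [(0.0, 3.0), (1.0, 3.0), (2.0, 3.0)]
result = first_phase([stem, other, crossbar], threshold=1.0)
```

In this example the crossbar is written last but joins the segment containing the stem, because attachment is not restricted to the most recently formed segment.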

The first phase is made efficient by processing the strokes in the order written since a multistroke symbol most likely has adjacent strokes attached. Further efficiency is obtained by first checking if stroke endpoints attach, since attachment at these points is most likely, thereby obviating the exhaustive checking of one point against another. (Dots are handled by che...