HROP - A fast method for distinguishing between hand-print and machine-print text
Original Publication Date: 2002-Jul-01
Included in the Prior Art Database: 2003-Jun-11
Many computerized tasks involve analyzing written material for different purposes, such as text scanning and postal sorting. All of these systems use some kind of OCR (Optical Character Recognition) engine. To improve OCR performance, different OCR engines are used for machine-print (MP) and hand-print (HP) material. This, however, causes difficulty in applications where the input to the OCR engine may include both MP and HP text. To decide which OCR engine (MP- or HP-oriented) to activate for each input text, one must know, before activating it, whether hand or machine print is involved. A preliminary step that distinguishes between hand and machine print before the OCR stage is therefore necessary for good results. Furthermore, when segments are grouped into blocks before the OCR engine is activated, information about the text type (HP or MP) may be of vital importance. The presented method offers an efficient and fast way to classify input text as MP or HP without using the OCR engines.

Many image-understanding tasks that the human eye performs with great ease become difficult when a computer performs them. Distinguishing between HP and MP text is one such task: any untrained person does it effortlessly, but an automatic algorithm that attempts the same faces many problems. As described above, this distinction is needed in tasks that involve activating OCR engines. To make it automatically, we must characterize the global differences between hand and machine print, and use these differences to form robust features that are invariant to font size, font type, and the writer's personal style.
As a first step, let us observe MP and HP text and try to characterize them in terms of line and space widths. HP text is usually written with a pen, which produces a typical line width that is modulated by the pressure applied to the pen. Letter and word spacing have a significant standard deviation, which results from the “human factor” as well as from the fact that handwritten letters are usually rounded and do not include many parallel lines. The widths of both handwritten lines and spaces are correlated with font size.
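The observation about line and space widths can be made concrete with run-length measurements on a binarized image. The following is a minimal sketch under stated assumptions, not the HROP method itself: it assumes a binary image given as a list of rows with 1 for ink and 0 for background, scans horizontal rows only, and treats each run of ink pixels as a line (stroke) width and each run of background pixels lying between ink pixels as a space width. Because both widths are correlated with font size, the sketch divides each standard deviation by its mean (the coefficient of variation), which is one possible way to cancel a uniform scale factor. The function names `run_lengths`, `coefficient_of_variation` and `width_stats` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def run_lengths(row: Sequence[int], value: int) -> List[int]:
    """Return the lengths of maximal runs of `value` in a row of 0/1 pixels."""
    runs: List[int] = []
    count = 0
    for pixel in row:
        if pixel == value:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs


def coefficient_of_variation(values: Sequence[int]) -> float:
    """Standard deviation divided by mean; a uniform scaling of all values cancels out."""
    if not values:
        return 0.0
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def width_stats(image: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (line-width CV, space-width CV) over all horizontal rows.

    Assumption: 1 = ink, 0 = background; only horizontal scan lines are used.
    """
    strokes: List[int] = []
    spaces: List[int] = []
    for row in image:
        ink = [i for i, p in enumerate(row) if p]
        if not ink:
            continue  # row contains no ink at all
        inner = row[ink[0]: ink[-1] + 1]  # drop left/right margins
        strokes += run_lengths(inner, 1)  # ink runs = line widths
        spaces += run_lengths(inner, 0)   # interior background runs = space widths
    return coefficient_of_variation(strokes), coefficient_of_variation(spaces)


if __name__ == "__main__":
    regular = [[0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0]]    # equal strokes and gaps
    irregular = [[1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0]]  # varying strokes and gaps
    print(width_stats(regular))
    print(width_stats(irregular))
```

In this sketch, the evenly spaced row yields zero variation for both widths, while the irregular row yields clearly positive values, which is consistent with the larger spacing variance described above for HP text. Doubling the image in both directions leaves both values unchanged. No HP/MP decision threshold is given here; any such threshold would have to be calibrated on labelled samples.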