Multi-font Recognition Method Using a Layered Template Dictionary

IP.com Disclosure Number: IPCOM000107349D
Original Publication Date: 1992-Feb-01
Included in the Prior Art Database: 2005-Mar-21
Document File: 5 page(s) / 237K

Publishing Venue

IBM

Related People

Katoh, S: AUTHOR [+2]

Abstract

Disclosed is a character recognition device that has a layered template dictionary. The system automatically selects those records in a template data set that are effective for reading specific fonts, and swaps effective and ineffective records in order to improve the recognition accuracy, without requiring the judgement of expert operators.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 29% of the total text.

Template Layer for Font Variation

      To realize a multi-font Kanji (Chinese character) recognition
capability in an optical character reader (OCR), it is necessary to
establish a stable method of extracting features of various fonts and
to establish a highly accurate recognition algorithm.
State-of-the-art methods have sufficient recognition accuracy in
applications for which the font is controlled, but recognition
algorithms still have problems in handling uncontrolled or
unspecified fonts.

      Since Japanese printed documents use many varieties of fonts,
we have classified them into two major groups, "Mincho" and "Gothic",
and prepared average Mincho and Gothic template data sets, which are
generated from various Mincho and Gothic fonts in the first
preparatory step of the process. For special fonts that are not
classified as Mincho or Gothic, it is not easy either to estimate the
probability of their appearing or to generate templates, since they
are rarely used in business documents and their frequency of
appearance depends largely on the contents of the documents.

      On the other hand, when we read a sheaf of documents by using
an OCR system, we can assume that the probability of some specific
font appearing is very high, because similar documents are usually
printed in the same fonts.

      Therefore, we define an additional template data set as a third
template dictionary, to which the user can add template records for
an unknown font style. The third template dictionary is independent
of the Mincho and Gothic templates. It contains many template records
for various fonts that are not covered by the Mincho and Gothic
templates, and it is selected and loaded into main storage according
to the font style of the document.
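
The layering described above can be illustrated with a minimal sketch.
It is not the disclosed implementation: the feature vectors, the
Euclidean distance measure, the font-style key "special_font", and all
function and variable names are assumptions made only for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, layers):
    """Return (category, layer_name, distance) of the closest template record
    found in any of the loaded layers."""
    best = None
    for layer_name, records in layers.items():
        for category, template in records.items():
            d = distance(features, template)
            if best is None or d < best[2]:
                best = (category, layer_name, d)
    return best


# Toy 3-dimensional features; real Kanji features would be far larger.
layers = {
    "mincho": {"category_1": [1.0, 0.0, 0.0], "category_2": [0.0, 1.0, 0.0]},
    "gothic": {"category_1": [0.8, 0.2, 0.0]},
}

# Hypothetical store of additional dictionaries, keyed by font style.
additional_by_font = {"special_font": {"category_2": [0.0, 0.6, 0.6]}}

# Select and load the additional dictionary that matches the document's font.
layers["additional"] = additional_by_font["special_font"]

result = recognize([0.1, 0.6, 0.5], layers)
print(result[0], result[1])
```

In this toy data the input lies closest to the record in the
additional layer, so that layer supplies the answer while the Mincho
and Gothic layers are left unchanged.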

      Whenever a certain number of misread samples have been
collected after a new document has been read, the additional
dictionary is generated by averaging their feature data. Because the
dictionary is generated only after misread data have been detected,
it is not necessary to prepare many dictionaries for unknown font
variations at the set-up stage.
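
A minimal sketch of this averaging step follows. The text does not say
how misreads are counted or how the correct category is obtained, so
the sketch assumes a per-category threshold and assumes the correct
category of each misread sample is already known (for example, from a
correction step). The class and method names are hypothetical.

```python
class AdditionalDictionaryBuilder:
    """Collects feature vectors of misread characters and averages them
    into additional template records (illustrative sketch only)."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed: misreads needed per category
        self.misreads = {}          # category -> list of feature vectors
        self.templates = {}         # category -> averaged template record

    def add_misread(self, category, features):
        """Store one misread sample; return True if a template was generated."""
        samples = self.misreads.setdefault(category, [])
        samples.append(list(features))
        if len(samples) < self.threshold:
            return False
        n = len(samples)
        # Element-wise mean of the collected feature vectors.
        self.templates[category] = [sum(col) / n for col in zip(*samples)]
        samples.clear()
        return True


builder = AdditionalDictionaryBuilder(threshold=2)
first = builder.add_misread("category_2", [1.0, 3.0])
second = builder.add_misread("category_2", [3.0, 5.0])
print(builder.templates)
```

No template exists for a category until enough misreads have been
collected, which matches the point that dictionaries for unknown fonts
need not be prepared at the set-up stage.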

Template Layer for Font Adaptation

      Generally speaking, since Gothic fonts are used for headers or
important words, it is reasonable to assume that the number of Gothic
font categories in a document is much smaller than that of Mincho
font categories. We can also assume that a Mincho template record
may play
the role of a Gothic template record when a Gothic template of the
same category does not exist, if the Mincho and Goth...