Kanji Recognition and Reconstruction in Postal Address

IP.com Disclosure Number: IPCOM000113225D
Original Publication Date: 1994-Jul-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 100K

Publishing Venue

IBM

Related People

Rosenbaum, W: AUTHOR

Abstract

Rising worldwide mail volume and the onset of postal deregulation have forced large investments in mail automation. Central to the mail automation process of replacing human intervention with mechanical handling has been the introduction of Optical Character Recognition (OCR) and code desk front ends for capturing the address information and encoding it in a manner that allows subsequent mechanical separation (sorting) of the mail pieces to any desired delivery specification. All of these automation steps, however, require accurate interpretation of all the information in the envelope's or parcel's address box.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Rising worldwide mail volume and the onset of postal
deregulation have forced large investments in mail automation.
Central to the mail automation process of replacing human
intervention with mechanical handling has been the introduction of
Optical Character Recognition (OCR) and code desk front ends for
capturing the address information and encoding it in a manner that
allows subsequent mechanical separation (sorting) of the mail pieces
to any desired delivery specification.  All of these automation
steps, however, require accurate interpretation of all the
information in the envelope's or parcel's address box.

      Although address information can be acquired by operators
keying at code desks, the major gains in operational economy are
achieved by fully automating data acquisition through OCR
processing.  To date, OCR Mail Sorter automation has been pervasive
in Western countries, where the Roman alphabet and Arabic numerals
predominate.

      Kanji, unlike Western languages, is constructed of unique
character shapes, nominally one shape per word.  There is no closed
set of alphabetic characters like A through Z.  Kanji has about 64
thousand character/word definitions, of which about 13 thousand are
commonly used.  Kanji characters are composites of strokes, and
slight differences in the strokes create different words.  These
characteristics of Kanji, an open-ended character set and minute
differentiators, make it poorly suited to traditional OCR
methodology.  Special Kanji algorithms have been developed, for
example by the Industrial Technology Institute of Taiwan.  What all
these algorithms have in common is that they yield "best guess"
recognitions.
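
      A "best guess" recognition can be pictured as a ranked list of
candidate characters for each character position.  The sketch below
is illustrative only: the Candidate type, the score scale, the margin
value, and the sample scores are assumptions made here and are not
taken from any particular Kanji OCR algorithm.  The three sample
characters differ by a single stroke.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    char: str     # hypothesized character
    score: float  # recognizer confidence; the 0..1 scale is an assumption


def best_guess(candidates):
    """Return the highest-scoring candidate for one character position."""
    return max(candidates, key=lambda c: c.score)


def is_ambiguous(candidates, margin=0.1):
    """True when the top two candidates score within `margin` of each other."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    return len(ranked) > 1 and ranked[0].score - ranked[1].score < margin


# Made-up scores for three look-alike characters at one position.
position = [Candidate("大", 0.62), Candidate("太", 0.58), Candidate("犬", 0.21)]
top = best_guess(position)
```

      A position flagged by is_ambiguous is one where the top-ranked
character alone should not be trusted and further context is needed.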

      For textual Kanji, algorithms exist to resolve the ambiguity of
consecutive, multiple Kanji OCR recognition outputs using grammatical
rules.  This provides the possibility of usable OCR text readers but
does not solve the OCR Mail Sorter problem.
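
      The idea of resolving ambiguity across consecutive recognition
outputs can be sketched as follows.  The grammatical rules themselves
are not described in this text, so a small word list (a hypothetical
lexicon) stands in for them here; the function name, the tuple layout,
and the scores are likewise assumptions.  The exhaustive search over
all combinations is only practical for very short runs of characters.

```python
from itertools import product


def resolve(positions, lexicon):
    """Choose the best-scoring candidate sequence that the lexicon accepts.

    positions: one list per character position, each holding
               (char, score) tuples from the recognizer.
    lexicon:   set of accepted strings (stand-in for grammatical rules).
    Returns the chosen string, or None if no combination is accepted.
    """
    best_text, best_score = None, float("-inf")
    for combo in product(*positions):
        text = "".join(ch for ch, _ in combo)
        score = sum(s for _, s in combo)
        if text in lexicon and score > best_score:
            best_text, best_score = text, score
    return best_text


# Made-up recognizer output: the first position is ambiguous.
positions = [
    [("太", 0.58), ("大", 0.55)],
    [("学", 0.90)],
]
lexicon = {"大学", "学校"}
resolved = resolve(positions, lexicon)
```

      Taking the top candidate at each position independently would
give a string that the lexicon rejects; using context selects the
lower-ranked first character instead.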

      It is the purpose of the present system to provide an advance
in the state of the art of OCR, and particularly Kanji OCR, applied
to mail sorter operation.

      A Kanji-addressed mail piece is read by a scanner device as it
moves through the mail transport.  The scanner device is a
high-resolution instrument providing a resolution of at least 200
dots per inch across the face of the envelope.  The Kanji character
forms on the mail piece are now available in digital form.

      Since an address can be inscribed in Kanji either
horizontally or vertically, the first step is to OCR the Kanji
characters independently, assuming first a vertical and then a
horizontal orientation.  The respective outputs from the vertical
and horizontal OCR passes are written separately to temporary
storage areas.
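
      The two independent passes can be sketched as below.  This is a
minimal illustration, not the disclosed implementation: the
recognizer interface, the function names, and the use of a small
character grid in place of a scanned image are all assumptions.  The
stand-in recognizer reads vertical text top to bottom with columns
running right to left, as is conventional for vertical Japanese
writing.

```python
from typing import Callable, Dict, List

# Hypothetical recognizer interface: given an image and an assumed reading
# orientation, return one recognized string per address line.
Recognizer = Callable[[object, str], List[str]]


def ocr_both_orientations(image, recognize: Recognizer) -> Dict[str, List[str]]:
    """Run a vertical pass, then a horizontal pass, and keep the outputs apart."""
    results: Dict[str, List[str]] = {}
    for orientation in ("vertical", "horizontal"):
        # Each pass gets its own slot, mirroring separate temporary storage.
        results[orientation] = recognize(image, orientation)
    return results


def fake_recognize(image, orientation):
    """Stand-in recognizer that reads a 2-D grid of characters."""
    if orientation == "vertical":
        columns = list(zip(*image))
        # Top to bottom within a column, columns taken right to left.
        return ["".join(col) for col in reversed(columns)]
    # Left to right within a row, rows taken top to bottom.
    return ["".join(row) for row in image]


grid = [["東", "大"],
        ["京", "阪"]]
out = ocr_both_orientations(grid, fake_recognize)
```

      With this grid the two passes yield different strings from the
same characters, which is why both outputs are retained for later
comparison rather than committing to one orientation up front.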

      It is then sought to apply...