A Method for Japanese Name Post-Processing


OCR Post-Processing for Japanese Names

[Technical Area]

Disclosed is a method that improves OCR results for Japanese personal names.


Form OCR raises recognition accuracy through knowledge-based post-processing of
fields such as the name and the address. Japanese forms follow a convention for
filling in a person's name: it is always entered in two kinds of characters,
kanji (ideographs) and katakana (phonograms). Because a kanji name usually has
more than one possible reading, the katakana name specifies which reading is
meant.
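To make the convention concrete, the following sketch checks whether a katakana field is a valid reading of a kanji field. The dictionary `READINGS` and the helpers `readings_of` and `is_consistent` are hypothetical; the entries are real surnames with more than one common reading (中田 as ナカタ or ナカダ, 河野 as コウノ or カワノ).

```python
# Hypothetical reading dictionary: kanji surname -> katakana readings it allows.
# A real system would load a large name dictionary; these entries are illustrative.
READINGS: dict[str, list[str]] = {
    "中田": ["ナカタ", "ナカダ"],
    "河野": ["コウノ", "カワノ"],
    "山本": ["ヤマモト"],
}


def readings_of(kanji_name: str) -> list[str]:
    """Return every katakana reading the dictionary allows for a kanji name."""
    return READINGS.get(kanji_name, [])


def is_consistent(kanji_name: str, katakana_name: str) -> bool:
    """True if the katakana field is an allowed reading of the kanji field."""
    return katakana_name in readings_of(kanji_name)


print(readings_of("中田"))              # ['ナカタ', 'ナカダ']
print(is_consistent("中田", "ナカダ"))  # True
print(is_consistent("山本", "ナカタ"))  # False
```

The katakana field alone does not fix the kanji spelling, and the kanji field alone does not fix the reading; only the pair is constrained, which is what the post-processing methods below exploit.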

OCR post-processing of a Japanese name (which includes katakana name processing
and kanji name processing) can exploit this convention to raise accuracy.
Conventionally, kanji name processing uses the result of katakana name
processing, because katakana characters are recognized comparatively
accurately; this is the so-called katakana-base method. This disclosure
combines two newly devised methods, the kanji-base method and the
single-kanji-base method, with the traditional katakana-base method. As a
result, the accuracy of Japanese name processing can be raised substantially
(Fig. 1).
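The text says the three methods are combined but does not give the combination rule at this point. As one possible illustration only, assuming each method returns a score per kanji name candidate, the candidate lists could be merged by keeping each name's best score. `combine_results` and the scores below are hypothetical toy values, not results from the disclosure.

```python
def combine_results(*method_results: dict[str, float]) -> list[tuple[str, float]]:
    """Merge per-method candidate scores, keeping each candidate's best score.

    Max-merging is an assumed rule for illustration, not the disclosed one.
    """
    best: dict[str, float] = {}
    for result in method_results:
        for name, score in result.items():
            if score > best.get(name, float("-inf")):
                best[name] = score
    # Highest score first; ties broken by name for a deterministic order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Toy scores standing in for the outputs of the three methods.
katakana_base = {"山本": 0.82, "山元": 0.40}
kanji_base = {"山本": 0.75, "谷本": 0.55}
single_kanji_base = {"谷本": 0.60}
print(combine_results(katakana_base, kanji_base, single_kanji_base))
# [('山本', 0.82), ('谷本', 0.6), ('山元', 0.4)]
```

Taking the maximum lets a candidate that only one method finds (here 谷本, supported mainly by the single-kanji-base scores) still reach the final list; a weighted sum would be an equally plausible alternative.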

[Figure: modules are katakana name processing and kanji name processing (katakana-base, kanji-base, single-kanji-base). Data: katakana lattice (OCR result), katakana name result, katakana name candidate, kanji name candidate, katakana name feedback, kanji lattice (OCR result), kanji name result. The legend distinguishes process flow from data flow.]

Fig. 1 Data flow and module configuration
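Both lattices in Fig. 1 are OCR results, but the disclosure does not specify their format. A minimal sketch, assuming a lattice is a sequence of character positions, each holding candidate characters with confidence scores, shows how a name string could be scored against one. `Lattice`, `MISSING`, and `lattice_score` are hypothetical names.

```python
import math

# Assumed lattice format (the disclosure does not define one): one dict per
# character position, mapping each candidate character to an OCR confidence.
Lattice = list[dict[str, float]]

MISSING = 1e-3  # hypothetical confidence for a character the OCR did not propose


def lattice_score(text: str, lattice: Lattice) -> float:
    """Sum of log confidences of `text` along the lattice; -inf on length mismatch."""
    if len(text) != len(lattice):
        return float("-inf")
    return sum(math.log(pos.get(ch, MISSING)) for ch, pos in zip(text, lattice))


kana_lattice: Lattice = [
    {"ヤ": 0.6, "カ": 0.3},
    {"マ": 0.9},
    {"モ": 0.8, "セ": 0.1},
    {"ト": 0.7, "ヒ": 0.2},
]
print(lattice_score("ヤマモト", kana_lattice) > lattice_score("カマモト", kana_lattice))  # True
```

Real OCR lattices may also encode character-segmentation alternatives; this sketch assumes exactly one position per character, so strings of a different length are rejected outright.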

The katakana-base method is shown in Fig. 2. First, katakana name
post-processing uses the OCR recognition result lattice to output probable
katakana names. Next, a list of the kanji names corresponding to those katakana
names is compared with the kanji lattice. The kanji candidates are then
resorted and output as the kanji name processing result.
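The four steps above can be sketched as follows. The lattice format, the two dictionaries, the `top_k` cutoff, and the rule of adding the katakana and kanji log scores before resorting are all assumptions made for illustration; the disclosure says only that the kanji candidates are compared with the kanji lattice and resorted.

```python
import math

# Assumed lattice format: one {candidate character: OCR confidence} dict per position.
Lattice = list[dict[str, float]]
MISSING = 1e-3  # hypothetical confidence for a character absent from a position

# Hypothetical dictionaries; a real system would use a large name dictionary.
KATAKANA_NAMES = ["ヤマモト", "カマモト", "タニモト"]
KANJI_SPELLINGS = {
    "ヤマモト": ["山本", "山元"],
    "カマモト": ["釜本", "釜元"],
    "タニモト": ["谷本", "谷元"],
}


def lattice_score(text: str, lattice: Lattice) -> float:
    """Sum of log confidences of `text` along the lattice; -inf on length mismatch."""
    if len(text) != len(lattice):
        return float("-inf")
    return sum(math.log(pos.get(ch, MISSING)) for ch, pos in zip(text, lattice))


def katakana_base(kana_lattice: Lattice, kanji_lattice: Lattice,
                  top_k: int = 3) -> list[tuple[str, float]]:
    # Step 1: katakana name post-processing outputs the most probable katakana names.
    kana_cands = sorted(KATAKANA_NAMES,
                        key=lambda name: lattice_score(name, kana_lattice),
                        reverse=True)[:top_k]
    scored = []
    for kana in kana_cands:
        kana_score = lattice_score(kana, kana_lattice)
        # Step 2: list the kanji names that correspond to this katakana name.
        for kanji in KANJI_SPELLINGS.get(kana, []):
            # Step 3: compare the kanji candidate with the kanji lattice.
            # Adding the two log scores is an assumption, not the disclosed rule.
            scored.append((kanji, kana_score + lattice_score(kanji, kanji_lattice)))
    # Step 4: resort and output as the kanji name processing result.
    return sorted(scored, key=lambda item: item[1], reverse=True)


kana_lattice = [{"ヤ": 0.5, "カ": 0.4, "タ": 0.1}, {"マ": 0.6, "ニ": 0.3},
                {"モ": 0.9}, {"ト": 0.9}]
kanji_lattice = [{"山": 0.5, "谷": 0.2, "釜": 0.1}, {"本": 0.6, "元": 0.3}]
print([name for name, _ in katakana_base(kana_lattice, kanji_lattice)])
# ['山本', '山元', '釜本', '釜元', '谷本', '谷元']
```

Because katakana is recognized comparatively reliably, the katakana ranking prunes which kanji spellings are ever considered; with `top_k=1` only the spellings of ヤマモト would be scored.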


[Fig. 2 Katakana-base method: the katakana lattice and the kanji lattice (OCR results); candidates of kanji names prospected from the katakana names (e.g., ヤマモト: 山本, 山元; カマモト: 釜本, 釜元; タニモト: 谷本, 谷元); resorting; result of kanji name post-processing.]

The two new methods in this disclosure approach the problem from the kanji side.

[kanji-base method]

The kanji-base method, shown in Fig. 3, processes in the opposite direction to
the katakana-base method. First, kanji name post-processing uses the OCR
recognition result lattice to output probable kanji names. Next, the katakana
name candidates corresponding to the kanji name result are compared with the
katakana lattice. If a katakana candidate gets high