Multi-term Word Post-process with Word Attribute

IP.com Disclosure Number: IPCOM000113741D
Original Publication Date: 1994-Sep-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 94K

Publishing Venue

IBM

Related People

Kita, Y: AUTHOR

Abstract

Disclosed is a mechanism to improve the performance of a post-processor for multi-term words. Each word in the dictionary has one or more attributes, and the system defines the degree of intimacy between each pair of attributes. The post-processor selects the set of words that composes the result for the input data by scoring candidates contained in the dictionary. By taking the intimacy between each pair of words into account in this scoring, awkward phrases are excluded from evaluation, which improves the accuracy of the result.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      The post-processor takes a candidate matrix as input, which is
composed of the character candidates for each column.  It cuts the
input matrix into groups of columns, selects the most plausible word
for each group from the dictionary, and composes these words into one
string as the result for that cutting.  It then selects the single
cutting with the best total score among all possible cuttings of the
input.  The post-processing method for multi-term words is explained
in [*].

      The system classifies the words in the dictionary into groups
of words that share the same attribute, such as first name, last
name, company name, place, and so on.  Each word has a bitmap that
expresses its attributes, so that it can have multiple attributes
simultaneously (Fig. 1).  The system also defines the degree of
intimacy between two attributes.  It is expressed as a matrix whose
(i,j) element is the probability of the event that a word of
attribute (j) appears after a word of attribute (i) (Fig. 2).  If
this value is close to 1, a word of attribute (j) is likely to appear
after a word of attribute (i).  When either of the 2 words has
multiple attributes, the conn...
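
      The cutting, the attribute bitmap, and the intimacy matrix described above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation. The names Attr, DICTIONARY, INTIMACY, MATRIX, and best_result, the sample words, and all scores and probabilities are hypothetical. The text is cut off before it states how words with multiple attributes are handled, so taking the maximum over all attribute pairs is purely an assumption of this sketch, as is the min_intimacy threshold used to exclude awkward phrases. The sketch also enumerates every cutting exhaustively, which is only practical for short inputs.

```python
from enum import IntFlag
from itertools import product


class Attr(IntFlag):
    """Attribute bits; a word's bitmap may set several at once."""
    FIRST_NAME = 1
    LAST_NAME = 2
    COMPANY = 4
    PLACE = 8


# Hypothetical dictionary: word -> attribute bitmap.
DICTIONARY = {
    "AL": Attr.FIRST_NAME,
    "AI": Attr.COMPANY,
    "LEE": Attr.LAST_NAME | Attr.FIRST_NAME,
}

# Hypothetical intimacy matrix: (i, j) -> probability that a word of
# attribute j appears after a word of attribute i. Missing pairs count as 0.
INTIMACY = {
    (Attr.FIRST_NAME, Attr.LAST_NAME): 0.9,
    (Attr.LAST_NAME, Attr.FIRST_NAME): 0.4,
    (Attr.COMPANY, Attr.PLACE): 0.6,
}

# Hypothetical candidate matrix: one dict per column, character -> score.
MATRIX = [
    {"A": 0.9},
    {"I": 0.6, "L": 0.5},
    {"L": 0.9},
    {"E": 0.8},
    {"E": 0.8},
]


def intimacy(left, right):
    """Assumed rule for multi-attribute words: best value over all bit pairs."""
    return max(
        (INTIMACY.get((i, j), 0.0)
         for i in Attr if i & left
         for j in Attr if j & right),
        default=0.0,
    )


def word_score(word, columns):
    """Sum the column scores of the word's characters, or None if it does not fit."""
    if len(word) != len(columns):
        return None
    if any(ch not in col for ch, col in zip(word, columns)):
        return None
    return sum(col[ch] for ch, col in zip(word, columns))


def cuttings(n):
    """Yield every cutting of n >= 1 columns as a list of (start, end) groups."""
    for mask in range(1 << (n - 1)):
        bounds = [0] + [k + 1 for k in range(n - 1) if (mask >> k) & 1] + [n]
        yield list(zip(bounds, bounds[1:]))


def best_result(matrix, min_intimacy=None):
    """Return (words, score) for the best-scoring cutting, or None if none fits.

    When min_intimacy is given, word sequences containing an adjacent pair
    below that threshold are skipped (an assumption of this sketch).
    """
    best = None
    for cut in cuttings(len(matrix)):
        # Collect, per group of columns, every dictionary word that fits it.
        options = []
        for start, end in cut:
            fits = []
            for word in DICTIONARY:
                score = word_score(word, matrix[start:end])
                if score is not None:
                    fits.append((word, score))
            options.append(fits)
        # Try every combination of fitting words for this cutting.
        for combo in product(*options):
            words = [w for w, _ in combo]
            if min_intimacy is not None and any(
                intimacy(DICTIONARY[a], DICTIONARY[b]) < min_intimacy
                for a, b in zip(words, words[1:])
            ):
                continue
            total = sum(s for _, s in combo)
            if best is None or total > best[1]:
                best = (words, total)
    return best


print(best_result(MATRIX))
print(best_result(MATRIX, min_intimacy=0.5))
```

      With these sample values, the first call selects the words AI and LEE, because the character I scores higher than L in the second column. With min_intimacy=0.5, the sequence AI LEE is skipped, since the sample matrix has no entry for a company name followed by a personal name, and AL LEE is selected instead.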