Multi-Term Word Post-Process with Word Frequency

IP.com Disclosure Number: IPCOM000113626D
Original Publication Date: 1994-Sep-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 46K

Publishing Venue

IBM

Related People

Kita, Y: AUTHOR

Multi-Term Word Post-Process with Word Frequency

      Disclosed is a mechanism to improve the performance of a
post-processor for multi-term words.  The post-processor selects the
set of words that composes the result for the input data by scoring
candidates contained in the dictionary.  By taking the frequency of
use of each candidate into account in this scoring, rarely used
words are excluded from the evaluation, and so the accuracy of the
result improves.
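
      As a minimal sketch of this frequency weighting (not the
original implementation; the names and values below are
hypothetical), assume penalty-style scores in which a smaller value
is better:

  # Frequent words carry a small frequency value, rare words a
  # large one; unknown words get the worst value so they drop out
  # of the evaluation first.
  FREQ = {"the": 1.0, "thy": 8.0}
  UNKNOWN_FREQ = 10.0

  def weighted_score(base_penalty: float, word: str) -> float:
      # Multiply the recognizer's penalty by the word's frequency
      # value, so rarely used words score worse.
      return base_penalty * FREQ.get(word, UNKNOWN_FREQ)

With this weighting, two candidates with the same base penalty of
2.0 score 2.0 for "the" but 16.0 for "thy", so the rare reading
survives only when its base penalty is substantially better.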

      The post-processor takes as input a candidate matrix, which is
composed of the character candidates for each column.  It cuts the
input matrix into groups of columns, selects the most plausible word
for each group from the dictionary, and composes the selected words
into one string as the result for that cutting.  It then selects the
one cutting that gives the best total score among all possible
cuttings of the input.  The post-processing method for multi-term
words is explained in (*); please refer to it for details.  In
addition, each word in the dictionary has its own frequency value:
frequently used words have a small value, and rarely used ones a
large value.  By multiplying the original score by this frequency
value, the frequency of words in the application can be reflected in
the system.  For example, assume the input data D is composed of N
terms, Term(1) to Term(N), and call this cutting Γ.  Then the score
for this cutting, AccPena(D:Γ), is calculated as follows:

  ...
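
      The AccPena(D:Γ) formula itself is elided from this extract.
As a sketch only, the search over cuttings described above can be
written as follows, under the assumption that the accumulated
penalty of a cutting is the sum of the per-term penalties, each
multiplied by its word's frequency value; all names here are
hypothetical:

  from itertools import combinations

  def cuttings(n):
      # Yield every cutting of columns 0..n-1 as a list of
      # (start, end) spans, one span per term.
      for k in range(n):
          for cut_points in combinations(range(1, n), k):
              bounds = (0,) + cut_points + (n,)
              yield [(bounds[i], bounds[i + 1])
                     for i in range(len(bounds) - 1)]

  def best_word(matrix, span, dictionary):
      # Pick the dictionary word over this column span with the
      # smallest frequency-weighted penalty.  matrix[j] maps each
      # candidate character of column j to its penalty; dictionary
      # maps each word to its frequency value (small = frequent).
      start, end = span
      best = None
      for word, freq in dictionary.items():
          if len(word) != end - start:
              continue
          try:
              penalty = sum(matrix[start + i][ch]
                            for i, ch in enumerate(word))
          except KeyError:        # some character is not a candidate
              continue
          penalty *= freq         # frequency weighting
          if best is None or penalty < best[1]:
              best = (word, penalty)
      return best

  def post_process(matrix, dictionary):
      # Return the string whose cutting has the smallest accumulated
      # penalty (the role played by AccPena(D:Γ)) over all possible
      # cuttings of the input.
      best_result = None
      for cutting in cuttings(len(matrix)):
          words, total = [], 0.0
          for span in cutting:
              hit = best_word(matrix, span, dictionary)
              if hit is None:
                  break
              words.append(hit[0])
              total += hit[1]
          else:
              if best_result is None or total < best_result[1]:
                  best_result = ("".join(words), total)
      return best_result

For an N-column input this enumerates all 2**(N-1) possible
cuttings; a practical post-processor would share work between
cuttings, for example with dynamic programming over column
positions, but the exhaustive form follows the description above
most directly.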