Phrase-Segmentation Method Using Mental Time

IP.com Disclosure Number: IPCOM000100976D
Original Publication Date: 1990-Jun-01
Included in the Prior Art Database: 2005-Mar-16
Document File: 2 page(s) / 77K

Publishing Venue

IBM

Related People

Knaneko, H: AUTHOR [+2]

Abstract

A new method to reduce phrase-segmentation error in Kana to Kanji Conversion (KKC) is described. This method decides a phrase boundary by using the interval between keystrokes, together with grammatical analysis. In this method, letter boundaries where a user makes a pause are more likely to become phrase boundaries than ones with no pause. This method is natural because it is based on the fact that typing tends to become slower at phrase boundaries because a user mentally prepares for the next phrase.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Besides KKC systems, some on-line natural language entry
systems can take advantage of this method.

      A conventional phrase-segmentation method for KKC is as
follows:
(1) make candidates of segmented phrases,
(2) sum the penalty points for each candidate, and
(3) select the candidate with the smallest penalty.

      Penalty points are given for each word in a candidate according
to the following table:
 part of speech           penalty points
 content word                   10
 prefix / suffix                 6
 particle                        0

      Unfamiliar words are given larger penalties than these.  In this
disclosure, we assume for simplicity of exposition that texts consist
of familiar words.  However, the method is also applicable to texts
that contain unfamiliar words.

      Because content words carry the largest penalty, candidates with
a small number of phrases (i.e., a small number of content words) are
selected in most cases.
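
      The conventional selection step can be sketched in Python as
follows.  This is only an illustration under stated assumptions: the
data layout (a candidate as a list of phrases, each a list of
(word, part of speech) pairs), the function names, and the placeholder
words are hypothetical and are not taken from the disclosure.  Only
the penalty values come from the table.

```python
# Penalty points per part of speech, taken from the table in the text.
# The key names ("content", "affix", "particle") are illustrative labels.
PENALTY = {"content": 10, "affix": 6, "particle": 0}


def candidate_penalty(candidate):
    """Sum the penalty points of every word in every phrase of a candidate."""
    return sum(PENALTY[pos] for phrase in candidate for _word, pos in phrase)


def select_candidate(candidates):
    """Return the candidate segmentation with the smallest total penalty."""
    return min(candidates, key=candidate_penalty)


# Two hypothetical segmentations of the same input (placeholder words).
one_phrase = [[("w1", "content"), ("p1", "particle")]]
two_phrases = [[("w2", "content")], [("w3", "content"), ("p1", "particle")]]

best = select_candidate([two_phrases, one_phrase])
```

      In this sketch the one-phrase candidate scores 10 and the
two-phrase candidate scores 20, so the candidate with fewer content
words is selected.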

      In contrast to the conventional penalty points, we introduce
promotion points for each letter boundary, which promote the letter
boundary to become a phrase boundary.  The promotion points are:

   (promotion points) = A * ( t - mu ) / sigma

where A is a positive constant, t is the dwell time at the letter
boundary, mu is the average dwell time at letter boundaries, and
sigma is the standard deviation of the dwell time.  The values of mu
and sigma are determined from the user's past keystrokes.  The
promotion points
show how long...
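
      The promotion-point formula can be sketched in Python as
follows.  The function name, the default value of A, the use of the
population standard deviation, the handling of sigma = 0, and the
sample dwell times are all assumptions made for illustration; the
disclosure as excerpted here does not specify them.  How promotion
points are combined with penalty points is not shown, because that
part of the text is not available here.

```python
from statistics import mean, pstdev


def promotion_points(t, past_dwell_times, a=1.0):
    """Compute A * (t - mu) / sigma for one letter boundary.

    t                -- dwell time at this letter boundary
    past_dwell_times -- the user's earlier dwell times, used for mu and sigma
    a                -- the positive constant A (its value is an assumption)
    """
    mu = mean(past_dwell_times)
    sigma = pstdev(past_dwell_times)  # population std. dev. (assumption)
    if sigma == 0:
        # No variation in the history: treat the boundary as neutral
        # (an assumption; this case is not covered in the text).
        return 0.0
    return a * (t - mu) / sigma


# Hypothetical dwell times in milliseconds: mu = 150, sigma = 50.
history = [100, 100, 200, 200]

long_pause = promotion_points(250, history)     # above-average dwell time
average_pause = promotion_points(150, history)  # exactly average
short_pause = promotion_points(100, history)    # below-average dwell time
```

      With this sample history, a pause longer than average yields
positive promotion points, an average pause yields zero, and a shorter
pause yields negative promotion points.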