Method for Suggesting Candidates in Chinese Error Check Systems for Detected Errors

IP.com Disclosure Number: IPCOM000118523D
Original Publication Date: 1997-Mar-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 6 page(s) / 157K

Publishing Venue

IBM

Related People

Cheng, L: AUTHOR [+5]

Abstract

Disclosed is a method to offer candidates more effectively for many errors detected by a Chinese Error Check (CEC) system. By considering a character bigram table and word-formation possibilities, the candidate list is greatly shortened and the hit rate for correct candidates is increased.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 43% of the total text.

      In a CEC system, it is desirable to suggest candidate
corrections for detected errors to the user.  There are two reasons
for this.  The first is that the user may not know the correct
substitutes for tricky errors,

                            (Image Omitted)

      The second is that it is much easier for a user to choose a
candidate from a short list than to manually delete the wrong
character and input the correct one.  It is therefore important for a
CEC system to have a powerful candidate suggestion function.  The
criteria for this function are a high hit rate for correct candidates
and convenient selection by the user.

In the prior art, candidates are suggested in the following way:
   a) Put the cursor on a possibly incorrect character, that is, one
       marked up by the error check process.  In most
       circumstances, CEC systems mark up more than one consecutive
       character for each detected error.  In a commercially
       leading local product, as many as nine consecutive
       characters can be marked up for a single error.
   b) List all characters whose input codes are similar to that of
       the possibly incorrect one.  The codes are similar with
       respect to a specified input method, most probably the
       PinYin or the Five Stroke method.
   c) If the correct candidate is within the list, the user can
       select it and make the substitution.
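
      Step b) of the prior-art procedure can be illustrated with a
minimal sketch.  The PinYin entries, the similarity rule (identical
codes, or equal-length codes differing in exactly one letter) and the
function names below are assumptions made for illustration only; the
disclosure does not specify how code similarity is measured.

```python
from typing import Dict, List

# Hypothetical toy PinYin dictionary; a real input method covers
# thousands of characters.
PINYIN: Dict[str, str] = {
    "在": "zai",
    "再": "zai",
    "载": "zai",
    "栽": "zai",
    "才": "cai",
    "做": "zuo",
}


def similar_code(a: str, b: str) -> bool:
    """Assumed rule: same code, or same length with one differing letter."""
    if a == b:
        return True
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def prior_art_candidates(ch: str) -> List[str]:
    """List every other character whose code is similar to that of ch."""
    code = PINYIN.get(ch)
    if code is None:
        return []
    return [c for c, other in PINYIN.items()
            if c != ch and similar_code(code, other)]


print(prior_art_candidates("在"))
```

      The list is returned in dictionary order and is not filtered by
context, which is why it grows long when the dictionary is large.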

      The drawback of this method is that the list can be too long
for users to scan.  In such prior art, the candidate list can contain
as many as 50 characters for one possibly incorrect character.  If
four consecutive characters are marked up during the error check
process, the user may have to look at up to 200 characters to find a
substitution.  The convenience brought about by candidate suggestion
is therefore drastically reduced.
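
      The worst-case figure is a simple product, as the sketch below
shows.  The function name is hypothetical, and applying the bound of
50 candidates per character to a nine-character mark-up is an
assumption used only to show how the burden scales.

```python
MAX_CANDIDATES_PER_CHAR = 50  # upper bound quoted for the prior art


def worst_case_list_size(marked_chars: int,
                         per_char: int = MAX_CANDIDATES_PER_CHAR) -> int:
    """Number of characters the user may have to look through."""
    return marked_chars * per_char


print(worst_case_list_size(4))  # 200, the case described in the text
print(worst_case_list_size(9))  # 450, assuming the same bound applies
```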

      In the present scheme, a replacement table is prepared for each
character.  This table includes the characters whose PinYin or Five
Stroke codes are similar to that character's code.  These characters
are ordered by frequency (i.e., the number of times the character
appears in a corpus divided by the total number of characters in
that corpus), so that a character with a higher frequency precedes
one with a lower frequency.  In the simplified Chinese character set
(the GB2312 set, which is the national standard for the level 1 and
level 2 Chinese characters legitimate to use in the mainland of
...
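
      The frequency ordering of a replacement table can be sketched
as follows.  The tiny corpus, the list of similar characters and the
function names are hypothetical; only the frequency definition
(occurrences divided by the total number of characters in the corpus)
and the descending order are taken from the description above.

```python
from collections import Counter
from typing import Dict, List


def char_frequencies(corpus: str) -> Dict[str, float]:
    """Relative frequency of each character in the corpus."""
    counts = Counter(corpus)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def build_replacement_table(ch: str, similar: List[str],
                            freq: Dict[str, float]) -> List[str]:
    """Order the similar characters by descending corpus frequency.

    Characters absent from the corpus get frequency 0.0; ties keep
    their input order because Python's sort is stable.
    """
    return sorted((c for c in similar if c != ch),
                  key=lambda c: freq.get(c, 0.0),
                  reverse=True)


freq = char_frequencies("在在在再再才载")       # toy corpus
table = build_replacement_table("在", ["才", "载", "再", "栽"], freq)
print(table)
```

      In practice such a table would be computed once per character
from a large corpus and stored, so that suggestion time involves only
a lookup.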