Method for Suggesting Candidates in Chinese Error Check Systems for Detected Errors
Original Publication Date: 1997-Mar-01
Included in the Prior Art Database: 2005-Apr-01
Cheng, L: AUTHOR [+5]
Disclosed is a method to offer candidates more effectively for many errors detected by a Chinese Error Check (CEC) system. By considering a character bigram Figure and word-formation possibilities, the candidate list is greatly shortened and the hit rate for correct candidates increased.
In a CEC system, it is desirable to suggest candidates for
detected errors to users. There are two reasons for this. The
first is that the user may not know the correct substitutes for those
errors. The second is that it is much easier for a user to choose a
candidate from a short list than to manually delete the wrong
character and input the correct one. It is therefore important for a
CEC system to have a powerful candidate suggestion function. The
criteria for this function are a high hit rate for correct candidates
and convenient selection by the user.
In the prior art, candidates are suggested in the following way:
a) Put the cursor on a possibly incorrect character (one
marked up by the error check process). In most circumstances,
CEC systems mark up more than one consecutive character
for a detected error. In a commercially leading local
product, the number of consecutive characters marked up
for an error can even be as many
b) List all characters whose input codes are similar to that of
the possibly incorrect one (similarity here is relative to
a specified input method, most probably the PinYin or the
Five Stroke method).
c) If the correct candidate is within the list, the user can
select it and make the substitution.
The drawback of this method is that the list can be too long
for users to look through. In such prior art, the candidate list
can be as long as 50 characters for one possibly incorrect character.
If four consecutive characters are marked up during the error check
process, the user may have to look at up to 200 characters (4 x 50)
for substitution. The convenience brought about by candidate
suggestion is therefore drastically reduced.
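The prior-art procedure and its cost can be illustrated with a short
sketch. It is only an illustration under stated assumptions: the tiny
PinYin table, the simplification of "similar code" to "identical
code", and all function names are hypothetical and do not come from
any actual CEC product.

```python
from collections import defaultdict

# Toy PinYin codes (tones omitted). Illustrative data only, not a real
# input-method table.
PINYIN = {
    "是": "shi", "事": "shi", "市": "shi", "式": "shi",
    "在": "zai", "再": "zai",
    "做": "zuo", "作": "zuo", "坐": "zuo",
}


def build_similar_index(codes):
    """Group characters by input code.

    Assumption: 'similar' is simplified to 'identical code'.
    """
    index = defaultdict(list)
    for char, code in codes.items():
        index[code].append(char)
    return index


def prior_art_candidates(char, codes, index):
    """Step b): list every other character sharing the marked character's code."""
    code = codes.get(char)
    if code is None:
        return []
    return [c for c in index[code] if c != char]


def characters_to_scan(marked, codes, index):
    """Total number of candidates shown when several characters are marked up."""
    return sum(len(prior_art_candidates(c, codes, index)) for c in marked)


def worst_case_scan(marked_count, max_list_length):
    """Upper bound from the text: marked characters times longest list."""
    return marked_count * max_list_length
```

With the figures quoted above, worst_case_scan(4, 50) gives 200. The
cost grows with every additional character that is marked up, because
each marked character gets its own unfiltered list.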
In the present scheme, a replacement table is prepared for each
character. This table includes the characters whose PinYin or Five
Stroke code is similar to that of the character. These characters are
ordered by their respective frequencies (i.e., the number of times
the character appears in a corpus divided by the total number of
characters in that corpus), so that a character with higher frequency
precedes one with lower frequency. In the simplified Chinese character
set (GB2312 set, which is the national standard of the set of level 1
and level 2, Chinese characters legitimate to use in the mainland of