
Example-based machine translation system having means for robust semantic processing

IP.com Disclosure Number: IPCOM000130256D
Original Publication Date: 2005-Oct-18
Included in the Prior Art Database: 2005-Oct-18
Document File: 3 page(s) / 36K

Publishing Venue

IBM

Abstract

Disclosed is an example-based machine translation system having means for robust semantic processing. By automatically selecting the proper senses among the given collocation words, it makes it possible to construct a large-scale example database without coverage and maintenance problems.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 57% of the total text.



[BACKGROUND]

Example-Based Machine Translation (EBMT) is a promising machine translation approach when the source language and the target language differ substantially. To select the proper word sense, it uses an example database in which each tuple is a pair of source-language and target-language translation patterns.

Fig. 1 shows a fragment of the example database associated with a Korean verb (the Korean text is not reproduced in this extraction). Fig. 1 shows only the source part of each translation pattern. The English word in parentheses is given for readability only; it is not the target translation pattern. Each pattern is composed of several slot frames (complement arguments). In Fig. 1, both examples have two slot frames, subject and object. A slot frame has syntactic and semantic constraints. In these examples, the syntactic constraint is a case marker, and the semantic constraint is represented by a list of collocation words (the Korean case markers and collocation words are likewise not reproduced).

<Figure. 1> A fragment of example database
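
As a rough illustration of the structure described above, the source side of one pattern could be represented as below. This is only a sketch under stated assumptions: the class names `SlotFrame` and `SourcePattern`, the helper `slot_for`, and every value in the sample entry (verb, case markers, collocation words) are hypothetical placeholders, since the content of Fig. 1 is not available in this text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SlotFrame:
    """One complement argument of a pattern (for example subject or object)."""
    role: str                       # grammatical role of the slot
    case_marker: str                # syntactic constraint (placeholder value)
    collocations: Tuple[str, ...]   # semantic constraint: list of collocation words


@dataclass(frozen=True)
class SourcePattern:
    """Source-language side of one example-database entry."""
    verb: str
    slots: Tuple[SlotFrame, ...]


# Hypothetical entry: every value here is an illustrative placeholder,
# not data taken from the disclosure.
pattern = SourcePattern(
    verb="VERB",
    slots=(
        SlotFrame("subject", "SUBJ_MARKER", ("student", "child")),
        SlotFrame("object", "OBJ_MARKER", ("apple", "bread", "rice")),
    ),
)


def slot_for(pattern: SourcePattern, role: str) -> Optional[SlotFrame]:
    """Return the slot frame with the given role, or None if there is none."""
    for slot in pattern.slots:
        if slot.role == role:
            return slot
    return None


print(slot_for(pattern, "object").collocations)
```

Keeping the collocation words as a plain list per slot mirrors the description above, where the semantic constraint is given by example words rather than by semantic features.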

If we calculate the similarity between an input sentence and an example-database entry by exact string matching, we run into a coverage problem. We can extend the coverage of an example by replacing its collocation words with semantic features. This can be done by a human or by a computer. A human lexicographer who tags semantic features must understand the intent behind the collocation words and replace them with the proper semantic features. This can be accurate, but it is very labor-intensive, and as the set of semantic features grows, the existing example database becomes very difficult to maintain. The better approach is to have a computer replace collocation words with semantic features by using a thesaurus. The quality of the thesaurus-based approach depends heavily on the quality of the thesaurus and on the degree of polysemy. If we replace a collocation word with all of its possible semantic features, the example is generalized too broadly and incorrectly, which causes a severe translation quality problem. Fig. 2 shows the senses of the collocation words in the object slot of the first example.

<Figure. 2> Senses of collocation words
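
The coverage problem and the over-generalization problem can be illustrated with a small sketch. Everything in it is assumed for illustration: the toy thesaurus, the feature names `FOOD`, `COMPANY` and `PLANT`, the English collocation words, and the function names are hypothetical and do not come from the disclosure or its figures.

```python
# Toy thesaurus mapping a word to all of its semantic features (senses).
# All entries are made up for illustration.
THESAURUS = {
    "apple": {"FOOD", "COMPANY"},   # polysemous word
    "bread": {"FOOD"},
    "rice": {"FOOD", "PLANT"},      # polysemous word
    "pear": {"FOOD"},
    "startup": {"COMPANY"},
}


def exact_match(word, collocations):
    """Exact string matching against the collocation list."""
    return word in collocations


def generalize_all_senses(collocations, thesaurus):
    """Replace the collocation words with the union of ALL their senses."""
    features = set()
    for word in collocations:
        features |= thesaurus.get(word, set())
    return features


def feature_match(word, features, thesaurus):
    """A word matches if it shares at least one semantic feature with the slot."""
    return bool(thesaurus.get(word, set()) & features)


object_slot = ["apple", "bread", "rice"]
all_features = generalize_all_senses(object_slot, THESAURUS)

# Coverage problem: an unseen but similar word fails exact matching.
print(exact_match("pear", object_slot))                   # False
# Generalization fixes coverage ...
print(feature_match("pear", all_features, THESAURUS))     # True
# ... but taking every sense also admits an unrelated word.
print(feature_match("startup", all_features, THESAURUS))  # True
```

The last line is the mis-generalization case: the unintended `COMPANY` sense of one collocation word lets an unrelated word fill the slot.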

[SYSTEM DETAILS]

The core part of the system is the word sense disambiguation method, which selects a proper word sense for each collocation word in a slot's list. The collocation words can then be replaced with the output of the word sense disambiguation method. Because the replacement can be done offline, it adds no run-time cost. Moreover, if the method produces reliable results, translation quality improves because the mis-generalization problem is eliminated, and run-time efficiency improves because unrelated semantic features are removed.
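
The offline replacement step might look roughly like the sketch below. The disambiguation method itself is not fully available in this text, so the sketch substitutes a simple majority-vote heuristic (prefer the sense shared by the most words in the same list); this heuristic, the toy thesaurus, and all names are assumptions for illustration and should not be read as the disclosed method.

```python
from collections import Counter

# Toy thesaurus (hypothetical): word -> all possible semantic features.
THESAURUS = {
    "apple": {"FOOD", "COMPANY"},
    "bread": {"FOOD"},
    "rice": {"FOOD", "PLANT"},
}


def disambiguate(collocations, thesaurus):
    """Pick one sense per collocation word by majority vote over the list.

    Stand-in heuristic, not the disclosed method: each word votes for all
    of its senses, and every word then takes its best-supported sense.
    """
    votes = Counter()
    for word in collocations:
        votes.update(thesaurus.get(word, ()))

    selected = {}
    for word in collocations:
        senses = thesaurus.get(word)
        if not senses:
            continue  # unknown word: leave it as a plain collocation word
        # Most votes first; ties are broken alphabetically for determinism.
        selected[word] = min(senses, key=lambda s: (-votes[s], s))
    return selected


def generalize(collocations, thesaurus):
    """Offline step: replace the collocation list with the selected senses."""
    return set(disambiguate(collocations, thesaurus).values())


object_slot = ["apple", "bread", "rice"]
print(disambiguate(object_slot, THESAURUS))
print(generalize(object_slot, THESAURUS))   # {'FOOD'}
```

Because only the selected sense survives for each word, the slot keeps a single feature here, so fewer features need to be compared at run time and the unintended senses no longer admit unrelated words.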

The method assumes that the collocation words in a list form one or more semantic clusters. The more proper sense for each collocation wo...