
Word-Oriented Pre-editing in Machine Translation

IP.com Disclosure Number: IPCOM000122402D
Original Publication Date: 1991-Dec-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 6 page(s) / 191K

Publishing Venue

IBM

Related People

Nomiyama, H: AUTHOR

Abstract

Disclosed is a mechanism for effective pre-editing in machine translation, in which word equivalents are selected simply by looking for pairs of source-language words and their target-language equivalents. (The term "pre-editing" generally means editing sentences so that they can easily be processed by machine translation systems. In this article, however, it means all the work done by a human editor before translation to achieve better quality.)

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 42% of the total text.

      Several machine translation systems are now available, but they
are far from fully automated, so human editors must check their
output (by post-editing).

      The cost of post-editing is not negligible. Editors must know
both languages and must understand the structural correspondences
between equivalent sentences in both languages. The internal
structures of sentences are hard for human editors to understand.
However, it is much easier to check whether word-equivalent pairs are
correct, because doing so does not require understanding sentence
structures.

      Checking word-equivalent pairs is only part of pre-editing, but
it is very important for the quality of translation. From the
viewpoint of lexical selection in documents, checking word-equivalent
pairs has several advantages. Generally, words that play important
roles in a document occur many times and their meanings are constant
(1). This means that each such word has a single equivalent
throughout the whole document. If the equivalents of such words are
fixed before translation, lexical selection is made for all
occurrences of each word at once, and the consistency of lexical
selection is guaranteed.
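
The consistency argument can be illustrated with a minimal sketch. It assumes the document is already tokenized into words and that a word counts as "important" once it occurs at least `min_count` times; the function names, the threshold, and the German equivalent are hypothetical and only illustrate how one decision per word covers every occurrence.

```python
from collections import Counter

def frequent_words(tokens, min_count=2):
    """Return words that occur at least min_count times, most frequent first."""
    counts = Counter(tokens)
    return [w for w, c in counts.most_common() if c >= min_count]

def apply_fixed_equivalents(tokens, fixed):
    """Map every occurrence of a word to its fixed equivalent.

    One decision per word covers all of its occurrences, so the choice is
    consistent across the document. Words with no fixed equivalent map to
    None, meaning lexical selection is still left to the translation process.
    """
    return [fixed.get(w) for w in tokens]

tokens = ["printer", "driver", "printer", "queue", "printer"]
print(frequent_words(tokens))                                   # ['printer']
print(apply_fixed_equivalents(tokens, {"printer": "Drucker"}))
# ['Drucker', None, 'Drucker', None, 'Drucker']
```

Checking the single pair ("printer", "Drucker") once settles all three occurrences, which is the saving the pre-editing step relies on.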

      In addition, the proposed mechanism makes pre-editing work more
effective by using a preference calculated from target-language
knowledge (2).

      The system consists of several processes (Figs. 1 and 2). We
assume that dependency structures (Fig. 3) are used as the internal
structures in the translation process.
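
The disclosure does not specify how dependency structures are represented. A minimal sketch, assuming each node stores a word, its part of speech, and the label of the arc to its governor (the `Node` class and the labels `subj`, `obj`, and `mod` are hypothetical): walking the tree yields the labeled modifier-governor relations that the preference calculation works on.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One word in a dependency structure (hypothetical representation)."""
    word: str
    pos: str                 # part of speech
    label: str = ""          # label of the arc to this node's governor; "" at the root
    children: list = field(default_factory=list)  # modifiers of this node

def binary_relations(governor):
    """Yield a (modifier, label, governor) triple for every arc, depth first."""
    for child in governor.children:
        yield (child.word, child.label, governor.word)
        yield from binary_relations(child)

# "The editor checks word equivalents."
root = Node("checks", "verb", children=[
    Node("editor", "noun", "subj"),
    Node("equivalents", "noun", "obj", [Node("word", "noun", "mod")]),
])
for relation in binary_relations(root):
    print(relation)
```

Each arc becomes one triple, so a sentence with n words yields n - 1 relations to check.
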
1) Pre-Translation

      Input documents are pre-processed to create lexical selection
files and word frequency files. A lexical selection file contains the
results of lexical selections made by the translation process. The
items in a lexical selection file are:
 o Word
 o Part of speech
 o Location (sentence number and node location)
 o Selection flag (if selected, then 1, else 0)
 o Preference (a pair of a co-occurrence level and its frequency)
 o Content of the dictionary.
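
The record layout and file format are not given in the text. A minimal sketch, assuming one record per candidate equivalent of each word occurrence, with the dictionary content reduced to the candidate equivalent (the class, its field names, and the dotted node-location encoding are hypothetical). It shows one possible way to derive word frequencies from the records, by counting distinct locations, and to read off the equivalent whose selection flag is set.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class LexicalSelection:
    """One item of a lexical selection file (field names are hypothetical)."""
    word: str
    pos: str                 # part of speech
    sentence: int            # location: sentence number
    node: str                # location: node in the dependency structure, e.g. "0.1"
    selected: int            # selection flag: 1 if selected, else 0
    preference: tuple        # (co-occurrence level, frequency)
    dictionary_entry: str    # assumed: the candidate equivalent from the dictionary

def word_frequencies(records):
    """Count distinct occurrences of each (word, part of speech).

    One occurrence can have several candidate records, so occurrences are
    identified by their location before counting.
    """
    occurrences = {(r.word, r.pos, r.sentence, r.node) for r in records}
    return Counter((word, pos) for word, pos, _, _ in occurrences)

def selected_equivalents(records):
    """Return the selected candidate at each (sentence, node) location."""
    return {(r.sentence, r.node): r.dictionary_entry
            for r in records if r.selected}
```

For example, two candidate records for "printer" in sentence 1 plus one in sentence 2 count as two occurrences of ("printer", "noun"), not three.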

      Preferences are calculated by the method described in a
previous publication (2). A preference is defined as a pair
consisting of a co-occurrence level and its frequency, and is
calculated from binary relations, that is, labeled arcs connecting
modifiers to their governors. The binary relations extracted from the
intermediate dependency structure (Figs. 4 and 5) are checked
against the target language dependency str...
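
The text above defines a preference as a (co-occurrence level, frequency) pair but does not define the levels. The following is a minimal sketch under a hypothetical scheme, not the method of (2). Target-language knowledge is assumed to be a collection of (modifier, label, governor) triples. Level 2 means the translated relation occurs with the same label. Level 1 means the modifier and governor co-occur when the label is ignored. Level 0 means there is no match. The function names and the German example words are also hypothetical.

```python
from collections import Counter

def build_knowledge(triples):
    """Index target-language (modifier, label, governor) triples.

    Returns counts of exact labeled triples and of unlabeled
    (modifier, governor) pairs.
    """
    exact = Counter(triples)
    unlabeled = Counter((m, g) for m, _, g in triples)
    return exact, unlabeled

def preference(relation, knowledge):
    """Return a hypothetical (co-occurrence level, frequency) pair.

    Level 2: the labeled triple occurs in the target knowledge.
    Level 1: the modifier and governor co-occur, ignoring the label.
    Level 0: no co-occurrence found.
    """
    exact, unlabeled = knowledge
    modifier, _, governor = relation
    if exact[relation]:
        return (2, exact[relation])
    if unlabeled[(modifier, governor)]:
        return (1, unlabeled[(modifier, governor)])
    return (0, 0)

def best_candidate(candidates, label, governor, knowledge):
    """Choose the candidate equivalent whose relation to governor ranks highest."""
    return max(candidates,
               key=lambda c: preference((c, label, governor), knowledge))

knowledge = build_knowledge([
    ("Drucker", "obj", "installieren"),
    ("Drucker", "obj", "installieren"),
    ("Printer", "subj", "installieren"),
])
print(preference(("Drucker", "obj", "installieren"), knowledge))  # (2, 2)
print(best_candidate(["Printer", "Drucker"], "obj", "installieren", knowledge))  # Drucker
```

Comparing the pairs as Python tuples ranks a higher level first and uses frequency only to break ties within a level. That is one plausible reading of "a pair of a co-occurrence level and its frequency".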