Translation Memory Tool

IP.com Disclosure Number: IPCOM000110636D
Original Publication Date: 1992-Dec-01
Included in the Prior Art Database: 2005-Mar-25
Document File: 2 page(s) / 65K

Publishing Venue

IBM

Related People

Conrad, M: AUTHOR [+3]

Abstract

This article concerns general translation memory tool (TMT) concepts, such as TM data base organization and access methods. Some implementation aspects are described as well. The term TM data base in this article denotes a collection of numerous translation units referred to as segments. Each segment consists of three components: control information, a source sentence and the equivalent target (translated sentence).

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 53% of the total text.

Translation Memory Tool

      TM segments are physically stored in TM blocks and are
logically grouped in clusters.  Each cluster is uniquely identified
by its key.  Noise information, such as text-processing tags and
frequently recurring language-dependent words (e.g., determiners), is
discarded from the source sentence in order to define the relevant
key.
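The segment layout and key derivation described above can be sketched as follows.  This is a minimal illustration, not the article's actual implementation: the field names, the tag pattern, and the noise-word list are assumptions (the article leaves the language-dependent list unspecified).

```python
import re
from dataclasses import dataclass

@dataclass
class Segment:
    """One TM translation unit: control information, a source
    sentence, and the equivalent target (translated) sentence."""
    control: dict   # e.g., document id, date (illustrative fields)
    source: str
    target: str

# Assumed noise words (determiners etc.) for English.
NOISE_WORDS = {"the", "a", "an"}
TAG_PATTERN = re.compile(r"<[^>]+>")  # text-processing tags

def cluster_key(source: str) -> str:
    """Derive a cluster key by discarding tags and noise words
    from the source sentence."""
    text = TAG_PATTERN.sub(" ", source.lower())
    words = [w for w in re.findall(r"[a-z]+", text)
             if w not in NOISE_WORDS]
    return " ".join(words)
```

Segments whose cleaned-up sources yield the same key would then fall into the same cluster.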

      To search for similar (non-identical) matches, a definition of
similarity (proximity) must be given.  The proposed definition is
based on a statistical approach: two sentences are considered similar
if they have a significant number of words in common.  The degree of
proximity is controllable.  The main advantage of the statistical
method is that it is independent of the source language.
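One possible reading of this statistical criterion is a word-overlap ratio with an adjustable threshold.  The exact measure and threshold below are assumptions for illustration; the article does not specify them.

```python
def similarity(a: str, b: str) -> float:
    """Fraction of words the two sentences share
    (Jaccard-style overlap ratio)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not (wa or wb):
        return 1.0
    return len(wa & wb) / len(wa | wb)

def is_similar(a: str, b: str, threshold: float = 0.6) -> bool:
    # The threshold makes the degree of proximity controllable.
    return similarity(a, b) >= threshold
```

Because the measure counts shared words rather than analyzing grammar, it works unchanged for any source language.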

      However, the statistical check is too time-consuming to be
applied to each segment in the cluster.  Therefore, a secondary key
system is required.  It is used as an efficient means for filtering
out non-similar segments.  Once a sentence has been cleaned up...
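The filtering role of such a secondary key might look like the sketch below.  Since the text is truncated here, the key scheme (a set of significant words per segment) is purely a hypothetical stand-in for whatever the article goes on to describe.

```python
def signature(sentence: str,
              noise=frozenset({"the", "a", "an"})) -> frozenset:
    """Hypothetical secondary key: the set of significant words
    left after clean-up."""
    return frozenset(w for w in sentence.lower().split()
                     if w not in noise)

def candidates(query_sig: frozenset, segments: list) -> list:
    """Cheap filter: keep only segments whose secondary key
    overlaps the query's.  Only these survivors would undergo
    the expensive statistical similarity check."""
    return [s for s in segments if signature(s) & query_sig]
```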