Browse Prior Art Database

INDEXING A LIST OF DATA STRUCTURES FOR FAST LOOKUP

IP.com Disclosure Number: IPCOM000021399D
Original Publication Date: 2000-Nov-01
Included in the Prior Art Database: 2004-Jan-16
Document File: 2 page(s) / 23K

Publishing Venue

Sony Technical Digest

Related People

Eugene Joseph Koontz: INVENTOR

Abstract

This invention entails a method for efficiently finding a set of matching examples in an example-based machine translation system. This allows for the use of a large example database to increase translation quality while mitigating the cost in run-time matching speed that a large database would normally entail.

This text was extracted from a Microsoft Word document.
This is the abbreviated version, containing approximately 52% of the total text.

Sony Technical Digest, Volume 3, November 2000, ISSN 1521-5180

INDEXING A LIST OF DATA STRUCTURES FOR FAST LOOKUP

Invention by:

Eugene Joseph Koontz

This invention entails a method for efficiently finding a set of matching examples in an example-based machine translation system. This allows for the use of a large example database to increase translation quality while mitigating the cost in run-time matching speed that a large database would normally entail.

Example-based translation uses a database of examples, each of which is a pair of source and targets language expressions. Each pair is selected to exemplify some important feature of the source or target language. At run time, the system consults the example database to find the best candidate pair for the input, and this pair is used to generate an expression in the target language. This method allows translation quality to improve by adding relevant examples to the database.

However, an iterative search of the examples at run-time becomes the dominant factor in system performance for a realistically large example database. For embedded devices with limited processing power, a more efficient method of searching the database is needed.

Every linguistic sign is represented in the system by a fstructure, similar to the representations used in linguistic theories such as LFG or HPSG. It is a set of feature-value pairs, where each feature is an atomic symbol, and a value is either an atomic symbol or an fstructure. A path is a list of features where F2 is a feature within the value of F 1, and F3 is a feature within the value of F2, and so on.

The input to the transfer component of the translation system is the input fstructure, which represents the syntactic and semantic structure of a string in the input language; the output is another fstructure which is passed to the Generation Component, which turns this fstructure into a string in the target language.

Within the transfer component, there are two subcomponents. The first is the match step. This eliminates from the example set of all structures that do not satisfy certain boolean queries. The boolean queries to be applied to the example are determined by the transfer grammar, which provides a top-down control structure to the matching process. The transfer grammar is a set of context free r...