
A new method to keep meta data in source file for further translation

IP.com Disclosure Number: IPCOM000244452D
Publication Date: 2015-Dec-13
Document File: 3 page(s) / 46K

Publishing Venue

The IP.com Prior Art Database

Abstract

This is a method to keep metadata in the source file for later translation. With this method, users are not confused, because commonly used names are kept as is, and translators no longer have a hard time translating those commonly used names.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.



Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another.

Output quality can be improved by human intervention. For example, the source file author can unambiguously identify which words in the text are names, and can mark up a detailed message for words that can be interpreted in different ways.

However, these approaches have some disadvantages:


1) MT such as "web translation on-the-fly" translates all the strings on the screen; it cannot tell which strings are translatable and which are DNT (Do Not Translate).

Ex: Product names and user names are DNT.


2) MT is used to translate source PII (program integrated information). In the existing approach, people create a special format for the source file, such as tags or comments, containing all the information necessary for translation, and the MT engine uses this format to do the translation. Once the files are rendered (e.g., on the Web), the comments and tags cannot be carried over.

Ex: A source file is translated by MT, which refers to the tags and comments in the source file, and is then published on the Web. If a user copies the rendered Web page and pastes it somewhere else, the comments and tags are not carried over.

The core idea of this invention is to inject invisible characters into the UI (resource files) or data, where the invisible characters form a binary expression for annotating naturally occurring text. The tagging information is invisible and hidden from users, so it has no impact on the function or the data, nor on the rendering of UI panels. The DNT tags can be injected by the programmers and the ID writers. With invisible tags injected into the UI, machine translation can produce more accurate translations. More tags can also be added, such as "verb", "adjective", etc.
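A minimal Python sketch of the injection step, assuming a tagged term is wrapped by the same invisible marker on both sides (the disclosure does not say how the extent of a tag is delimited). The names `inject_tag`, `displayed`, and `MARKER` and the marker value are hypothetical. `displayed` only approximates rendering: it drops Unicode format characters (general category `Cf`), the category U+200B and U+FEFF belong to.

```python
import unicodedata

# Hypothetical placeholder marker built from zero-width characters;
# the exact bit pattern is arbitrary for this sketch.
MARKER = "\u200b\u200b\u200b\ufeff"


def inject_tag(resource: str, term: str, marker: str = MARKER) -> str:
    """Wrap every occurrence of `term` in `resource` with `marker` on both sides."""
    return resource.replace(term, marker + term + marker)


def displayed(s: str) -> str:
    """Approximate what a reader sees by dropping format characters (category Cf).

    U+200B and U+FEFF are both in category Cf, so they vanish here.
    Other Cf characters (e.g. soft hyphen) are dropped too; good enough for a sketch.
    """
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cf")


original = "Welcome to Acme Console, Alice!"
tagged = inject_tag(original, "Acme Console")

print(len(original), len(tagged))     # the stored string grows by 8 characters
print(displayed(tagged) == original)  # the visible text is unchanged
```

The markers are ordinary characters inside the string, so they stay with the text wherever the string goes, unlike comments kept beside it in a source file. The stored string itself does change, though, so code that compares raw strings will see the markers.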

The tags are formed as binary expressions using a string of hidden characters.

The hidden information can be Unicode characters such as U+200B (ZERO WIDTH SPACE) and U+FEFF (ZERO WIDTH NO-BREAK SPACE), which are not displayed on the screen. A group of hidden characters can represent a specific tag.

E.g. the DNT tag can be represented as: U+200B, U+200B, U+200B, U+200B, U+200B, U+200B, U+200B, U+FEFF

E.g. the Adjective tag can be represented as: U+200B, U+200B, U+200B, U+200B, U+200B, U+200B, U+FEFF, U+200B
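The two example sequences read naturally as 8-bit codes if U+200B stands for 0 and U+FEFF for 1: DNT is then 00000001 and Adjective 00000010. The Python sketch below encodes tags under that reading and shows how an MT pre-processor might split a string into tagged segments and pass only the non-DNT ones to a translator. The bit mapping, the open/close convention (the same sequence before and after a tagged term), and all names (`encode_tag`, `split_segments`, `translate_text`) are assumptions for illustration, not part of the disclosure.

```python
from typing import Callable, List, Optional, Tuple

ZERO = "\u200b"  # ZERO WIDTH SPACE, read as bit 0 (assumption)
ONE = "\ufeff"   # ZERO WIDTH NO-BREAK SPACE, read as bit 1 (assumption)
TAG_BITS = 8     # both example sequences are 8 characters long

# Tag codes read off the two example sequences under the 0/1 assumption.
TAGS = {"DNT": 0b00000001, "ADJECTIVE": 0b00000010}
NAMES = {code: name for name, code in TAGS.items()}


def encode_tag(name: str) -> str:
    """Return the invisible 8-character sequence for a tag name."""
    bits = format(TAGS[name], f"0{TAG_BITS}b")
    return "".join(ONE if b == "1" else ZERO for b in bits)


def split_segments(text: str) -> List[Tuple[Optional[str], str]]:
    """Split text into (tag, segment) pairs; untagged text gets tag None.

    A known tag sequence opens a span and the same sequence closes it.
    Nesting is not supported in this sketch.
    """
    segments: List[Tuple[Optional[str], str]] = []
    buf: List[str] = []
    current: Optional[str] = None
    i = 0
    while i < len(text):
        chunk = text[i:i + TAG_BITS]
        if len(chunk) == TAG_BITS and set(chunk) <= {ZERO, ONE}:
            code = int("".join("1" if c == ONE else "0" for c in chunk), 2)
            name = NAMES.get(code)
            if name is not None:
                if buf:  # flush the text collected before this marker
                    segments.append((current, "".join(buf)))
                    buf = []
                current = None if current == name else name  # open or close
                i += TAG_BITS
                continue
        buf.append(text[i])
        i += 1
    if buf:
        segments.append((current, "".join(buf)))
    return segments


def translate_text(text: str, translate: Callable[[str], str]) -> str:
    """Translate every segment except DNT ones, which are kept verbatim."""
    return "".join(
        seg if tag == "DNT" else translate(seg)
        for tag, seg in split_segments(text)
    )


dnt = encode_tag("DNT")
ui_string = "Open " + dnt + "Acme Console" + dnt + " settings"
print(split_segments(ui_string))
# [(None, 'Open '), ('DNT', 'Acme Console'), (None, ' settings')]
print(translate_text(ui_string, str.upper))  # str.upper stands in for an MT engine
# OPEN Acme Console SETTINGS
```

Because the tags travel inside the string itself, the same split would also work on text copied from a rendered page, as long as the copy keeps the zero-width characters.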

Advantages:


1) Because the tag injection does not affect the program's function or UI rendering, the machine-translated UI of the program has higher accuracy and quality.


For example:

a. It won't translate a...