Resource bundle information injection for online translation services

IP.com Disclosure Number: IPCOM000224501D
Publication Date: 2012-Dec-31
Document File: 6 page(s) / 151K

Publishing Venue

The IP.com Prior Art Database

Abstract

In the age of big data and globally integrated enterprises, the importance of information retrieval should not be neglected. Content that is available in only one language is confined within specific geographical borders and does not meet the demands of modern times. To make content available to a broader audience at a reasonable cost, machine translation has been adopted in place of costly human translation. Our invention improves the accuracy of machine translation. Current machine translation methods may not be able to analyze the text correctly at the syntactic level. Machine translation output that is 70% accurate is less than half as valuable as output that is 90% accurate, and the output of current methods requires more human editing effort than the output of our method. With this improved machine translation, more usable information is produced, and enterprises can save substantial cost when making important content available in multiple languages.

The need for translation is booming in the internet era. Users want to get information not only in their own languages but also in other languages. Machine translation is used to translate program UI and data. Moreover, linguistic annotation is widely used in machine translation and CLIR (cross-language information retrieval) to analyze unstructured text. The emergence of sensors as sources of big data highlights this need as well.

The core idea of this invention is to inject invisible characters into the UI (resource files) or data. The invisible characters form a binary expression that annotates naturally occurring text, using "part-of-speech" tags to express the linguistic structure. The tagging information is invisible and hidden from users, so it has no impact on function or data, nor on the rendering of UI panels. The linguistic tags can be injected by programmers and ID writers. Injecting the invisible "part-of-speech" tags into the UI helps machine translation produce more accurate translations.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 76% of the total text.


Workflow for the invention:


1. Get the original file.


2. Use the Penn Treebank tag set to mark the part of speech of each word in every string.

The list of part-of-speech tags used in the Penn Treebank Project is given below.

Example:

Original String:

This book a interesting book.
Book a meeting room.

New String:

This/DT book/NN a/DT interesting/JJ book/NN ./.
Book/VB a/DT meeting/NN room/NN ./.
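The word/TAG notation used in the example can be produced and parsed with a few lines of code. The following is a minimal sketch, not the disclosure's own implementation: the function names are hypothetical, and the tags are copied by hand from the example rather than produced by a tagger.

```python
def to_slash_notation(tagged):
    """Join (word, tag) pairs into 'word/TAG' tokens separated by spaces."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged)


def from_slash_notation(text):
    """Split 'word/TAG' tokens back into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits at the last slash, so './.' yields ('.', '.').
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


# Tags copied by hand from the example above; a real workflow would
# obtain them from a part-of-speech tagger (an assumption, not shown here).
sentence_1 = [("This", "DT"), ("book", "NN"), ("a", "DT"),
              ("interesting", "JJ"), ("book", "NN"), (".", ".")]
sentence_2 = [("Book", "VB"), ("a", "DT"), ("meeting", "NN"),
              ("room", "NN"), (".", ".")]

print(to_slash_notation(sentence_1))
print(to_slash_notation(sentence_2))
```

Note that "book" receives NN in the first sentence and VB in the second, which is exactly the ambiguity the annotation is meant to resolve for a translation engine.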


3. Replace the part-of-speech tags with hidden information strings.

The hidden information consists of the characters U+200B (zero-width space) and U+FEFF (zero-width no-break space), which are not displayed on the screen.

U+200B represents "0"; U+FEFF represents "1".

Tagged string:

This/DT book/NN a/DT interesting/JJ book/NN ./.

String with hidden information:

This\U200B\U200B\U200B\U200B\U200B\U200B\UFEFF\UFEFF book\U200B\U200B\U200B\U200B\UFEFF\UFEFF\U200B\U200B a\U200B\U200B\U200B\U200B\U200B\U200B\UFEFF\UFEFF interesting\U200B\U200B\U200B\U200B\U200B\UFEFF\UFEFF\UFEFF book\U200B\U200B\U200B\U200B\UFEFF\UFEFF\U200B\U200B .\U200B\U200B\UFEFF\U200B\U200B\UFEFF\UFEFF\UFEFF
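Read as bits, the hidden strings in the example are 00000011 for DT, 00001100 for NN, 00000111 for JJ and 00100111 for the period, that is, 3, 12, 7 and 39. The first three values match the positions of those tags in the alphabetical Penn Treebank tag list counted from 1, so one plausible reading is that each tag is stored as its 8-bit list index. The sketch below encodes tags under that assumption; the function names, the tag table, and the value 39 for "." (taken directly from the example) are illustrative, not confirmed by the disclosure.

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE, stands for bit 0
ONE = "\ufeff"   # ZERO WIDTH NO-BREAK SPACE, stands for bit 1

# Assumption: the alphabetical Penn Treebank word-level tags, numbered from 1.
PENN_TAGS = [
    "CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD", "NN",
    "NNS", "NNP", "NNPS", "PDT", "POS", "PRP", "PRP$", "RB", "RBR", "RBS",
    "RP", "SYM", "TO", "UH", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "WDT",
    "WP", "WP$", "WRB",
]
TAG_CODES = {tag: index for index, tag in enumerate(PENN_TAGS, start=1)}
TAG_CODES["."] = 39  # read off the example above (00100111), not from the list


def encode_tag(tag):
    """Return the tag's code as eight invisible characters, high bit first."""
    bits = format(TAG_CODES[tag], "08b")
    return "".join(ONE if bit == "1" else ZERO for bit in bits)


def inject(tagged):
    """Append the hidden code to each word and join the words with spaces."""
    return " ".join(word + encode_tag(tag) for word, tag in tagged)


tagged = [("This", "DT"), ("book", "NN"), ("a", "DT"),
          ("interesting", "JJ"), ("book", "NN"), (".", ".")]
injected = inject(tagged)

# repr() makes the otherwise invisible characters visible as escapes.
print(repr(injected))
```

Eight bits leave room for 256 distinct codes, well above the size of the tag set, which is presumably why a fixed width of one byte per tag suffices.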



4. Save the file as the new file.
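Once the new file is saved, a consumer such as a translation service would need to read the hidden bits back, while ordinary users continue to see only the visible text. The following is a hedged sketch of that reverse step; the function names are hypothetical, and it assumes each token carries its hidden characters directly after the word, with tokens separated by ordinary spaces.

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE, stands for bit 0
ONE = "\ufeff"   # ZERO WIDTH NO-BREAK SPACE, stands for bit 1


def split_token(token):
    """Return (visible text, integer code) for one token; code is None if untagged."""
    visible = "".join(ch for ch in token if ch not in (ZERO, ONE))
    bits = "".join("1" if ch == ONE else "0" for ch in token if ch in (ZERO, ONE))
    code = int(bits, 2) if bits else None
    return visible, code


def decode(injected):
    """Recover (word, code) pairs from a string carrying hidden annotations."""
    return [split_token(token) for token in injected.split(" ")]


def visible_text(injected):
    """Strip the hidden characters, leaving what a user would see on screen."""
    return injected.replace(ZERO, "").replace(ONE, "")


# "Book" tagged with code 27 (00011011) and "a" tagged with code 3 (00000011).
sample = ("Book" + ZERO * 3 + ONE * 2 + ZERO + ONE * 2
          + " a" + ZERO * 6 + ONE * 2)

print(decode(sample))
print(visible_text(sample))
```

One practical caveat: U+FEFF at the very start of a file is commonly interpreted as a byte order mark, so a decoder along these lines would need to handle or avoid a leading U+FEFF.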

List of part-of-speech tags used in the Penn Treebank Project:


Chart:


Advantages:

Because the linguistic annotation is injected into the UI (resource files) and data without affecting function or UI rendering, the machine-translated UI of a program and the search/analysis of unstructured data will have higher accuracy and quality. For example, adding a linguistic tag for the te...