Handling of Print Control Tags in Machine Translation

IP.com Disclosure Number: IPCOM000123978D
Original Publication Date: 1999-Sep-01
Included in the Prior Art Database: 2005-Apr-05
Document File: 8 page(s) / 297K

Publishing Venue

IBM

Related People

Harada, M: AUTHOR

Abstract

This article describes a method for translating text that contains print control tags with a machine translation program.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 17% of the total text.

Handling of Print Control Tags in Machine Translation

   This article describes a method for translating text that
contains print control tags with a machine translation program.

   To handle print control tags in Machine Translation,
the following two steps are required. 

  1.  Tag detection and categorization
      Detect which parts of the text are tags and what they mean.
      Here, tags are categorized as follows:
      - Tags outside sentences
        Tags that show text structure
        [example tags lost in extraction]
      - Tags inside sentences
        Tags that highlight words, such as ":hp1", and
        reference tags such as ":hdref", ":link", and ":elink"
      This document does not cover the detection and
      categorization logic, and it handles only tags inside
      sentences.
  2.  Translation process that refers to tags
      The process is described later.
      Tags inside sentences are further categorized as follows:
      - Tags that take a grammatical role themselves
        For example, the "." tag should be handled as a noun.
      - Tags that have no grammatical role and mark the start of a block
        For example, the ":hp1" tag
      - Tags that have no grammatical role and mark the end of a block
        For example, the ":ehp1" tag
      Tags are given the following masks to simplify processing
      in Machine Translation:
      1.  {{..}
          A string between the start and end masks is treated as
          a noun and output unchanged in the Japanese text.
          - {{-- Fig 'F123' unknown --}
          - {{File System}
          "File System" is not translated.  Because tags are
          often used for product names, the area should not be
          translated.  This behavior can be changed using the
          next masks.
      2.  {S..} {E..}
          Text between {S..} and {E..} is translated.
          - {S}File System{E}
            "File System" is to be translated.
          How the text is handled depends on what lies between
          {S..} and {E..}.
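
   The masking described above can be sketched as follows.  This is
only an illustration, not the author's implementation: it assumes
GML-style markup in which ":hp1." opens a highlighted phrase and
":ehp1." closes it (the trailing periods are an assumption), and the
function name mask_highlight and its translate_inside flag are
hypothetical.

```python
import re

# Assumption: a highlighted phrase is written ":hp1.phrase:ehp1.".
HIGHLIGHT = re.compile(r":hp1\.(.*?):ehp1\.")


def mask_highlight(text, translate_inside=False):
    """Replace each highlighted phrase with one of the two mask forms.

    translate_inside=False -> {{phrase}      (kept as an untranslated noun)
    translate_inside=True  -> {S}phrase{E}   (phrase is translated)
    """
    if translate_inside:
        return HIGHLIGHT.sub(lambda m: "{S}" + m.group(1) + "{E}", text)
    return HIGHLIGHT.sub(lambda m: "{{" + m.group(1) + "}", text)


if __name__ == "__main__":
    source = "Mount the :hp1.File System:ehp1. first."
    print(mask_highlight(source))
    print(mask_highlight(source, translate_inside=True))
```

   Keeping the choice of mask in one place makes it easy to switch a
tag from "protect as a noun" to "translate the contents" without
touching the rest of the translation process.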

   The span from "{{" to "}" is handled as a noun in the Machine
Translation process and output as it is.
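
   One way to obtain this behavior is to swap each "{{..}" span for
a placeholder before translation and restore the original text
afterwards.  The sketch below assumes that approach; the placeholder
format "__NOUN0__" and the translate callback are hypothetical
stand-ins for the real translation engine.

```python
import re

# Matches "{{" ... "}" and captures the text in between.
NOUN_MASK = re.compile(r"\{\{(.*?)\}")


def translate_with_noun_masks(sentence, translate):
    """Translate a sentence while leaving {{..} spans untouched.

    `translate` is any callable that maps a string to a string; it is
    assumed to pass the placeholder tokens through unchanged.
    """
    kept = []

    def stash(match):
        # Remember the protected text and leave a placeholder behind.
        kept.append(match.group(1))
        return f"__NOUN{len(kept) - 1}__"

    masked = NOUN_MASK.sub(stash, sentence)
    translated = translate(masked)

    # Put the protected text back exactly as it was written.
    for index, text in enumerate(kept):
        translated = translated.replace(f"__NOUN{index}__", text)
    return translated


if __name__ == "__main__":
    # str.upper stands in for a real translation engine.
    print(translate_with_noun_masks("Open the {{File System} now.", str.upper))
```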

   The span from "{S..}" to "{E..}" is translated in the
following steps:
  1.  The word sequence between "{S..}" and "{E..}" is extracted.
  2.  The word sequence is translated by referring to dictionaries
      or by using machine translation logic.
  3.  The span from "{S..}" to "{E..}" is replaced with a special
      word, and then the whole sentence is translated using Machine
      Translation logic.  Some attributes are defined for the
      special words, such as Japanese and number information.
      What type of special word is to be used and what
      attributes are t...
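
   The three steps listed above can be sketched roughly as follows.
This is a minimal illustration under stated assumptions, not the
method as the author implemented it: the special word is modeled as
a plain placeholder token ("__SPAN0__"), its attributes are left
out, and translate_phrase and translate_sentence are hypothetical
callbacks standing in for the dictionary lookup and the main
translation logic.

```python
import re

# Matches "{S..}" ... "{E..}"; anything after S or E inside the
# braces is ignored here.
SPAN = re.compile(r"\{S[^}]*\}(.*?)\{E[^}]*\}")


def translate_with_spans(sentence, translate_phrase, translate_sentence):
    """Translate marked spans separately, then the whole sentence."""
    translated_spans = []

    def swap(match):
        # Steps 1 and 2: take the word sequence and translate it alone.
        translated_spans.append(translate_phrase(match.group(1)))
        # Step 3 (first half): leave a special word in its place.
        return f"__SPAN{len(translated_spans) - 1}__"

    masked = SPAN.sub(swap, sentence)

    # Step 3 (second half): translate the sentence around the special words.
    result = translate_sentence(masked)

    # Substitute the separately translated spans for the special words.
    for index, text in enumerate(translated_spans):
        result = result.replace(f"__SPAN{index}__", text)
    return result


if __name__ == "__main__":
    # str.lower and str.upper stand in for real translation steps.
    print(translate_with_spans(
        "Mount the {S}File System{E} first.", str.lower, str.upper))
```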