Segmenting Text for Translation

IP.com Disclosure Number: IPCOM000120755D
Original Publication Date: 1991-Jun-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 2 page(s) / 93K

Publishing Venue

IBM

Related People

Chun, EG: AUTHOR [+3]

Abstract

This article describes an invention that defines a segment of text and saves attributes about that segment which can improve processing performance and allow for greater function.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      The DisplayWrite* callable interface includes requests that aid
the process of translation.  To translate text, a document must first
be divided into translation segments.  The segments are then
translated one at a time until the entire document is translated.  A
segment may be, for example, a sentence, a phrase, or a list of items.
Implementing the translation function raised the following problems:
1)   A document must be segmented in advance, as a batch process, in
order to save processing time.
2)   Translation performance must be optimized.
3)   During translation, a user must be able to combine two adjacent
segments and to split segments that were previously combined.

      DisplayWrite 5/2 resolves all of these issues by defining a
DisplayWrite internal datastream multi-byte control that defines a
segment and stores information about that segment.  This special
multi-byte control is inserted into the source document at the
beginning of each segment.  The control includes the segment number
and a status byte containing flags that specify whether the segment
is active and whether it has been previously translated.  The Segment
multi-byte control is laid out as follows:
Parameter        Offset   Length   Occurrence   Value (hex)
Escape ID          0        1      Req'd        2B
Class              1        1      Req'd        D9
Count              2        1      Req'd        05
Type               3        1      Req'd        A6
Segment Number     4        2      Req'd        0-n
Segment Status     6        1      Req'd        see below
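
The layout in the table above can be illustrated with a short Python
sketch that builds and parses the 7-byte control.  This is only an
illustration, not DisplayWrite code: the function names are
hypothetical, and the byte order of the 2-byte Segment Number is an
assumption (big-endian), since the article does not state it.

```python
import struct

# Fixed header values taken from the table (hex).
ESCAPE_ID = 0x2B
CLASS_ID = 0xD9
COUNT = 0x05
TYPE_ID = 0xA6

# Four 1-byte fields, a 2-byte segment number, a 1-byte status.
# ASSUMPTION: the segment number is big-endian; the source does not say.
_LAYOUT = struct.Struct(">BBBBHB")


def build_segment_control(segment_number: int, status: int) -> bytes:
    """Return the 7-byte Segment control for the given number and status."""
    return _LAYOUT.pack(ESCAPE_ID, CLASS_ID, COUNT, TYPE_ID,
                        segment_number, status)


def parse_segment_control(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Read a Segment control at `offset`; return (segment_number, status)."""
    esc, cls, count, typ, number, status = _LAYOUT.unpack_from(data, offset)
    if (esc, cls, count, typ) != (ESCAPE_ID, CLASS_ID, COUNT, TYPE_ID):
        raise ValueError("not a Segment control")
    return number, status


if __name__ == "__main__":
    control = build_segment_control(3, 0x20)
    print(control.hex(" "))
    print(parse_segment_control(control))
```

Because the control sits at the beginning of each segment, a reader
scanning the datastream could call a parser like this at each
occurrence of the header bytes to recover the segment number and
status without re-segmenting the text.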
The Segment Status is a one-byte, bit-encoded field that describes
the state of the segment being referenced.  Possible values are as
follows:
0000 0000: Uncombined/Active/Untranslated/non-introductory
1000 0000: Combined
0100 0000: Inactive
0010 0000: Previously Translated
0001 0000:...
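
The status bits listed above can be tested with simple bit masks.  The
following Python sketch is an illustration only; the class and function
names are hypothetical.  It names just the three flags whose meaning is
given in the text and leaves the remaining bits undefined, because the
list is truncated in this abbreviated version.

```python
from enum import IntFlag


class SegmentStatus(IntFlag):
    """Flags of the Segment Status byte that the text names explicitly."""
    COMBINED = 0x80               # 1000 0000
    INACTIVE = 0x40               # 0100 0000
    PREVIOUSLY_TRANSLATED = 0x20  # 0010 0000
    # 0001 0000 and the lower bits are deliberately not named here:
    # their meaning is cut off in the abbreviated source text.


def describe(status: int) -> list[str]:
    """Return a readable description of the three known status flags."""
    return [
        "Combined" if status & SegmentStatus.COMBINED else "Uncombined",
        "Inactive" if status & SegmentStatus.INACTIVE else "Active",
        "Previously Translated"
        if status & SegmentStatus.PREVIOUSLY_TRANSLATED
        else "Untranslated",
    ]


if __name__ == "__main__":
    print(describe(0x00))
    print(describe(SegmentStatus.COMBINED | SegmentStatus.PREVIOUSLY_TRANSLATED))
```

Keeping each state in its own bit lets a single byte record several
independent conditions at once, for example a segment that is both
combined and previously translated.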