Conversion Methodology for Extended Unix Code Encoded Data

IP.com Disclosure Number: IPCOM000118411D
Original Publication Date: 1997-Jan-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 8 page(s) / 347K

Publishing Venue

IBM

Related People

Chow, WS: AUTHOR [+5]

Abstract

Disclosed is a method for converting Extended Unix Code (EUC) encoded data streams to and from the PC-type and EBCDIC-type data streams. Each byte value of the encoded character in the data stream to be converted is used as an index into arrays of subtables containing offsets to retrieve the matching encoded character.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 25% of the total text.

      Data transferred between different computing environments needs
to be converted to the applicable encoding scheme for processing
(e.g., ASCII to/from EBCDIC).  Conversion tables alone do not ensure
the transfer or sharing of data objects between different computing
environments: the proper selection and use of these tables is
essential.  Conversion methods are used with the tables to ensure
that the desired results are obtained.  It is the responsibility of
the conversion method to recognize the characteristics and
requirements of the input and output data.  The conversion methods
described in the following sections are specifically for coded
graphic character strings whose semantics follow the respective
encoding scheme definitions for the character encodings.

      The EUC conversion tables are used to convert EUC encoded
graphic character data between an EUC platform and an EBCDIC
(termed host) or PC platform.

      The EUC conversion tables operate on a normalized form of
data:  both the input passed to the tables and the output they
generate are in normalized form.  The PC code points are normalized
by placing a leading X'00' in front of each single-byte code point to
yield a two-byte form.  Host (EBCDIC) data must have the SO-SI control
characters deleted during normalization and reinserted afterwards
during denormalization.  As with the PC data, a leading X'00' is
inserted in front of any single-byte data.  The EUC code points are
normalized to four-byte values.
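
      The normalization rules for PC and host data can be sketched as
follows.  This is a minimal illustration, not the disclosed
implementation.  The function names and the is_lead_byte predicate are
hypothetical, and the sketch assumes that no double-byte code point
begins with X'00', so that a leading X'00' always marks a padded
single-byte code point.  SO and SI are the standard EBCDIC shift-out
(X'0E') and shift-in (X'0F') controls.

```python
SO = 0x0E  # EBCDIC shift-out: following bytes are double-byte code points
SI = 0x0F  # EBCDIC shift-in: following bytes are single-byte code points


def normalize_pc(data: bytes, is_lead_byte) -> list:
    """Return two-byte units; single bytes get a leading X'00'.

    is_lead_byte is a hypothetical caller-supplied predicate that says
    whether a byte starts a double-byte code point in the PC code page.
    """
    units, i = [], 0
    while i < len(data):
        if is_lead_byte(data[i]):
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte code point")
            units.append(data[i:i + 2])
            i += 2
        else:
            units.append(bytes([0x00, data[i]]))
            i += 1
    return units


def normalize_host(data: bytes) -> list:
    """Delete SO/SI and return one two-byte unit per code point."""
    units, i, double_byte = [], 0, False
    while i < len(data):
        b = data[i]
        if b == SO:
            double_byte, i = True, i + 1
        elif b == SI:
            double_byte, i = False, i + 1
        elif double_byte:
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte code point")
            units.append(data[i:i + 2])
            i += 2
        else:
            units.append(bytes([0x00, b]))
            i += 1
    return units


def denormalize_host(units: list) -> bytes:
    """Strip the X'00' padding and reinsert SO/SI around double-byte runs."""
    out, double_byte = bytearray(), False
    for unit in units:
        is_double = unit[0] != 0x00  # assumption: X'00' lead means padding
        if is_double and not double_byte:
            out.append(SO)
        elif not is_double and double_byte:
            out.append(SI)
        double_byte = is_double
        out += unit if is_double else unit[1:]
    if double_byte:
        out.append(SI)  # close a double-byte run left open at the end
    return bytes(out)
```

      For a host string that is already minimally shifted, such as
X'C1 0E 42C1 0F C2', denormalize_host(normalize_host(s)) returns the
original bytes.  Redundant shift controls in the input are not
reproduced, since the normalized form does not record them.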

      Fig. 1 shows the general use of the EUC conversion tables.  The
input byte or bytes, up to a maximum of four bytes per code point,
are first normalized and used as input to the conversion table.  The
output (again a maximum of four bytes per code point) from the
conversion table is also normalized data, which must be denormalized
prior to subsequent processing.
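
      The table lookup step, in which each byte of a normalized code
point indexes a subtable, can be sketched as below.  This is one
plausible reading of the byte-indexed subtable scheme, not the
disclosed table format.  The names build_tables, convert, ROOT and
SUBSTITUTE are hypothetical, as are the reserved "unmapped" slot and
the right-justified zero padding of the four-byte output values; the
excerpt does not specify either.  The two sample mappings (PC X'41' to
EUC X'41', and Shift-JIS X'82A0' to EUC-JP X'A4A2') are used only as
test data.

```python
ROOT = 1  # index of the subtable addressed by the first input byte
SUBSTITUTE = b"\x00\x00\x00\x3f"  # hypothetical value for unmapped input


def build_tables(mapping: dict):
    """Build byte-indexed subtables from normalized input -> output pairs.

    All keys are assumed to have the same (normalized) length.
    """
    # subtables[0] is a reserved all-zero table so that offset 0 can mean
    # "unmapped" at every level; subtables[1] is the root.
    subtables = [[0] * 256, [0] * 256]
    results = [b""]  # results[0] is reserved for "unmapped"
    for key, value in mapping.items():
        table = ROOT
        for byte in key[:-1]:
            if subtables[table][byte] == 0:
                subtables.append([0] * 256)
                subtables[table][byte] = len(subtables) - 1
            table = subtables[table][byte]
        results.append(value)
        subtables[table][key[-1]] = len(results) - 1
    return subtables, results


def convert(unit: bytes, subtables, results, substitute=SUBSTITUTE) -> bytes:
    """Use each byte of a normalized code point as an index into a subtable."""
    table = ROOT
    for byte in unit[:-1]:
        table = subtables[table][byte]  # offset of next subtable, 0 if unmapped
    slot = subtables[table][unit[-1]]
    return results[slot] if slot else substitute


# Usage: normalized two-byte PC input, normalized four-byte EUC output.
pc_to_euc = {
    b"\x00\x41": b"\x00\x00\x00\x41",
    b"\x82\xa0": b"\x00\x00\xa4\xa2",
}
tables, outputs = build_tables(pc_to_euc)
converted = convert(b"\x82\xa0", tables, outputs)
```

      Because an unmapped byte leads to the reserved all-zero subtable,
the lookup needs no bounds or validity checks between levels; an
unmapped input simply falls through to the substitute value.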

      EUC Conversions

      The EUC encoding technique uses up to four coded graphic
character sets.  Each set must be predefined as the information is
not carried in the text data stream.  In the Character Data
Representation Architecture (CDRA) of IBM*, the CCSID determines the
group of coded graphic character sets being used.  Code points from
the left half of the 8-bit encoding space (high order bit is OFF) are
in the set G0.  Code points which lie in the right half of the
encoding space (high order bit is ON) are in the set G1.  Single
shift control characters known as SS2 and SS3 are used to invoke the
other sets G2 and G3.
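
      The selection of a code set from the leading byte can be
sketched as follows.  This is an illustrative sketch only.  The
function name split_euc and the widths table are hypothetical; the
widths shown are those of EUC-JP and stand in for the predefined set
information that, as noted above, is not carried in the data stream.
SS2 and SS3 are taken to be X'8E' and X'8F', their usual EUC values,
and other control characters in the range X'80' to X'9F' are not
treated specially.

```python
SS2 = 0x8E  # single shift 2: the next character is taken from G2
SS3 = 0x8F  # single shift 3: the next character is taken from G3

# Bytes per character in each set, not counting the single-shift byte.
# Example values only (EUC-JP); a real converter would derive them from
# the predefined code set information, e.g. via the CCSID.
EUC_JP_WIDTHS = {"G0": 1, "G1": 2, "G2": 1, "G3": 2}


def split_euc(data: bytes, widths: dict) -> list:
    """Return (set_name, code_point_bytes) pairs for an EUC byte string."""
    chars, i = [], 0
    while i < len(data):
        b = data[i]
        if b == SS2:        # check the single shifts before the G1 test,
            name, i = "G2", i + 1   # since their high order bit is also ON
        elif b == SS3:
            name, i = "G3", i + 1
        elif b & 0x80:      # high order bit ON: right half, set G1
            name = "G1"
        else:               # high order bit OFF: left half, set G0
            name = "G0"
        end = i + widths[name]
        if end > len(data):
            raise ValueError("truncated character")
        chars.append((name, data[i:end]))
        i = end
    return chars


sample = b"A" + b"\xa4\xa2" + b"\x8e\xb1" + b"\x8f\xb0\xa1"
parts = split_euc(sample, EUC_JP_WIDTHS)
```

      For the sample above, parts holds one entry per set in the order
G0, G1, G2, G3, with the single-shift bytes removed from the G2 and G3
code points.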

      The conversion tables for EUC require that the...