
Combined-Western Coded Character Set

IP.com Disclosure Number: IPCOM000103946D
Original Publication Date: 1993-Feb-01
Included in the Prior Art Database: 2005-Mar-18
Document File: 4 page(s) / 126K

Publishing Venue

IBM

Related People

Arevalo, PE: AUTHOR, and 2 other authors

Abstract

Combined-Western Coded Character Set (CWCCS) encompasses character representation requirements of Western Nations and their Languages. It defines the character repertoire and the EBCDIC and ISO Extended Encoding Schemes used for encoding the character repertoire.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.


      Combined-Western Coded Character Set (CWCCS) encompasses
character representation requirements of Western Nations and their
Languages.   It defines the character repertoire and the EBCDIC and
ISO Extended Encoding Schemes used for encoding the character
repertoire.

      Extended Encoding Schemes are used to accommodate the large
size of the character repertoire.  Single-Byte (8-bit) encoding
schemes are limited to 256 control and graphic characters, because
8 bits can represent only 256 distinct values (X'00' to X'FF').
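The single-byte limit is simple arithmetic. This minimal Python sketch shows it; the variable names are illustrative only.

```python
BITS_PER_BYTE = 8
capacity = 2 ** BITS_PER_BYTE   # number of distinct 8-bit patterns
max_value = capacity - 1        # largest value a single byte can hold
print(capacity, f"X'{max_value:02X}'")   # 256 X'FF'

# A 257th code point cannot be stored in a single byte.
try:
    bytes([capacity])
except ValueError as err:
    print("does not fit:", err)
```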

      The use of Coded Character Set Identifiers (CCSIDs) defined by
the IBM SAA* Character Data Representation Architecture (CDRA) allows
CWCCS to coexist with existing encodings for Single-Byte Character
Sets, Far-East Double-Byte Character Sets, and future encodings aimed
at encoding a Universal Character Set (UCS).  It is anticipated that
the CWCCS character repertoire will be one of the subsets of UCS.
UCS subsets will be required to match existing device capabilities
(font/glyph resolution) and device resources (font resources).

      The EBCDIC Single-Byte encoding scheme has the following rules
for assigning code points (or hexadecimal values) to 256 characters:

1.  The first 64 code points (X'00' to X'3F') are assigned to control
    characters
2.  The following code point (X'40') is for the space character
3.  The next 190 code points (X'41' to X'FE') are for graphic
    characters
4.  The last code point (X'FF') is assigned to the eight-ones
    character
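The four single-byte rules together cover all 256 values. The Python sketch below classifies each byte and tallies the categories, which confirms that 64 + 1 + 190 + 1 = 256. The helper name `classify_sbcs` is hypothetical and used only for illustration.

```python
from collections import Counter


def classify_sbcs(code_point: int) -> str:
    """Classify a single-byte EBCDIC code point using the four rules above."""
    if not 0x00 <= code_point <= 0xFF:
        raise ValueError(f"not a single-byte value: {code_point}")
    if code_point <= 0x3F:        # rule 1: X'00'-X'3F'
        return "control"
    if code_point == 0x40:        # rule 2: X'40'
        return "space"
    if code_point <= 0xFE:        # rule 3: X'41'-X'FE'
        return "graphic"
    return "eight-ones"           # rule 4: X'FF'


counts = Counter(classify_sbcs(b) for b in range(256))
print(dict(counts))
# {'control': 64, 'space': 1, 'graphic': 190, 'eight-ones': 1}
```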

      The EBCDIC Extended Encoding Scheme (EBCDIC-EES) used to encode
the CWCCS character repertoire follows the EBCDIC assignment rules
above and adds the following complementary rules:

1.  Control Characters (X'00' to X'3F' and X'FF')

    Control code points keep the purposes established by the EBCDIC
    Single-Byte encoding scheme.  They are coded and processed as
    pure single-byte characters.  To allow for vector processing,
    such as searching for a string in two places of the data stream
    at the same time, control characters embedded in the data stream
    must always occur in even numbers, so that every double-byte
    character starts at an even byte offset.  The null character
    (X'00') must be added where needed to make the count even.
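The even-count rule for embedded controls can be sketched in Python as follows. The segment format and the helper names `pad_controls` and `build_stream` are hypothetical. The sketch also assumes that the NUL padding byte is appended at the end of an odd-length control run; the text does not say where the null is placed. Control bytes (X'00' to X'3F', X'FF') never overlap the first byte of a double-byte graphic, so once each control run has even length, every even offset is a character boundary.

```python
NUL = 0x00  # EBCDIC null control character


def is_control(b: int) -> bool:
    """Rule 1: X'00'-X'3F' and X'FF' are single-byte control characters."""
    return b <= 0x3F or b == 0xFF


def pad_controls(run: bytes) -> bytes:
    """Return a control run padded with NUL to an even length.

    Assumption: the padding NUL is appended at the end of the run.
    """
    if not all(is_control(b) for b in run):
        raise ValueError("run contains non-control bytes")
    return run + bytes([NUL]) if len(run) % 2 else run


def build_stream(segments: list[tuple[str, bytes]]) -> bytes:
    """Concatenate hypothetical ('ctl', ...) and ('dbcs', ...) segments."""
    out = bytearray()
    for kind, data in segments:
        if kind == "ctl":
            out += pad_controls(data)
        elif kind == "dbcs":
            if len(data) % 2:
                raise ValueError("double-byte data must have an even length")
            out += data
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return bytes(out)


# Arbitrary sample: two double-byte characters, one control (X'15'),
# then one more double-byte character.
stream = build_stream([("dbcs", b"\x42\xC1\x40\x40"), ("ctl", b"\x15"), ("dbcs", b"\x42\xC2")])
print(stream.hex(" "))  # 42 c1 40 40 15 00 42 c2

# Any even offset is a character boundary, so both halves can be scanned independently.
mid = len(stream) // 2
left, right = stream[:mid], stream[mid:]
```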

2.  Space Character is a double-byte character (X'4040')

3.  Double-Byte Character Representation

    The first and second bytes of a double-byte code point can only
    take values from 65 to 254 (X'41' to X'FE'), for a total of 190
    values each.  The first byte constitutes a "Ward Identifier"; the
    second byte represents the graphic character within that ward.
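This Python sketch shows how a ward identifier and a character byte might combine into a double-byte code point. It treats the X'4040' space from rule 2 as the one permitted exception to the X'41' to X'FE' byte range. The function names are hypothetical. The capacity figure counts ward/character combinations only and does not include the space.

```python
LOW, HIGH = 0x41, 0xFE            # allowed range for both bytes (rule 3)
SPACE = b"\x40\x40"               # rule 2: the space is the double-byte value X'4040'
VALUES_PER_BYTE = HIGH - LOW + 1  # 190
CAPACITY = VALUES_PER_BYTE ** 2   # 36,100 ward/character combinations


def encode_dbcs(ward: int, char: int) -> bytes:
    """Build a double-byte code point from a ward identifier and a character byte."""
    if not (LOW <= ward <= HIGH and LOW <= char <= HIGH):
        raise ValueError(f"bytes must be in X'41'..X'FE': {ward:#04x}, {char:#04x}")
    return bytes([ward, char])


def decode_dbcs(pair: bytes):
    """Return 'space' for X'4040', otherwise the (ward, character) byte values."""
    if len(pair) != 2:
        raise ValueError("a double-byte code point has exactly two bytes")
    if pair == SPACE:
        return "space"
    ward, char = pair
    if not (LOW <= ward <= HIGH and LOW <= char <= HIGH):
        raise ValueError(f"not a valid EBCDIC-EES graphic: {pair.hex()}")
    return ward, char


print(VALUES_PER_BYTE, CAPACITY)             # 190 36100
print(decode_dbcs(encode_dbcs(0x42, 0xC1)))  # (66, 193)
```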

      The EBCDIC-EES Ward is equivalent to an EBCDIC single-byt...