Universal Symbol Set Using Multi-Byte Code Architecture

IP.com Disclosure Number: IPCOM000120311D
Original Publication Date: 1991-Apr-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 4 page(s) / 115K

Publishing Venue

IBM

Related People

Dickson, DW: AUTHOR

Abstract

This article describes an architecture for the unique mapping of symbols to bit strings. The symbols include those of the ASCII and EBCDIC character sets, as well as their DBCS (Double Byte Character Set) extensions including all characters, ideograms and pictograms in general use in the major languages of the world. The disclosure concerns a multi-byte code architecture prescribing the allocations of the one-byte code set to be the same as the International 7-bit ASCII code. Allocations of symbols to bit strings in other code sets are not prescribed as this is the work of international committees.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 50% of the total text.

      A coding architecture in which every symbol in general use in
the world is represented by a bit string must satisfy several
requirements.  It must be extensible, with no limit on the number of
symbols it can hold.  It must be compact in the number of bytes used
per symbol.  It must not depend on any external modal setting that
influences how bit strings are interpreted.  It must be based on
8-bit bytes, because that is the standard storage architecture.
Every registered unique symbol in the world should be represented by
a unique standard bit string.  Finally, the number of bytes allocated
to a symbol must be easily determined.  The coding architecture uses
an 8-bit-aligned 'Start-Step-Stop' code with a variable number of
prefix bits to determine the number of bytes used for the symbol
represented.

      The table below shows that the position of the first zero bit
in the prefix (first part of the bit string) determines the overall
length of the bit string.

      Prefix bits    Length of Bit String (in bytes)
      0                   1
      10                  2
      110                 3
      1110                4
      11110               5
      111110              6
      1111110             7
      11111110            8
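
      The prefix table above can be read as a small routine that
counts the leading 1 bits of the first byte.  The following is a
minimal sketch, not part of the original disclosure; the function
name symbol_length is hypothetical.  It rejects X'FF', which has no
zero bit and therefore does not appear in the table.

```python
def symbol_length(first_byte: int) -> int:
    """Return the total length in bytes implied by the prefix of first_byte.

    The length is one more than the number of leading 1 bits, i.e. the
    position of the first zero bit counted from the most significant end.
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must be a single 8-bit value")
    if first_byte == 0xFF:
        # All eight bits are 1: there is no zero bit to end the prefix.
        raise ValueError("X'FF' is not a valid prefix byte")

    length = 1
    mask = 0x80  # start at the most significant bit
    while first_byte & mask:
        length += 1
        mask >>= 1
    return length


if __name__ == "__main__":
    for value in (0x41, 0x80, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE):
        print(f"{value:08b} -> {symbol_length(value)} byte(s)")
```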

      The following table shows the overall structure of the bit
strings.  Note that for every extra byte in the bit string, the
cardinality increases by a factor of 128, because each extra byte
adds eight bits but lengthens the prefix by one bit.

                            (Image Omitted)
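
      Because the structure table is not reproduced here, the exact
bit layout cannot be confirmed from this text.  The sketch below is
therefore only an illustration under a stated assumption: the bits
that follow the prefix, 7 per byte of total length, hold the
symbol's number within its code set, most significant bit first.
The names encode and decode are hypothetical.

```python
def encode(length: int, index: int) -> bytes:
    """Pack symbol number `index` into a bit string of `length` bytes.

    ASSUMPTION: the payload directly follows the prefix, big-endian.
    """
    if not 1 <= length <= 8:
        raise ValueError("length must be between 1 and 8 bytes")
    payload_bits = 7 * length  # 8 bits per byte minus 1 prefix bit per byte
    if not 0 <= index < (1 << payload_bits):
        raise ValueError("index does not fit in this code set")
    # Prefix: (length - 1) one bits followed by a single zero bit.
    prefix = ((1 << (length - 1)) - 1) << 1
    word = (prefix << payload_bits) | index
    return word.to_bytes(length, "big")


def decode(data: bytes) -> tuple:
    """Return (length, index) for the symbol at the start of `data`."""
    if not data or data[0] == 0xFF:
        raise ValueError("no symbol starts here")
    length = 1
    mask = 0x80
    while data[0] & mask:  # count leading 1 bits of the first byte
        length += 1
        mask >>= 1
    if len(data) < length:
        raise ValueError("bit string is truncated")
    word = int.from_bytes(data[:length], "big")
    index = word & ((1 << (7 * length)) - 1)  # strip the prefix bits
    return length, index


if __name__ == "__main__":
    packed = encode(3, 1000)
    print(packed.hex(), decode(packed))
```

Under this assumption a one-byte symbol is simply its 7-bit value, which is consistent with the one-byte code set being the International 7-bit ASCII code.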

      If the prefix bits were allowed to extend indefinitely (that
is, not constrained to one byte), the architecture would be
completely extensible.  However, prefix bits are constrained to one
byte so that only the range of this byte need be tested to determine
the total number of bytes allocated to the symbol.  Thus, no more
than eight bytes can be allocated to a symbol, but there is little
need to go beyond 3 or 4 bytes, which give approximately two million
(128^3 = 2,097,152) and 268 million (128^4 = 268,435,456) possible
symbols, respectively.
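
      The range test and the capacity figures above can be checked
with a few lines of code.  This is an illustrative sketch only; the
names RANGE_STARTS, length_by_range and code_set_size are
hypothetical.  The range boundaries follow directly from the prefix
table (for example, every first byte from X'80' to X'BF' begins with
the bits 10).

```python
from bisect import bisect_right

# Lowest first-byte value for each length 1..8, derived from the prefixes
# 0, 10, 110, 1110, 11110, 111110, 1111110, 11111110.
RANGE_STARTS = [0x00, 0x80, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE]


def length_by_range(first_byte: int) -> int:
    """Find the symbol length by testing which range first_byte falls in."""
    if not 0 <= first_byte < 0xFF:
        raise ValueError("first byte must be in the range X'00' to X'FE'")
    return bisect_right(RANGE_STARTS, first_byte)


def code_set_size(length: int) -> int:
    """Number of distinct bit strings of the given length in bytes.

    Each byte contributes 8 bits and costs 1 prefix bit, leaving 7 per byte.
    """
    if not 1 <= length <= 8:
        raise ValueError("length must be between 1 and 8 bytes")
    return 128 ** length


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"{n} byte(s): {code_set_size(n):,} possible symbols")
```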

      BIT STREAM TERMINATOR
The one remaining value of the first (or only) byte of the bit
strings is X'FF' (all 1 bits).
This is designated to allow a bit stream (of cod...