Character Name Oriented Notation for Uniform Resource Locator/Universal Runtime Interface

IP.com Disclosure Number: IPCOM000119089D
Original Publication Date: 1997-Nov-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 2 page(s) / 89K

Publishing Venue

IBM

Related People

Kido, A: AUTHOR

Abstract

Disclosed is a method to include national characters in Uniform Resource Locator/Universal Runtime Interface (URL/URI) by using a character name oriented notation, instead of an existing code value oriented notation.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      The character repertoire permitted in a URL or URI is strictly
limited to upper and lower case English letters, digits, and a few
special symbols.  This restriction guarantees character integrity
between the systems that send and receive a URL/URI.  Under this
rule, national characters outside that repertoire cannot be included
in a URL/URI as they are.  To include a national character, a special
notation is allowed: convert the code value of the character to
hexadecimal notation and add "%" in front of it as a prefix.  However,
this code value oriented notation is not portable between systems
whose native coded character sets differ, e.g., between an IBM* PC
using an ASCII-based coded character set and an IBM System 390 using
an EBCDIC-based coded character set.  Disclosed is a solution to this
problem: a new character name oriented notation for national
characters, used instead of the existing code value oriented
notation.
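
      The portability problem described above can be illustrated with
a short Python sketch.  It is not part of the disclosure: the helper
name percent_encode_by_code_value is hypothetical, and the codecs
"latin-1" and "cp037" are assumed here as stand-ins for an ASCII-based
and an EBCDIC-based coded character set.

```python
def percent_encode_by_code_value(text: str, codec: str) -> str:
    """Write each byte of `text`, as encoded in `codec`, as "%XX".

    This is the existing code value oriented notation: the result
    depends on the native coded character set of the encoding system.
    """
    return "".join(f"%{byte:02X}" for byte in text.encode(codec))


# The same national character, encoded on two different systems.
# "latin-1" and "cp037" are illustrative assumptions, not codecs
# named in the disclosure.
ascii_based = percent_encode_by_code_value("é", "latin-1")
ebcdic_based = percent_encode_by_code_value("é", "cp037")

# The two strings differ, so a receiver cannot recover the character
# without knowing which coded character set the sender used.
print(ascii_based, ebcdic_based, ascii_based == ebcdic_based)
```

Because the two systems produce different "%XX" strings for the same
character, a URL/URI written on one system is misread on the other.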

      An implementation of the character notation could be as
follows.  Use a "%u" prefix followed by a four-digit hexadecimal
string, "%uxxxx", where xxxx is a character short identifier
determined by ISO/IEC JTC1/SC2.  The xxxx is in fact the code value
of the character in the ISO/IEC 10646 (Unicode) two-octet
representation (UCS-2).  Also, use a "%U" prefix followed by an
eight-digit hexadecimal string, "%UXXXXXXXX", where XXXXXXXX is
another representation of the character short identifier determined
by ISO/IEC JTC1/SC2.  The XXXXXXXX is likewise the code value in the
ISO/IEC 10646 four-octet representation (UCS-4).  Since all national
characters used in modern computers are included in the ISO/IEC 10646
standard, by using the character name that comes from ISO/IEC 10646,
all modern national charact...
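
      The "%uxxxx" and "%UXXXXXXXX" forms described above can be
sketched in Python as follows.  This is a minimal illustration under
stated assumptions, not the disclosed implementation: the function
names, the exact set of characters left unescaped (SAFE), the use of
upper case hexadecimal digits, and the rule of choosing "%u" whenever
the value fits in four digits are all choices made for this example.

```python
import re
import string

# Assumption: characters left as-is.  The disclosure says only
# "English alphabets, numeric, and a few special symbols".
SAFE = set(string.ascii_letters + string.digits + "-_.")

# Matches either "%u" + 4 hex digits or "%U" + 8 hex digits.
_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%U([0-9A-Fa-f]{8})")


def encode_name_oriented(text: str) -> str:
    """Escape every character outside SAFE by its ISO/IEC 10646 value."""
    parts = []
    for ch in text:
        value = ord(ch)
        if ch in SAFE:
            parts.append(ch)
        elif value <= 0xFFFF:
            parts.append(f"%u{value:04X}")   # two-octet form
        else:
            parts.append(f"%U{value:08X}")   # four-octet form
    return "".join(parts)


def decode_name_oriented(text: str) -> str:
    """Replace each "%uxxxx" or "%UXXXXXXXX" escape by its character."""
    def to_char(match: re.Match) -> str:
        return chr(int(match.group(1) or match.group(2), 16))
    return _ESCAPE.sub(to_char, text)


encoded = encode_name_oriented("日本")
print(encoded)                        # %u65E5%u672C
print(decode_name_oriented(encoded))  # 日本
```

The escaped string depends only on the ISO/IEC 10646 value of each
character, so in this sketch it is the same whatever the native coded
character set of the system that writes or reads it.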