
Case/Diacritic-Insensitive And Language-Independent Query

IP.com Disclosure Number: IPCOM000119953D
Original Publication Date: 1991-Mar-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 4 page(s) / 136K

Publishing Venue

IBM

Related People

Gilgen, LS: AUTHOR [+5]

Abstract

Disclosed is an algorithm for providing a software query against EBCDIC data stored in a directory in various languages and mixed upper/lower case. The results of the query will be minimally impacted by the language configured for the hardware input device and the case of either the stored data or the search compare character string provided on the query. This algorithm offers the customer the most flexibility, within the relational data base, to cross language barriers without necessarily having to reconfigure or purchase additional hardware.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 48% of the total text.

      Each input device has an associated character set/code page
that maps code points to the displayed characters of the language
which that device supports.  Each language has at least one character
set/code page associated with it, containing code points for the
displayable characters of that language.  Currently, no single
character set/code page contains all of the displayable characters
required by all of the languages.  This poses a problem for software
programs which must accept and store data in multiple languages and
also provide a facility for later searching that data using a search
compare character string which may itself be in any of those
languages.
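
      The code page problem can be illustrated with a short Python
sketch.  It is not part of the disclosure: it assumes that Python's
cp273 and cp037 codecs are acceptable stand-ins for a German and a
US English EBCDIC code page, and the helper names decode_in and
can_encode are hypothetical.  It shows one code point displaying as
two different characters, and a character that one code page cannot
represent at all.

```python
def decode_in(code_page: str, raw: bytes) -> str:
    """Interpret raw bytes under the given code page (a Python codec name)."""
    return raw.decode(code_page)


def can_encode(code_page: str, text: str) -> bool:
    """Report whether every character of text has a code point in code_page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


raw = bytes([0xD0])                  # one stored code point
german = decode_in("cp273", raw)     # German EBCDIC code page
english = decode_in("cp037", raw)    # US English EBCDIC code page
print(german, english)               # the same byte, two different characters

# A Greek letter has no code point in the German code page.
print(can_encode("cp273", "\u03a9"))
```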

      Fig. 1 shows the flow of the disclosed software program, which
offers the user the most flexibility to cross language barriers
without necessarily having to reconfigure or purchase additional
hardware.

      In this example, the customer issuing the request to create an
entry in the directory is using a device which has been configured
for the German language.  The character set in use is 00697 and the
code page is 00273.  The data being supplied on the CREATE_ENTRY
request contains the character string "Zürich".  This string is
represented internally by 'E9D099898388'x where:
      'E9'x = "Z" (upper case Z)
      'D0'x = "ü" (lower case u with umlaut)
      '99'x = "r" (lower case r)
      '89'x = "i" (lower case i)
      '83'x = "c" (lower case c)
      '88'x = "h" (lower case h)
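
      The byte values listed above can be checked with a few lines
of Python.  This is only a sketch: it assumes that Python's cp273
codec corresponds to code page 00273 as used in the example, and the
variable names are illustrative.

```python
text = "Z\u00fcrich"            # "Zurich" with a lower case u-umlaut
stored = text.encode("cp273")   # assumed equivalent of code page 00273

print(stored.hex().upper())     # E9D099898388

# List each code point next to the character it represents.
for ch, code_point in zip(text, stored):
    print(f"'{code_point:02X}'x = {ch}")
```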

      This string is stored as-is in an area containing data in its
original form; we will call this area the Presentation Data Area.
The data is then translated into a form which is case- and
diacritic-insensitive.  The result of this translation depends on
the mapping defined by the customer for the enterprise.  The mapping
interface is exposed to and modifiable by the customer to allow the
greatest degree of customer refinement and tailoring.  In this
example, the translation results in the character string "ZURICH"
being generated and stored in what we will call the Searchable Data
Area.  This string is represented internally by 'E9E4D9C9C3C8'x
where:
      'E9'x = "Z" (upper case Z)
      'E4'x = "U" (upper case U without umlaut)
      'D9'x =...
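
      The storage step described above, in which the original bytes
are kept in a Presentation Data Area and a case- and
diacritic-insensitive copy is kept in a Searchable Data Area, might
be sketched in Python as follows.  This is a minimal illustration
under stated assumptions, not the disclosed implementation: the names
default_fold, build_fold_table, create_entry and overrides are
hypothetical, Python's cp273 codec stands in for code page 00273, and
the default rule (strip combining marks, then upper-case) is assumed.
The disclosure states only that the mapping is defined and modifiable
by the customer, which the overrides argument is meant to suggest.

```python
import unicodedata

CODE_PAGE = "cp273"  # assumption: Python codec standing in for code page 00273


def default_fold(ch: str) -> str:
    """Assumed default rule: drop combining marks, then upper-case."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.upper()


def build_fold_table(overrides=None) -> bytes:
    """Return a 256-entry byte table usable with bytes.translate()."""
    table = bytearray(range(256))
    for code_point in range(256):
        try:
            ch = bytes([code_point]).decode(CODE_PAGE)
            folded = default_fold(ch).encode(CODE_PAGE)
        except (UnicodeDecodeError, UnicodeEncodeError):
            continue  # leave code points that cannot be folded unchanged
        if len(folded) == 1:  # skip expansions such as sharp s -> "SS"
            table[code_point] = folded[0]
    # Customer-defined entries take precedence over the assumed default rule.
    for source, target in (overrides or {}).items():
        table[source] = target
    return bytes(table)


def create_entry(raw: bytes, fold_table: bytes) -> dict:
    """Keep the original bytes and a folded copy side by side."""
    return {"presentation": raw, "searchable": raw.translate(fold_table)}


entry = create_entry(bytes.fromhex("E9D099898388"), build_fold_table())
print(entry["searchable"].hex().upper())  # E9E4D9C9C3C8
```

      A query could presumably translate its search compare
character string with the same table before comparing it against the
Searchable Data Area.  The abbreviated text does not show that step,
so this is an assumption rather than a description of the disclosed
flow.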