Language Sensitive Search Techniques

IP.com Disclosure Number: IPCOM000109603D
Original Publication Date: 1992-Sep-01
Included in the Prior Art Database: 2005-Mar-24
Document File: 2 page(s) / 99K

Publishing Venue

IBM

Related People

Dickens, D: AUTHOR [+2]

Abstract

This article describes a technique for searching data bases that takes the language characteristics defined by the user into consideration.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      This article describes a technique for searching data bases
that takes the language characteristics defined by the user into
consideration.

      As products become popular on a worldwide basis, the requirement
for language-sensitive search will become important.  Data bases have
historically been created, and data obtained from them, on the
assumption that the data they contain is English.  As multilingual
users become more prevalent, search capability must be provided that
retrieves data in accordance with non-English rules, giving
culturally expected results.

      The proper technique is to search the data base taking into
consideration the language characteristics defined by the user.
The following language groups should be supported:
1.  Latin-1 Common
2.  Danish
3.  Icelandic
4.  Norwegian
5.  Spanish
6.  Swedish/Finnish
Examples of what the capabilities will provide:
o    If a Spanish-sensitive search is requested, the various
combinations of the letters "ch" (CH, Ch, cH, ch) and "ll" (LL, Ll,
lL, ll) are treated in proper Spanish sequence.  A search that
retrieves data equal to or following "CM..." would also retrieve
data beginning with "CH", even though an English search would not
retrieve it, since in Spanish "CH" follows "CZ" and is before "D".
Likewise "LL" follows "LZ" and is before "M", so a
Spanish search for data falling between "LA" and "LM" would correctly
not retrieve records with "L...
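
The Spanish ordering described above can be illustrated with a minimal Python sketch. It is not the article's implementation, only one way to realize the stated rules ("CH" follows "CZ" and is before "D"; "LL" follows "LZ" and is before "M"). The names SPANISH_ORDER, spanish_key, range_search and RECORDS are hypothetical, the sample records are invented for illustration, and the placement of "ñ" after "n" and of unknown characters after all letters are assumptions of the sketch.

```python
# Assumed alphabet for this sketch: traditional Spanish order, in which
# the digraphs "ch" and "ll" are letters in their own right.
SPANISH_ORDER = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
    "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
    "x", "y", "z",
]
RANK = {letter: i for i, letter in enumerate(SPANISH_ORDER)}
UNKNOWN = len(SPANISH_ORDER)  # assumption: anything else sorts last


def spanish_key(text):
    """Return a tuple of ranks usable as a comparison key."""
    s = text.lower()  # CH, Ch, cH and ch all map to the same element
    key = []
    i = 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])  # digraph counts as a single letter
            i += 2
        else:
            key.append(RANK.get(s[i], UNKNOWN))
            i += 1
    return tuple(key)


def range_search(records, low, high=None, key=spanish_key):
    """Return records with low <= record (<= high), in key order."""
    lo = key(low)
    hi = key(high) if high is not None else None
    hits = [r for r in records
            if key(r) >= lo and (hi is None or key(r) <= hi)]
    return sorted(hits, key=key)


# Invented sample data for the demonstration.
RECORDS = ["Cabo", "Chile", "Costa", "Cuba", "Dama", "Lima", "Llama", "Luz"]

print(range_search(RECORDS, "CM", "D"))              # ['Costa', 'Cuba', 'Chile']
print(range_search(RECORDS, "cm", "d", str.lower))   # ['Costa', 'Cuba']
print(range_search(RECORDS, "LA", "LM"))             # ['Lima']
```

The second call passes a plain lowercase comparison as the key to show the English-style result, in which "Chile" falls before "CM" and is not retrieved. With the Spanish key, "Llama" falls outside the "LA" to "LM" range because "LL" sorts after "LZ". A production system would more likely rely on a collation library than on a hand-built table.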