Special Character Sort Sequence

IP.com Disclosure Number: IPCOM000035071D
Original Publication Date: 1989-Jun-01
Included in the Prior Art Database: 2005-Jan-28
Document File: 2 page(s) / 13K

Publishing Venue

IBM

Related People

Holub, KA: AUTHOR [+2]

Abstract

This article describes a two-table translate method for assigning hexadecimal values to symbols so that they sort in a predetermined order that is correct for the alphabet being used. Diphthongs are split into separate characters for purposes of the sort.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.

This article describes a two-table translate method for assigning hexadecimal values to symbols so that they sort in a predetermined order that is correct for the alphabet being used. Diphthongs are split into separate characters for purposes of the sort.

For example, if the following German words were sorted according to their normally assigned ASCII codes, they would appear in the order shown:

   Fassade (facade)
   Faß (barrel)
   Fähre (ferry)
   Fußboden (floor)
   Fürwort (pronoun)
   Füße (feet)

Because of the ASCII assignments, some of the words are not in the desired order. The double-s (eszett) character is assigned an ASCII hexadecimal value of E1, whereas it should sort as two S characters, i.e., ASCII 53 53.

Early systems permitted the user to enter the characters in the sequence by which they were to be sorted. The standardization of alphabets and characters makes that approach impractical as an alternative, although the technique can still be programmed.

A simpler method is to perform two translations by table. The first translation converts the words in an input string, character by character, into a second string composed of the standard uppercase alphabet (A-Z). Diphthongs are converted into separate single characters. The second translation converts the second string into sequential sort values.
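The two translations can be sketched in Python as follows. This is a minimal illustration, not the article's implementation: it keys the first table by character rather than by code value, the entries for ö/Ö are assumed by analogy with the umlaut entries the article names, the sort values 1 to 26 and the value 0 for unlisted characters are arbitrary choices, and the names FIRST_TABLE, SECOND_TABLE, first_translation and sort_key are hypothetical.

```python
# First table (language specific): each special character maps to one or
# two uppercase letters. The "ö"/"Ö" entries are an assumption.
FIRST_TABLE = {
    "ß": "SS",
    "ä": "A", "Ä": "A",
    "ö": "O", "Ö": "O",
    "ü": "U", "Ü": "U",
}

# Second table: uppercase letters A-Z mapped to sequential sort values 1..26.
SECOND_TABLE = {chr(ord("A") + i): i + 1 for i in range(26)}


def first_translation(word):
    """Translate character by character into a string of uppercase A-Z."""
    # Characters without a table entry are simply uppercased (assumption).
    return "".join(FIRST_TABLE.get(ch, ch.upper()) for ch in word)


def sort_key(word):
    """Second translation: turn the uppercase string into sort values."""
    # Anything outside A-Z gets the value 0 (assumption for this sketch).
    return [SECOND_TABLE.get(ch, 0) for ch in first_translation(word)]


words = ["Füße", "Fußboden", "Fürwort", "Fähre", "Faß", "Fassade"]
result = sorted(words, key=sort_key)
print(result)
```

Because the sort compares the translated values rather than the original codes, a word containing ß is ordered exactly as if it were spelled with SS.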

In the first translation table, which is specific to the language being used, an entry can contain two characters. The table can be constructed using the uppercase alphabet from A to Z, followed by the diphthongs as two characters. For example, the double-s entry at E1 would contain 53 53 (S S). The 84 entry (umlaut-a) would contain 41 (A) and the 9A entry (umlaut-u) would contain 55 (U...
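The table layout described above can be sketched as a 256-entry array indexed by code value. Only the three entries named in the text (E1, 84 and 9A) come from the article. The folding of lowercase a-z to uppercase, the empty entry for every other code value, and the names build_first_table, FIRST_TABLE_BY_CODE and translate_first are assumptions made for illustration.

```python
def build_first_table():
    """Build a first translation table indexed by code value (0x00-0xFF)."""
    # Unlisted code values translate to nothing (assumption).
    table = [b""] * 256
    # Uppercase A-Z map to themselves.
    for code in range(0x41, 0x5B):
        table[code] = bytes([code])
    # Lowercase a-z fold to uppercase (assumption).
    for code in range(0x61, 0x7B):
        table[code] = bytes([code - 0x20])
    # Entries named in the article.
    table[0xE1] = b"\x53\x53"  # double-s -> S S
    table[0x84] = b"\x41"      # umlaut-a -> A
    table[0x9A] = b"\x55"      # umlaut-u -> U
    return table


FIRST_TABLE_BY_CODE = build_first_table()


def translate_first(data):
    """Apply the first translation to a byte string, one code at a time."""
    return b"".join(FIRST_TABLE_BY_CODE[code] for code in data)


# "Fu" + E1 + "boden" and "F" + 84 + "hre", written as raw code values.
print(translate_first(b"Fu\xe1boden"))
print(translate_first(b"F\x84hre"))
```

Storing each entry as a short byte string lets a single lookup yield either one or two output characters, which is what allows the double-s to expand to S S.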