Encoding Variable Length Data to Collate as if Blank Padding is Used

IP.com Disclosure Number: IPCOM000113170D
Original Publication Date: 1994-Jul-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 136K

Publishing Venue

IBM

Related People

O'Brien, TR: AUTHOR [+3]

Abstract

Disclosed is a method of encoding variable length, multi-field data into keys such that they collate the same as if blank padding was used.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Disclosed is a method of encoding variable length, multi-field
data into keys such that they collate the same as if blank padding
was used.

      Described is a method of analyzing the data and inserting
special trigger characters so that, when a short field is a prefix of
a longer one, the longer fields whose next character is less than the
blank sort first, then the short field, followed by the longer fields
whose next character is greater than the blank.  The rest of the
discussion assumes the EBCDIC code set, in which X'40' is the blank.
The method extends readily to other code pages, including ASCII,
where the blank is X'20'.
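
      To make the target ordering concrete, the Python sketch below
compares fields the way a blank-padded index would: each field is
padded with X'40' to a fixed column width and the padded bytes are
compared.  The helper name padded_key, the column width of 8, and the
sample values are illustrative assumptions; Python's cp037 codec
stands in for an EBCDIC code page.

```python
# Minimal sketch (assumed names): blank-padded collation of EBCDIC fields.
BLANK = 0x40  # EBCDIC blank, as in the disclosure


def padded_key(field: bytes, width: int) -> bytes:
    """Hypothetical helper: right-pad a field with EBCDIC blanks to `width`."""
    if len(field) > width:
        raise ValueError("field is longer than its column width")
    return field + bytes([BLANK]) * (width - len(field))


short = "AB".encode("cp037")     # X'C1C2'
low = short + b"\x05"            # longer field whose next byte is below the blank
high = "ABC".encode("cp037")     # longer field whose next byte (X'C3') is above it
fields = [high, short, low]

# Plain byte order puts the short field first ...
print(sorted(fields) == [short, low, high])                                  # True
# ... while blank-padded order places it between the two longer fields.
print(sorted(fields, key=lambda f: padded_key(f, 8)) == [low, short, high])  # True
```

      Plain variable-length comparison misorders the longer field whose
extra byte is below the blank, which is the case the trigger
characters are meant to fix.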

      Another feature of this algorithm is that trailing blanks are
removed from fields.  Since any number of trailing blanks compares
equal to one (or none) under blank padding, trailing blanks are
insignificant.  Removing them results in shorter keys and thus a more
compact index.
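
      A short Python sketch of why trailing blanks can be dropped:
padding the stripped and unstripped forms of a field to the same
width yields identical bytes.  The helper names and the sample value
are assumptions for illustration only.

```python
# Minimal sketch (assumed names): trailing blanks do not affect padded order.
BLANK = 0x40  # EBCDIC blank


def padded_key(field: bytes, width: int) -> bytes:
    """Hypothetical helper: right-pad a field with EBCDIC blanks to `width`."""
    return field + bytes([BLANK]) * (width - len(field))


def strip_trailing_blanks(field: bytes) -> bytes:
    """Remove trailing X'40' bytes; embedded blanks are kept."""
    return field.rstrip(bytes([BLANK]))


name = "SMITH".encode("cp037") + bytes([BLANK]) * 5   # 'SMITH' plus 5 blanks
stripped = strip_trailing_blanks(name)

print(len(name), len(stripped))                          # 10 5
# Both forms pad to the same key, so only the shorter one needs storing.
print(padded_key(name, 10) == padded_key(stripped, 10))  # True
```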

      Two methods are described here: first, the simpler method and
its possible disadvantage; then a somewhat more complex method that
overcomes that disadvantage.  Both methods delete all trailing blanks
from the fields to be encoded, which also produces a smaller index.

      METHOD ONE - The trigger character chosen as a field delimiter
should be the character used for padding.  When padding with blanks,
as in SAA, the trigger is therefore a blank.  If the data to be
encoded is first assumed to contain no blanks, using a blank as the
field delimiter gives the desired result: key decoding knows where
the field boundaries lie, and the first character after the common
text decides which of the two fields should come first.  However, the
data to be encoded does contain blanks, so each blank must be marked
to distinguish real blanks from field delimiters.  The marker chosen
to mean 'field delimiter' is the value in the middle of the range,
X'80', so X'4080' is the field delimiter.  The marker used to
indicate a real blank depends on the byte that follows the blank and,
in fact, on any consecutive blanks: consecutive blanks are compressed
to one blank, which is followed by a biased count.  The method for
choosing the value is described in patent 4774657.  Suffice it to say
here that if the byte following the string of blanks is les...
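
      The Python sketch below assembles Method One: trailing blanks
are stripped, each field ends with the X'4080' delimiter, and each
internal run of blanks becomes X'40' followed by a one-byte count
whose value depends on whether the next byte is below or above the
blank.  The actual biased-count values are defined in patent 4774657
and are cut off in this extract, so real_blank_marker uses a
hypothetical bias of our own (a run of k blanks maps to k below the
blank and to X'FF' minus k above it), chosen only so that plain byte
comparison of the keys reproduces blank-padded order.  The function
names and the 126-blank run limit are likewise assumptions.

```python
# Sketch of Method One under a HYPOTHETICAL biased-count scheme
# (not the one from patent 4774657, which is not reproduced here).
BLANK = 0x40       # EBCDIC blank
FIELD_END = 0x80   # X'4080' marks the end of a field, per the disclosure
MAX_RUN = 126      # longest blank run the hypothetical marker can express


def real_blank_marker(run: int, following: int) -> int:
    """Hypothetical marker for `run` blanks followed by the non-blank byte
    `following`. Below the blank: X'01'..X'7E', rising with the run length,
    so shorter runs sort lower. Above the blank: X'FE'..X'81', falling as the
    run grows, so shorter runs sort higher. Neither range contains X'80'."""
    if not 1 <= run <= MAX_RUN:
        raise ValueError("blank run too long for this sketch")
    return run if following < BLANK else 0xFF - run


def encode_field(field: bytes) -> bytes:
    data = field.rstrip(bytes([BLANK]))   # trailing blanks are insignificant
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] != BLANK:
            out.append(data[i])           # non-blank bytes are copied as-is
            i += 1
            continue
        j = i
        while data[j] == BLANK:           # safe: data ends with a non-blank
            j += 1
        # Compress the run to one blank plus its marker; data[j] is then
        # copied by the next loop iteration.
        out += bytes([BLANK, real_blank_marker(j - i, data[j])])
        i = j
    out += bytes([BLANK, FIELD_END])      # field delimiter X'4080'
    return bytes(out)


def encode_key(fields: list[bytes]) -> bytes:
    """Concatenate the encoded fields of one multi-field key."""
    return b"".join(encode_field(f) for f in fields)


def e(text: str) -> bytes:
    return text.encode("cp037")           # EBCDIC stand-in


rows = [
    [e("AB C"), e("X")],
    [e("AB"), e("Y")],
    [e("AB") + b"\x05", e("Z")],
    [e("AB  C"), e("X")],
]
by_key = sorted(rows, key=encode_key)
by_padding = sorted(rows, key=lambda r: tuple(f.ljust(8, b"\x40") for f in r))
print(by_key == by_padding)               # True
```

      Because the field-end marker X'80' sits between the two marker
ranges, a field that ends compares correctly against a longer field
whatever byte follows the extra blanks.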