Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Compressing Sort Keys While Maintaining Sort Sequence

IP.com Disclosure Number: IPCOM000046281D
Original Publication Date: 1983-Jun-01
Included in the Prior Art Database: 2005-Feb-07
Document File: 4 page(s) / 16K

Publishing Venue

IBM

Related People

Hicks, DR: AUTHOR

Abstract

A frequently performed function in many data processors is to sort or arrange a number of data items (strings) in some sequence or structure so that the strings can be retrieved in sorted order or so that associative searches can be efficiently performed.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 44% of the total text.

Page 1 of 4

Compressing Sort Keys While Maintaining Sort Sequence

A frequently performed function in many data processors is to sort or arrange a number of data items (strings) in some sequence or structure so that the strings can be retrieved in sorted order or so that associative searches can be efficiently performed.

The following method permits such strings to be compressed to eliminate repeated characters while still maintaining the correct order for a sort of search operation.

Most techniques for compressing characters in a string affect the compressed string such that it cannot be readily used for sort or search operations. A typical technique is to represent a sequence of repeated characters within a compressed string as of one or more predefined trigger characters which identify the start of the sequence, followed by values representing the repeated character and the number of times that it is repeated. Since the first character of the compressed sequence is not the repeated character, a multiple occurrence of a character (which becomes a compressed sequence) will not collate properly relative to a single occurrence of the same character (which does not become a compressed sequence). In addition, two compressed sequences representing two different numbers of occurrences of the same repeated character may not collate properly relative to each other. This is because the proper location of the shorter repeated character sequence relative to the longer one depends on the value of the character immediately following the repeated characters in the shorter sequence relative to the value of the repeated characters.

These problems can be solved by a compression technique which has the following attributes:

1. A compressed sequence of repeated characters

always collates adjacent to single characters of the

same value.

2. A compressed sequence of N repeated characters

collates before a compressed longer sequence of the

same characters if the N+lst character (i.e., the first

character after the repeated characters) in the shorter

sequence is less than the repeated character.

Otherwise, the compressed sequence of N repeated

characters collates after the compressed longer

sequence.

One technique which has the above attributes is the following:

1. A sequence of two or more repeated characters is

represented in

compressed form by two occurrences of the repeated

character followed by a count of repetitions.

1

Page 2 of 4

2. The repetition count is a ones or twos complement

signed value. The absolute value is the count of the

number of repetitions of the repeated character (which

may be biased to take into account the fact that at

least two occurrences of the character are necessary to

cause a compressed sequence). The repetition count has

a positive sign if the first character after the

repeated characters has a value less than the repeated

characters (or if the string ends with the last

repeated character); otherwise, the repetition count

has a nega...