Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Two Pass Array Sort: Sorting Large Arrays with a Large Number of Duplicate Entries

IP.com Disclosure Number: IPCOM000103982D
Original Publication Date: 1993-Feb-01
Included in the Prior Art Database: 2005-Mar-18
Document File: 1 page(s) / 26K

Publishing Venue

IBM

Related People

Burnett, TL: AUTHOR

Abstract

A quicksort is generally considered one of the best sorting algorithms: with an average number of swaps, 0.69n*lg(n) + O(n); and an average number of comparisons, 1.39n*lg(n) + O(n). However, in this case, with a larger number of duplicate entries, quicksorts can not obtain mid-points to adequately split the (sub)array(s). This results in extremely poor performance for the quicksort. The two pass sort for 'n' elements and where 'm' is the largest element, requires three arrays ORDER[n], COUNT[m] and INDEX[m]. This c be a tremendous amount of memory, but the payoffs for speed are equally tremendous.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 100% of the total text.

Two Pass Array Sort: Sorting Large Arrays with a Large Number of Duplicate Entries

      A quicksort is generally considered one of the best sorting
algorithms: with an average number of swaps, 0.69n*lg(n) + O(n); and
an average number of comparisons, 1.39n*lg(n) + O(n).  However, in
this case, with a larger number of duplicate entries, quicksorts can
not obtain mid-points to adequately split the (sub)array(s).  This
results in extremely poor performance for the quicksort.
     The two pass sort for 'n' elements and where 'm' is the largest
element, requires three arrays ORDER[n], COUNT[m]  and INDEX[m].
This c be a tremendous amount of memory, but the payoffs for speed
are equally tremendous.

Remember: Set ORDER, COUNT and INDEX to zero;
STEP 1: Pass one
   i = 0;

   For all i in the original array, ORIGINAL[n]:
     COUNT[ORIGINAL[i]]++;

STEP 2:
   j = 0;
   i = 0;

   For all nonzero COUNT[i]  in COUNT[m]:
     INDEX[i]  = j;
     j += COUNT[i];

STEP 3: Pass two
   i = 0;

   For all i in the original array: ORIGINAL[n]:
     ORDER[INDEX[ORIGINAL[i]]]=
ORIGINAL[i];
     INDEX[ORIGINAL[i]]++;

Done.

The order of operation is 2*O(n) + O(m).  Much faster than the
quicksort, at the expense of memory.

Disclosed Anonymously.