Original Publication Date: 1986-Dec-01
Included in the Prior Art Database: 2005-Mar-09
This article describes a method for estimating the number of different values in a list containing arbitrary replication. It requires a single pass through the list, and uses negligible storage and very little processing power. The method may therefore be used as a matter of course to collect statistics about any set of data as that set is created. Definitions - h is a function hashing values onto a 32-bit word. Thus h(v) is a pseudo-random 32-bit string, and if u=v, then h(u)=h(v). For a suitable hash function, refer to [*]. - L1 is a function that accepts a 32-bit string and returns the string with all bits save the original low-order '1' bit zeroed.