Framework for Stream De-duplication using Biased Reservoir Sampling
Publication Date: 2012-Mar-31
The IP.com Prior Art Database
This work presents a novel data structure, the Reservoir Sampling based Bloom Filter (RSBF), which combines reservoir sampling and Bloom filters for approximate detection of duplicates in evolving data streams. It shows that RSBF achieves the lowest False Negative Rate (FNR) reported to date, together with faster convergence, outperforming the Stable Bloom Filter (SBF) under the same memory budget. Empirical analysis on varied datasets shows up to a 2x improvement in FNR and better convergence rates than SBF, while keeping a comparable False Positive Rate (FPR).
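The combination of reservoir sampling and Bloom filters can be illustrated with a minimal sketch. It is not the authors' exact algorithm. The following details are assumptions chosen for illustration only: `k` bit arrays of size `s` with one hash per array, an insertion probability of `max(s / i, p_star)` for the i-th stream element, and one randomly chosen bit cleared per array on each insertion so that old elements gradually age out. The class name and its parameters are hypothetical.

```python
import hashlib
import random


class ReservoirSampledBloomFilter:
    """Simplified, hypothetical sketch of a reservoir-sampling-based Bloom filter.

    ASSUMPTIONS (not taken from the source): k bit arrays of size s, one hash
    per array, insertion probability max(s / i, p_star) for the i-th element,
    and one random bit cleared per array on every insertion.
    """

    def __init__(self, k: int, s: int, p_star: float, seed: int = 0):
        self.k = k
        self.s = s
        self.p_star = p_star
        self.bits = [[False] * s for _ in range(k)]
        self.count = 0  # number of stream elements processed so far
        self.rng = random.Random(seed)

    def _positions(self, item: str) -> list:
        # One bit position per array, derived from a salted SHA-256 digest.
        return [
            int.from_bytes(hashlib.sha256(f"{j}:{item}".encode()).digest()[:8], "big") % self.s
            for j in range(self.k)
        ]

    def process(self, item: str) -> bool:
        """Return True if item is reported as a duplicate, otherwise False."""
        self.count += 1
        positions = self._positions(item)
        if all(self.bits[j][p] for j, p in enumerate(positions)):
            return True  # reported duplicate (may be a false positive)
        # Biased reservoir sampling: insertion probability decays as s / i
        # but never drops below the floor p_star.
        p_insert = max(self.s / self.count, self.p_star)
        if self.rng.random() < p_insert:
            for j, p in enumerate(positions):
                # Evict: clear one random bit in this array, then record the item.
                self.bits[j][self.rng.randrange(self.s)] = False
                self.bits[j][p] = True
        return False
```

For example, with `f = ReservoirSampledBloomFilter(k=3, s=1000, p_star=0.05)`, calling `f.process("a")` twice returns `False` and then `True`. In this sketch, the random bit clearing is what lets the filter track an evolving stream: an element stops being reported once later insertions have cleared one of its bits, which is also the source of false negatives that a tuned design such as RSBF aims to minimize.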