Framework for Stream De-duplication using Biased Reservoir Sampling

IP.com Disclosure Number: IPCOM000216344D
Publication Date: 2012-Mar-31

Publishing Venue

The IP.com Prior Art Database

Abstract

This work presents a novel Reservoir Sampling based Bloom Filter (RSBF), a data structure that combines the concepts of reservoir sampling and Bloom filters for approximate detection of duplicates in evolving data streams. It shows that RSBF currently offers the lowest False Negative Rate (FNR) and the best convergence rates, both improving on those of the Stable Bloom Filter (SBF) while using the same memory. Empirical analysis on varied datasets shows up to a 2x improvement in FNR and better convergence rates compared to SBF, at a comparable False Positive Rate (FPR).
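The abstract gives no algorithmic detail, so the following is only a minimal illustrative sketch of how reservoir sampling and a Bloom filter might be combined for stream de-duplication. It is not the disclosure's actual method. The class name `ReservoirBloomSketch`, the parameters `num_filters` and `bits_per_filter`, the insertion probability min(1, s/i), and the policy of clearing one random bit per filter on each late insertion are all assumptions made for illustration.

```python
import hashlib
import random


class ReservoirBloomSketch:
    """Hypothetical sketch: k bit arrays of s bits with reservoir-style insertion."""

    def __init__(self, num_filters=4, bits_per_filter=1024, seed=0):
        self.k = num_filters          # number of bit arrays (one hash each)
        self.s = bits_per_filter      # bits per array, used as the reservoir size
        self.filters = [bytearray(self.s) for _ in range(self.k)]
        self.seen = 0                 # number of stream elements observed so far
        self.rng = random.Random(seed)

    def _positions(self, item):
        # One position per bit array, derived from a salted SHA-256 digest.
        positions = []
        for j in range(self.k):
            digest = hashlib.sha256(f"{j}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.s)
        return positions

    def observe(self, item):
        """Report whether item looks like a duplicate, then possibly insert it."""
        self.seen += 1
        positions = self._positions(item)
        # Bloom-filter test: duplicate only if every array has the bit set.
        if all(self.filters[j][p] for j, p in enumerate(positions)):
            return True
        # Assumed reservoir rule: the i-th element is inserted with
        # probability min(1, s / i), so the first s elements always enter.
        if self.rng.random() < min(1.0, self.s / self.seen):
            if self.seen > self.s:
                # Assumed eviction: clear one random bit per array to make room.
                for j in range(self.k):
                    self.filters[j][self.rng.randrange(self.s)] = 0
            for j, p in enumerate(positions):
                self.filters[j][p] = 1
        return False


sketch = ReservoirBloomSketch()
print(sketch.observe("a"))  # False: the filter starts empty
print(sketch.observe("a"))  # True: "a" was inserted on first sight
```

In this sketch, clearing random bits is what can introduce false negatives, while hash collisions cause false positives; these are the two error rates the abstract compares against SBF.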