Browse Prior Art Database

Optimized Method and Process for passing bloom filters to disparate systems

IP.com Disclosure Number: IPCOM000243726D
Publication Date: 2015-Oct-15
Document File: 2 page(s) / 59K

Publishing Venue

The IP.com Prior Art Database

Abstract

This article describes a simple mechanism for client software to query multiple server endpoint for data based on bloom filters as defined at the endpoint. The endpoint bloom filters are not required to have the same definition (size and hashes). The expensive calculation of the initial hashes are only performed once.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 54% of the total text.

Page 01 of 2

Optimized Method and Process for passing bloom filters to disparate systems

When bloom filters are used to perform filtering, specific parameters are used to construct a bit array that is used for comparison. Different applications/components may use differing parameters based on a number of factors. If multiple applications/components (called an endpoint here) are used as part of a solution there is no standard, convenient method to pass the bloom filter information to each endpoint such that the same data are used to interrogate the bloom filters at each endpoint. The solution disclosed solves the problem by decomposing the bloom filter construction and passing the necessary information to allow the endpoints to quickly build the appropriate bloom filter.

    This solution builds on several recent discoveries in large scale bloom filter implementations.

    The highest cost operation in building a bloom filter is the hashing calculation. This solution applies the hashing once and using the techniques in Kirsh and Mitzenmacher, "Less Hashing, Same Performance: Building a Better Bloom Filter" (http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf) which

allows endpoints to construct a bloom filter matching the endpoint's parameters.

    The calling application determines the number of items to be included in the bloom filter and generates 2 hash values for each item. These values are sent to the endpoint which uses the bloom filter parameters for the endpoint bloom filter collection to construct a bloom filter appropriate to the endpoint. The endpoint collection is then searched for the requested value and the result returned.

    The innovative step here is the decomposition of the bloom filter such that it can be quickly reconstructed in different configurations (bloom filters with different parameters).

    Consider two systems: A calling application that has a set of properties to look for; and an endpoint application that checks for data inclusion in a bloom filter or collection of bloom filters.

    This example uses the murmur hash but any bloom filter quality hash may be used.

    Bloom filters are defined by four (4) interrelated parameters. p the probability of false positives, k the number of hash functions, n the number of items in the filter, and m the size of the filter in bits. There are mathematical equations that relate these such that only 2 are required to be known. The endpoint has these parameters.

    The calling application generates two (2) hash values for each property. If the murmur hash is used it will generate 2 values from a single calculation.

    The calling application places all the hash values in a data structure such as an array to be sent to the endpoint.

    At this point we have broken the bloom filter construction in half. The remaining part of the construction requires the parameters. The calling application sends the hash values to the endpoint such that the endpoint can d...