
Analyzing Data Distribution By Sampling Hash Buckets

IP.com Disclosure Number: IPCOM000237650D
Publication Date: 2014-Jul-01
Document File: 2 page(s) / 63K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method is disclosed for analyzing hash-based uniform data distribution on a database system. The method includes sampling a subset of the total number of hash buckets for analysis.

This is the abbreviated version, containing approximately 52% of the total text.


Disclosed is a method and system for analyzing data distribution, such as on a database system. The method includes sampling a subset of the total number of hash buckets for analysis.

Traditionally, large-scale databases include hash chunks that can number in the millions. Analyzing such a large number of hash chunks consumes resources and affects Online Transaction Processing (OLTP) traffic.

In an embodiment of the present invention, a user selects one or more data structures for analysis using hash-based algorithms. The method includes selecting a subset of hash buckets out of the total number of hash buckets. For example, if there are 1 million hash buckets containing data, then only a subset of those million hash buckets is analyzed. The subset may include a maximum of 500 hash buckets. The subset of hash buckets is then analyzed, and statistics regarding the distribution of data in the hash buckets are provided.
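The bucket selection step can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the source does not say how the subset is chosen, so uniform random selection without replacement is an assumption, and the names sample_bucket_ids and MAX_SAMPLE_SIZE are hypothetical. Only the cap of 500 buckets comes from the text.

```python
import random

# Cap on the number of buckets analyzed, taken from the example in the text.
MAX_SAMPLE_SIZE = 500


def sample_bucket_ids(total_buckets, max_sample=MAX_SAMPLE_SIZE, seed=None):
    """Return up to max_sample distinct bucket ids from range(total_buckets).

    Assumption: buckets are identified by integers 0..total_buckets-1 and
    are chosen uniformly at random without replacement.
    """
    rng = random.Random(seed)
    sample_size = min(total_buckets, max_sample)
    return rng.sample(range(total_buckets), sample_size)


if __name__ == "__main__":
    ids = sample_bucket_ids(1_000_000, seed=42)
    print(len(ids))  # number of buckets that would be analyzed
```

Only the sampled buckets would then be read, which keeps the analysis cost independent of the total bucket count.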

The statistics include the average number of records per hash bucket, which is needed to calculate the standard deviation. The actual standard deviation is then calculated from the subset of hash buckets using the following:

SquareRoot( Sum( (actual number of hash entries - average number of hash entries)^2 ) / Sample Size )
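As a worked illustration of the formula above, the following sketch computes the average and the standard deviation over a list of per-bucket entry counts. The function name sampled_std_dev and the input format (a plain list of counts, one per sampled bucket) are assumptions made for this example. Following the formula as written, the sum of squared differences is divided by the sample size itself.

```python
import math


def sampled_std_dev(bucket_counts):
    """Return (average, standard deviation) of entries per sampled bucket.

    bucket_counts: hypothetical input, one entry count per sampled bucket.
    The divisor is the sample size, as in the formula in the text.
    """
    sample_size = len(bucket_counts)
    if sample_size == 0:
        raise ValueError("at least one sampled bucket is required")
    average = sum(bucket_counts) / sample_size
    squared_diffs = sum((count - average) ** 2 for count in bucket_counts)
    return average, math.sqrt(squared_diffs / sample_size)


if __name__ == "__main__":
    # Eight sampled buckets: average is 5.0, standard deviation is 2.0.
    print(sampled_std_dev([2, 4, 4, 4, 5, 5, 7, 9]))
```

For the eight counts shown, the squared differences sum to 32, so the result is SquareRoot(32 / 8) = 2.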

This provides the sampled standard deviation of the distribution of data. The actual standard deviation is compared with a random uniform distribution of data in the hash buckets by calculating the fol...