Browse Prior Art Database

Method for Ensuring Future Occurrences in Streaming Sets

IP.com Disclosure Number: IPCOM000237244D
Publication Date: 2014-Jun-10
Document File: 6 page(s) / 121K

Publishing Venue

The IP.com Prior Art Database

Related People

Vidit Jain: INVENTOR [+2]

Abstract

A method is disclosed for ensuring future occurrences in streaming data. The method includes predicting elements that may appear with minimum pre-defined frequency.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

Method for Ensuring Future Occurrences in Streaming Sets

Abstract

A method is disclosed for ensuring future occurrences in streaming data.  The method includes predicting elements that may appear with minimum pre-defined frequency.   

Description

Disclosed is a method for ensuring future occurrences in streaming data.  The method includes predicting elements that may appear with minimum pre-defined frequency.

In accordance with the method, a universal set  of n elements is considered.  When different sets  are observed at time , then such sets are referred to as streaming sets.  

A particular streaming set  is considered arriving at time  and a min-d-occur (,,) is used to select  elements for  at time .  It is ensured that each  appears at least  number of times in  with probability close to .

Fewer than  elements may be selected when criterion for probabilistic guarantee is not met.  The constant  depends on the requirements of the application domain, which may be understood through an illustrative example as follows.

Consider a stream of queries issued to a search engine that may be arriving at a rate of one million per second.  Selecting  close to  corresponds to making predictions about occurrences in next  milliseconds.  Choice of the parameter  depends on characteristics of stream of sets.  For the stream of queries, it is assumed that a set of  candidate are tracked, where popular queries (i.e., ) are covered.

Additionally, it is assumed that at least  elements in  appear  number of times in .  Probability of a successful prediction is high when  and . 

A randomized algorithm for selecting  elements (from ) that are likely to appear at least  number of times in the given future time interval is as shown below.


Thereafter, each of k elements {y_1,y_2,…,y_k} selected by the algorithm has a high probability of occurring at least  times in future.  The probability is greater than .

Lemma 1

The probability of getting tails for all coin tosses in  .

Proof.  The probability of getting tails for all elements is the product of probability of getting tails for each element which is less than the probability of getting tails for an element which is present in at least D sets.  This probability is the same as the probability that there are no heads among the last d flips for the workstation that is available for D steps, which at most

because

Lemma 2

Pr[]  1- O(1/n). 

Proof.  The probability of ( i.e. its complement) is calculated.  It is assumed that the probability of getting a head among the first (at most)  flips for each   d be . So

This is because the probability of heads is at most .  Let the probability of getting at most  heads be , i.e.,

Here [H] is the probability of getting i heads in all.  Additionally, the elements for which we heads appear in  ways are selected.  Now since the term  is less than 1 so the term is dropped.

Now only those elements are retained for which tails are o...