Browse Prior Art Database

Method and System for Performing Guided Sampling of Streaming Linked Data

IP.com Disclosure Number: IPCOM000236136D
Publication Date: 2014-Apr-08
Document File: 3 page(s) / 61K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method and system is disclosed for performing guided sampling of streaming linked data.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 47% of the total text.

Page 01 of 3

Method and System for Performing Guided Sampling of Streaming Linked Data

Disclosed is a method and system for performing guided sampling of streaming linked data.

The method and system samples a large, dynamic graph evolving according to a stream of nodes or edges, where a user is enabled to specify a finite time window to let nodes and edges expire after moving out of the time window . The method and system exploits side information from persistent relationships between the networked entities . The side information correspond to, but is not limited to, friendship and following relationship among social network users, common metadata such as same family plan , same institution among telecommunication users and dependencies among components of distributed computer systems . The persistent relationships are used to partition nodes or edges into a stratum in order to guarantee that each stratum is well represented in a sample graph.

The method and system provides an integrated solution for guided sampling of big , streaming graphs such that graph features of interest are preserved for a user 's application. The figure illustrates overall work flow of the method.

Figure

As shown in the figure, the ground truth graph at a point is a graph defined by the edges in the window as unexpired edges. The sampler maintains a sampled graph of the ground truth graph such that the sampled graph represents the ground truth graph in the features of interest. Additionally, the sampler conforms to a streaming constraint and memory constraint, wherein the sampler accesses one edge or a node at a time in an order the sampler streams without backtracking or looking ahead . The sampler is able to store only a limited fraction of the sampler streams as specified by a size of memory allocated to the sampler.

The method and system addresses a problem of deleting implicit edge or node due to a sliding time window in designing such sampler by utilizing stream sampling techniques. Such techniques propose memory efficient algorithms for randomly sampling from a sliding window on generic data streams. The algorithms generate a uniform set of

1


Page 02 of 3

samples from the sliding window such that each data element has equal probability of being sampled. A reason for generating the uniform set of samples is that uniform samples are useful for unbiased estimation of various properties in a value of data elements. Here, data elements are linked with each other to form an underlying graph thereby rendering uniform sampling insufficient for application needs .

The method and system addresses another problem of ensuring that the graph constructed from sampled nodes or edges represents the ground truth graph in the current window in features of interest to high level applications . The method and system utilizes guided stratified sampling that is configured to focus on certain parts of the ground truth graph while still probabilistically sampling the other parts. Her...