Browse Prior Art Database

Method and System for Computing Clustering Coefficient for Spam Detection and Recommending Friends in Social Networks

IP.com Disclosure Number: IPCOM000200636D
Publication Date: 2010-Oct-22
Document File: 3 page(s) / 130K

Publishing Venue

The IP.com Prior Art Database

Related People

Jake Hofman: INVENTOR [+3]

Abstract

A method and system is disclosed for computing clustering coefficient for large scale social networks. The clustering coefficient is computed using a MapReduce framework. The MapReduce framework filters and forms clusters for the data associated with the social networks.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

Method and System for Computing Clustering Coefficient for Spam Detection and Recommending Friends in Social Networks

Abstract

A method and system is disclosed for computing clustering coefficient for large scale social networks.  The clustering coefficient is computed using a MapReduce framework.  The MapReduce framework filters and forms clusters for the data associated with the social networks.

Description

Disclosed is a method and system for computing clustering coefficient for large scale social networks.  The clustering coefficient measures number of friends associated with a person who knows each other.  The clustering coefficient may be used for one or more of, but not limited to, a spam detection and a friend recommendation.  For example, the clustering coefficient may be used to distinguish legitimate users from spammers in a social network.  Similarly, the clustering coefficient may be used to recommend friends or other entities to a user.

In order to compute the clustering coefficient, a MapReduce framework is used for a large dataset of a social network.  The MapReduce framework processes an input of the large dataset in parallel as the input is too large to be processed on a single machine. 

In order to process the input, number of triangles incident are counted on each node in a sparse graph.  Assuming the sparse graph G = (V, E) where, |V| = n, |E| = m, and m = O(n).  A simple pivot algorithm is used to compute the number of triangles incident on a node of the sparse graph.  Considering a node v, and its neighborhood set Γ(v) = {u1,…, uk}, it is determined that an edge (ui, uj) is present in the sparse graph for all pairs i and j.  Thereafter, a sequential algorithm  is transformed into parallel algorithms.  To transform the sequential algorithm, each node is assigned to a reducer.  This outputs all possible  edges in the first phase.  Further, it is determined that which of these edges are present in the second phase.  The sequential algorithms are applied if the degrees of the graph are uniformly small.

Thereafter, the triangles incident are filtered to get low and high degree nodes separately for processing.  To get the high degree nodes, all of the low degree nodes are removed.  In response to removing the low degree nodes, the pivot algorithm is used to count the number of triangles for the high degree nodes.

Assuming the input is a list of edges along with the degree of endpoint of the edge, the input consists of tuples of the form (u, du, v, dv), where du denotes the degree of node u.  Firstly, the edges are separated where at least one endpoint has low degree from the edges where both endpoints have high degree.  The edges are separated by marking the key of the high degree edge with an extra bit.  For the tuple, input and output is as follows...