Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Approach and System to Enable Large Scale Social Analysis

IP.com Disclosure Number: IPCOM000184998D
Original Publication Date: 2009-Jul-07
Included in the Prior Art Database: 2009-Jul-07
Document File: 2 page(s) / 23K

Publishing Venue

IBM

Abstract

As one of the major components in the Web 2.0 wave, social network sites have gained lots of attention in recent years. Actually, social networks exist everywhere in our daily life. Besides those networks built through all kinds of social network sites, for example, in telecom domain, social network is constructed by the relationship of mobile phone users drawn from call or SMS log. Since social networks embody the relationships among real people and the amount of these groups are huge, it’s important and valuable to do business analysis on them, which we called social analysis. Taking the telecom social network as an example, analyzing results on it could be applied in many areas, like social filtering based campaign management, social recommendation, virus-like marketing and even mobile anti-spam. At the backend of social analysis is the technology of graph mining or graph data analysis, which is a hot topic and gains more and more awareness from real world applications. Like most other graph data analysis problems, analyzing social network is realized by extracting some features from the whole graph, or say, answer some queries on graph data. Taking the social network based anti-spam solution for instance; the relationships between sender/receiver and among receivers are investigated to get spamming probability. Theoretically speaking, it is a query on graph data about the relationship between 2 nodes, or a group of nodes. For example, given 2 nodes, we can query whether they are connected or not. Furthermore, we can query how many how many common friends exist between them. Answers to these queries are important characteristics which could be used to calculate the spamming probability. Facing the eager requirements for social analysis, especially the requirements like mobile anti-spam analysis, our invention aims to give a novel solution about large scale social analysis, answer queries like direct connectivity and spamming probability efficiently via effective graph data management/storage mechanism. The traditional graph storage is in two categories, one is disk (DB) based and another one is memory based. For DB based solution, as the absence of native graph database, the relationship usually stored in a table with schema where index key could be on each single node or both nodes as joint key. The drawback of this solution is the latency for query, since the data is usually huge for memory and disk-resident, I/O is always needed for this kind of operation. Scale out is a possible solution, but extra network communication overhead and investment is holding back this solution. For memory based solution, linked list and bitmap are two popular choices. Linked list is suffered from the waste of memory with pointer structure. Bitmap can save memory space compared with linked list. But since the graph is usually sparse, the memory utilization may be not efficient too. Compressed bitmap could reduce the size, but is not good for random query against the graph data, since decompression is necessary for query. Furthermore, both solutions did not provide good and mature answer for the scalability requirement. That’s why we need an additional solution to solve this problem.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

Page 1 of 2

Approach and System to Enable Large Scale Social Analysis

The core idea of our invention is to build a hybrid data structure to maintain the relationship/connection information in social networks. This data structure is based on dynamic partition-able table structure like Google BigTable and space-efficient randomized data structure like bloom filter (Figure 1).

Figure 1. The hybrid data structure used to maintain social network data.

As shown in above figure, social relationships starting from a node (for example, the phone call relationship starting from 13901234567) are stored in a single line of this " table". Within a line, each peer-to-peer relationship corresponds to a column and we choose to differentiate "from"and "to"relationships with 2 column families. Each column family could hold any number of relationships, which enables the structure to accommodate the most unsociable and the most socialable nodes.

Besides those columns directly allocated for relationships, a special column named "bloomfilter" is used in each column family to maintain a space-efficient summary of the relationships in the family. By the nature of bloom filters, this column will be concise and could be used calculate many interesting characteristics, such as checking the existence of relationships, the number of common friends, etc.

Social network analysis is the mapping and measuring of relationships and flows between people, groups, organizations, animals, computers or other information/knowledge processing entities. The nodes in the network are the people and groups while the links show relationships or flows between the nodes. A sociometric chart plots the structure of interpersonal relations in a group situation. A sociogram can be drawn on the basis of many different criteria: Social relations, channels of influence, lines of communication etc. In sociometric chart, cohesion analysis is the first concern of social network analysis. A very important query is whether there is a relation between two entities, especially in a sociogram with billions of entities.

For those who are familiar with Google BigTable or Apache Hadoop, it's easy for them to realize the data model of previous section. Taking telecom social network as an example, mobile user's phone number could be used as the row key and column name (except the special "bloom filter"column). The cells in the "table"could be used to store information like SMS content, call time, call length, and so on. Timestamps will be used to keep the history of SMS and calls.

As the size of data stored in the "table" grows, the "table" could be partitioned dynamically leveraging the natural order on mobile phone numbers. Thosewho are familiar with Google BigTalbe or Ap...