Method and System for Summarization of User-Contributed Data in Social Media using Graph Based Algorithm
Publication Date: 2014-Mar-11
The IP.com Prior Art Database
A method and system are disclosed for summarizing user-contributed data in social media using a graph-based algorithm.
Disclosed is a method and system for summarizing user-contributed data in social media using a graph-based algorithm.
The method and system propose a graph-based algorithm that generates a summary of diverse content created by users of social media. The algorithm extracts sentences and assigns each one a score, so that the resulting summary has minimum redundancy and carries the maximum information available. Each score indicates whether a sentence is of high or low value from a summarization point of view. A sentence is considered to be of high value if it satisfies properties such as, but not limited to, the following: it contains informative, relevant concepts about a product (for example, a cell phone); it contains concepts that are not discussed by other sentences already in the generated summary; and it contains concepts that have received the greatest emphasis in reviews from the community of users.
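To make the three value properties concrete, a minimal sketch follows, assuming one simple way to combine them into a single number. The names `meaningful_terms`, `value_scores` and `STOPWORDS` are hypothetical. The disclosure identifies meaningful terms as nouns or adjectives; this sketch merely drops a small stopword list instead of doing part-of-speech tagging, and it weights each term by how many sentences mention it as a rough proxy for community emphasis.

```python
import re
from collections import Counter

# Hypothetical stand-in for "meaningful terms" (the disclosure uses nouns and
# adjectives); here we only remove a small, assumed stopword list.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "but", "it", "this",
             "of", "to", "in", "on", "for", "with", "very", "too", "so"}


def meaningful_terms(sentence: str) -> set[str]:
    """Lowercased word tokens of a sentence, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def value_scores(sentences: list[str], summary: list[str]) -> list[int]:
    """Illustrative value score for each candidate sentence (an assumption).

    - informative: only meaningful terms contribute,
    - novel: terms already covered by the current summary are ignored,
    - emphasized: each term is weighted by how many sentences mention it.
    """
    emphasis = Counter(t for s in sentences for t in meaningful_terms(s))
    covered = set().union(*(meaningful_terms(s) for s in summary))
    return [sum(emphasis[t] for t in meaningful_terms(s) - covered) for s in sentences]
```

For the reviews "The battery life is great", "Battery drains fast" and "Nice screen and great battery", the third sentence scores highest on an empty summary; once it is in the summary, "Battery drains fast" overtakes "The battery life is great" because most of the latter's terms are already covered.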
In accordance with the method and system, a graph-based model is used in which, for each product in P (P being the set of all products in the data set), a graph of sentences is constructed by treating each sentence as a node. The two nodes joined by an edge share a certain amount of similarity, which can be defined as the number of common meaningful terms or as the cosine similarity between the sentences that the nodes represent. The amount of similarity is annotated as the weight of the edge. Here, the number of common meaningful terms, such as nouns or adjectives, is used as the basis of the similarity measure. The collection of sentences from which the summary sentences are chosen is represented as a graph G = (V, E, W), where the nodes v ∈ V represent sentences and the edges e ∈ E represent a non-zero similarity between them.

An initial reward score r_i is associated with each node v_i and equals the number of meaningful terms in that sentence. The weight w_ij ∈ W associated with an edge e = (v_i, v_j) indicates how much of v_j is known if v_i is known. It is defined as the ratio of the number of terms common to v_i and v_j to the number of terms in v_j. A cost c_i is also associated with each node v_i, because there is a limit on how much text the summary can accommodate. If the summary size is specified as a number of sentences, the cost of each sentence is taken as 1; if it is specified as a number of words, the cost c_i of a sentence is taken as the number of words in it.

The decision to include a node v_i in the summary is represented by a variable d_i, where d_i = 1 indicates that v_i is included in the summary and d_i = 0 indicates that it is not. After node v_i is included in the summary, the individual reward score of each neighbor node v_j is adjusted (discounted) in the following manner:
It implies that the reward score of each neighbor of a selec...
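The model described above can be sketched as follows. It builds G = (V, E, W) with r_i, w_ij, c_i and d_i as defined, then fills the summary greedily. Several choices here are assumptions, not the disclosure's own method: the helper names (`build_graph`, `summarize`, `STOPWORDS`) are hypothetical; meaningful terms are approximated by stopword removal rather than noun/adjective extraction; the selection order (highest current reward that still fits the remaining budget) is assumed; and the discount rule r_j ← r_j · (1 − w_ij) is an assumed rule, chosen only because it matches the stated meaning of w_ij as the share of v_j already known once v_i is selected.

```python
import re

# Assumed stand-in for noun/adjective extraction: drop a few stopwords.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "but", "it", "of", "to", "in"}


def meaningful_terms(sentence: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def build_graph(sentences: list[str]) -> tuple[list[float], dict[tuple[int, int], float]]:
    """Return rewards r_i and edge weights w_ij (edges exist only where similarity > 0)."""
    terms = [meaningful_terms(s) for s in sentences]
    reward = [float(len(t)) for t in terms]              # r_i = number of meaningful terms
    weight: dict[tuple[int, int], float] = {}
    for i, ti in enumerate(terms):
        for j, tj in enumerate(terms):
            common = len(ti & tj)
            if i != j and common:                        # non-zero similarity -> edge (v_i, v_j)
                weight[(i, j)] = common / len(tj)        # fraction of v_j's terms known from v_i
    return reward, weight


def summarize(sentences: list[str], budget: int, cost_in_words: bool = False) -> list[str]:
    reward, weight = build_graph(sentences)
    # c_i: 1 per sentence, or the word count when the budget is given in words
    cost = [len(s.split()) if cost_in_words else 1 for s in sentences]
    chosen = [0] * len(sentences)                        # d_i
    remaining = budget
    while True:
        candidates = [i for i in range(len(sentences))
                      if not chosen[i] and cost[i] <= remaining and reward[i] > 0]
        if not candidates:
            break
        best = max(candidates, key=lambda i: reward[i])  # assumed greedy order
        chosen[best] = 1
        remaining -= cost[best]
        # Assumed discount: keep only the part of each neighbor not already known.
        for (i, j), w in weight.items():
            if i == best and not chosen[j]:
                reward[j] *= 1.0 - w
    return [s for s, d in zip(sentences, chosen) if d]
```

With the sentences "The battery life is great", "Battery drains fast" and "Nice screen and great battery" and a budget of two sentences, the sketch first picks the third sentence (highest reward), after which "The battery life is great" is discounted more heavily than "Battery drains fast" because it overlaps more with what is already selected, so the latter is chosen second.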