Method and System for Automatically Predicting Success of an On-line Community based on Linguistic Analysis

IP.com Disclosure Number: IPCOM000240189D
Publication Date: 2015-Jan-11
Document File: 4 page(s) / 51K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method and system is disclosed for automatically predicting success of an on-line community based on linguistic analysis.

This is the abbreviated version, containing approximately 37% of the total text.

Generally, the success of a community is defined by the average number of satisfied members in the community. Currently, member satisfaction in a community is analyzed using a survey-based approach. However, the survey-based approach is time-consuming and may not be scalable.

Disclosed is a method and system for automatically predicting success of an on-line community based on linguistic analysis. The method and system predict member satisfaction by analyzing the words or language used by the members in the on-line community or a workplace.

The method and system predict the success of the on-line community in four steps: ground-truth construction, language-based feature analysis, statistical modeling, and application of the statistical model.

In the ground-truth construction step, the system collects the ground-truth for the satisfaction of members through surveys or log data. Communities that satisfy certain thresholds can be selected for analysis. The thresholds can be based on one or more of, but not limited to, word count, number of posts, total number of views, total number of contributors and total number of members.
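The threshold-based selection of communities described above can be sketched as follows. This is a minimal illustration only: the `CommunityStats` record, the `select_communities` helper and the numeric minimums are hypothetical, since the disclosure names the threshold criteria but gives no concrete values.

```python
from dataclasses import dataclass


@dataclass
class CommunityStats:
    """Hypothetical per-community counts matching the criteria named in the text."""
    name: str
    word_count: int
    posts: int
    views: int
    contributors: int
    members: int


# Assumed minimums for illustration; the disclosure does not specify any values.
MIN_THRESHOLDS = {"word_count": 1000, "posts": 20, "contributors": 5}


def meets_thresholds(community, thresholds=MIN_THRESHOLDS):
    """Return True if every thresholded field is at or above its minimum."""
    return all(
        getattr(community, field) >= minimum
        for field, minimum in thresholds.items()
    )


def select_communities(communities, thresholds=MIN_THRESHOLDS):
    """Keep only the communities that satisfy all thresholds."""
    return [c for c in communities if meets_thresholds(c, thresholds)]
```

Only a subset of the criteria is thresholded here; any of the other counts (views, members) could be added to the dictionary in the same way.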

In the language-based feature analysis step, after the ground-truth is constructed, the method and system conduct a linguistic analysis of the community posts to compute language-based features. The language-based features can be first person plural words (e.g. "we," "our," "us"), anxiety words (e.g. "worried," "confused," "vulnerable"), leisure words (e.g. "blog," "book," "soccer") and assent words (e.g. "agree," "yes," "OK"). The computation of the language-based features can be carried out using one or more of, but not limited to, a psycho-linguistic dictionary, expert judgment, empirical analysis and an optimization-based approach. The psycho-linguistic dictionary can be used to compute an initial list of features from word-count frequency. An example of the psycho-linguistic dictionary is the Linguistic Inquiry and Word Count (LIWC) dictionary. Since the LIWC dictionary provides multiple language features, the linguistic analysis can be conducted using the LIWC dictionary.
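A dictionary-based feature computation of this kind can be sketched as below. The category word lists are toy examples taken from the words quoted in the text, not the actual LIWC dictionary, and the `language_features` helper and its tokenization rule are assumptions made for illustration.

```python
import re
from collections import Counter

# Toy category lists built from the example words in the text.
# This is NOT the LIWC dictionary, only a stand-in for illustration.
CATEGORIES = {
    "first_person_plural": {"we", "our", "us"},
    "anxiety": {"worried", "confused", "vulnerable"},
    "leisure": {"blog", "book", "soccer"},
    "assent": {"agree", "yes", "ok"},
}


def language_features(posts, categories=CATEGORIES):
    """Return, per category, the fraction of all tokens that fall in it."""
    # Lowercase and split on anything that is not a letter or apostrophe.
    tokens = [t for post in posts for t in re.findall(r"[a-z']+", post.lower())]
    total = len(tokens)
    counts = Counter(tokens)
    return {
        category: (sum(counts[w] for w in words) / total if total else 0.0)
        for category, words in categories.items()
    }
```

Normalizing by the total token count keeps the features comparable across communities with very different posting volumes; raw counts could be used instead if that is preferred.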

Alternatively, the list of features can be drawn from features that have a statistically significant association with community success, where the significance level is defined by an expert. Optionally, a list of features can be selected based on expert judgment, where the features are indicative of community success. A list of features that are empirically determined to predict community success under different threshold conditions can also be selected. Using all the lists identified above, the method and system identify an optimal list of features that maximizes a given success metric.
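One way to combine the candidate lists and pick the subset that maximizes a success metric is sketched below. It uses plain exhaustive search, which is practical only for a small number of candidate features and is not necessarily the search strategy used by the disclosed system. The helper names and the caller-supplied `metric` function are hypothetical.

```python
from itertools import combinations


def merge_candidates(*feature_lists):
    """Union of the candidate lists, keeping first-seen order."""
    merged = []
    for feature_list in feature_lists:
        for feature in feature_list:
            if feature not in merged:
                merged.append(feature)
    return merged


def best_subset(candidates, metric):
    """Score every non-empty subset with `metric` and return the best one.

    `metric` is an assumed callable that maps a tuple of feature names to a
    number, where higher means a better predicted-success score.
    """
    best, best_score = (), float("-inf")
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            score = metric(subset)
            if score > best_score:
                best, best_score = subset, score
    return list(best), best_score
```

In practice `metric` might wrap a cross-validated model score computed on the ground-truth data; the cost of exhaustive search grows exponentially with the number of candidates, so a cheaper search would be needed for large feature lists.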

The maximization can use a greedy-based fe...