A system and method to improve sentiment analysis using socially-contributed dictionaries
Publication Date: 2016-Jul-05
The IP.com Prior Art Database
Because the language used on the Internet evolves rapidly, the accuracy of a Sentiment Analysis (SA) system can easily degrade over time if its knowledge base of terms is not updated. This article describes a sentiment analysis system and method that leverages socially-contributed dictionaries to keep the knowledge base up to date, so that it can mine new idioms, neologisms, abbreviations, and acronyms. Several such dictionaries are constantly kept up to date by real users, which makes them a good source of information for keeping an SA knowledge base current.
Sentiment Analysis is a technique to evaluate people's opinions about topics of interest (e.g. products for sale, events, individuals) expressed on social media (e.g. blogs).
Among social media, those that host microblogs (short text messages) are becoming a valuable source of such opinions, which, once mined, can yield actionable insights. For example, if you are the provider of a financial service and you discover that a person is complaining about your service on Twitter, you may contact that person to take customer-retention actions and to keep negative word of mouth about your service and brand from spreading.
Several factors make the sentiment analysis of microblogs a tough task: they are short; they are often not grammatically correct; they contain many abbreviations; they contain idioms, slang and neologisms; and capitalization conventions are not respected, for example "This is the next big Thing". This last point matters because Natural Language Processing (NLP) tools rely on capitalization to detect proper nouns and other entities such as institutions.
Sentiment Analysis (SA) systems may use different approaches:
- NLP-based SA analyzes the language using natural language processing tools;
- Statistical machine learning-based SA uses classifiers (e.g. Bayes classifiers);
- Lexicon-based SA uses a lexicon of opinion words, called an opinion lexicon, which contains at least a set of positive words and a set of negative words.
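To make the lexicon-based approach concrete, the following is a minimal Python sketch, not taken from the original disclosure. The tiny lexicon, the tokenizer, and the function name lexicon_sentiment are all hypothetical and chosen only for illustration; a real opinion lexicon would be far larger and would typically handle negation and intensity.

```python
import re

# Hypothetical toy opinion lexicon: +1 for positive words, -1 for negative ones.
OPINION_LEXICON = {
    "good": 1, "great": 1, "love": 1,
    "bad": -1, "awful": -1, "hate": -1,
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexicon_sentiment(text, lexicon=OPINION_LEXICON):
    """Classify text by summing the polarity of the known opinion words."""
    score = sum(lexicon.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this toy lexicon, a message such as "this phone is lit" is classified as neutral simply because "lit" is not in the lexicon, which is the kind of gap an outdated knowledge base produces.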
Whatever method an SA system uses, its heart is a knowledge base trained on one or more languages and on a topic domain (e.g. sport, fashion).
One important aspect of the trained knowledge base is the rapid evolution of the languages used on the Internet, which makes it difficult to keep the knowledge base up to date for mining new idioms, neologisms, abbreviations, and acronyms: you have to re-train your SA system, and this is a long process, which may include a) creating (or extending) a training dataset with manually classified negative, positive, and neutral text; b) extending the knowledge base with new idioms of the language; c) re-training your system on the training dataset.
If you do not, your sentiment analysis will lose accuracy over time.
The good news is that there are socially-contributed dictionaries (SCD) that are continuously updated with new idioms, neologisms, acronyms, and abbreviations, written by people who use them in real spoken and written language.
This article discloses a method and a system to continuously update the knowledge base of an SA system by leveraging socially-contributed dictionaries.
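As a rough illustration of what such an update could look like, the Python sketch below extends an opinion lexicon with terms taken from dictionary entries. It is only one plausible reading and not necessarily the method disclosed here: the ScdEntry record, the update_lexicon function, and the heuristic of inferring a new term's polarity from the opinion words in its definition are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

@dataclass
class ScdEntry:
    """Hypothetical record for one socially-contributed dictionary entry."""
    term: str
    definition: str

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def update_lexicon(lexicon, entries):
    """Return a copy of the lexicon extended with new terms from the entries.

    Assumed heuristic: a new term inherits the sign of the summed polarity
    of the already-known opinion words found in its definition.
    """
    updated = dict(lexicon)
    for entry in entries:
        term = entry.term.lower()
        if term in updated:
            continue  # keep the polarity that is already known
        score = sum(lexicon.get(tok, 0) for tok in tokenize(entry.definition))
        if score != 0:
            updated[term] = 1 if score > 0 else -1
    return updated
```

For example, given a base lexicon containing "good" and "excellent" as positive words, an entry defining "lit" as "Something really good or excellent" would be added as a positive term, while an entry whose definition contains no known opinion words would be skipped.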
The main idea is to utilize socially-contributed dictionaries in an SA methodology or workflow. With the existing, continuously updated socially-contributed dictionaries (SCD), a user can for instance:
- add a definition of an idiom, a neologism, or an acronym.
- comment on an existing definition and add a different one if they do not agr...