A System for Extracting and Analysing Tags and Sentiment Scores from Unstructured Text

IP.com Disclosure Number: IPCOM000205102D
Publication Date: 2011-Mar-15
Document File: 4 page(s) / 83K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a system for summarizing large amounts of user content by associating tags with sentiment. This allows the user to identify the breakdown of sentiment surrounding a tag, and also to identify tags associated with particular sentiments.
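The two views named in the abstract, the sentiment breakdown around a tag and the tags associated with a given sentiment, can be pictured as a pair of indexes built from the same tagged content. The following is a minimal sketch, assuming each piece of content has already been reduced to a set of tags and a single sentiment label; the function name `build_indexes`, the label names, and the sample data are hypothetical and do not come from the disclosure.

```python
from collections import Counter, defaultdict

def build_indexes(items):
    """Index (tags, label) pairs both ways: tag -> label counts, label -> tags."""
    breakdown = defaultdict(Counter)  # sentiment breakdown surrounding each tag
    tags_by_label = defaultdict(set)  # tags associated with each sentiment label
    for tags, label in items:
        for tag in tags:
            breakdown[tag][label] += 1
            tags_by_label[label].add(tag)
    return breakdown, tags_by_label

# Hypothetical sample: each item is one piece of user content, already tagged and labelled.
items = [
    ({"battery", "screen"}, "negative"),
    ({"screen"}, "positive"),
    ({"screen", "price"}, "positive"),
]
breakdown, tags_by_label = build_indexes(items)
print(dict(breakdown["screen"]))          # {'negative': 1, 'positive': 2}
print(sorted(tags_by_label["negative"]))  # ['battery', 'screen']
```

Building both indexes in one pass keeps them consistent with each other; a real system would more likely keep numeric scores than discrete labels.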

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 35% of the total text.

Disclosed is a process for reducing data to tags and sentiment scores. This enables a user to extract the tags that generate positive sentiment (for example, a specific feature of a new product that is receiving positive reviews) as well as the tags associated with negative sentiment that need to be addressed.
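The reduction of documents to tags and sentiment scores, followed by the extraction of positive and negative tags, can be sketched as follows. This is an illustration under loudly stated assumptions: tagging is done by matching a hypothetical keyword vocabulary (`TAG_KEYWORDS`) and sentiment is scored with a tiny hypothetical word lexicon (`LEXICON`). The disclosure does not specify either technique, and a production system would more likely use a trained tagger and sentiment model.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical tag vocabulary: tag -> keywords that trigger it.
TAG_KEYWORDS = {
    "battery": {"battery", "charge"},
    "camera": {"camera", "photo", "photos"},
    "price": {"price", "cost", "expensive"},
}
# Hypothetical sentiment lexicon: word -> polarity score.
LEXICON = {"great": 1.0, "love": 1.0, "sharp": 0.5,
           "poor": -1.0, "dies": -1.0, "expensive": -0.5}

def reduce_document(text):
    """Reduce one document to (set of tags, summed sentiment score)."""
    words = re.findall(r"[a-z']+", text.lower())
    vocab = set(words)
    tags = {tag for tag, keywords in TAG_KEYWORDS.items() if keywords & vocab}
    score = sum(LEXICON.get(word, 0.0) for word in words)
    return tags, score

def split_by_sentiment(documents):
    """Average each tag's document scores; return (positive tags, negative tags), strongest first."""
    scores = defaultdict(list)
    for text in documents:
        tags, score = reduce_document(text)
        for tag in tags:
            scores[tag].append(score)
    averages = {tag: mean(values) for tag, values in scores.items()}
    positive = sorted((t for t, s in averages.items() if s > 0), key=lambda t: -averages[t])
    negative = sorted((t for t, s in averages.items() if s < 0), key=lambda t: averages[t])
    return positive, negative

docs = [
    "Love the camera, photos are sharp.",
    "Battery dies by noon. Poor battery life.",
    "Great camera but the price is expensive.",
]
print(split_by_sentiment(docs))  # (['camera', 'price'], ['battery'])
```

For simplicity the whole document's score is credited to every tag it carries; attributing sentiment to the specific phrase around each tag would be a natural refinement.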

Using an embodiment of the disclosed process, depicted in the following figure, incoming information is tagged and a sentiment score is extracted. The type(s) and volume of sentiment determine the score, and the scoring is configurable to the context of an institution. For example, one institution may consider negative sentiment associated with a budget more important than negative sentiment associated with an emotion, whereas a government expects a certain level of negative sentiment associated with a budget and therefore weights that negative sentiment less. Tags are associated with sentiment scores, enabling analysis of the tags that generate the most positive or the most negative sentiment. In addition, the system supports navigating the tag hierarchy to view the sentiment analysis for a given tag or branch of the hierarchy. The tag hierarchy is created automatically, although a user can refine and modify it. A visualization engine enables the user to explore tags and sentiment visually.
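One way to make the scoring configurable to an institution's context, and to let a parent tag summarize its branch of the hierarchy, is sketched below. The disclosure gives no formula, so weighting each mention by a per-(tag, polarity) multiplier and summing scores up a child-to-parent hierarchy is an assumption; the profiles `COMPANY` and `GOVERNMENT`, the tags, and the hierarchy are hypothetical values chosen to mirror the budget example above.

```python
from collections import defaultdict

SIGN = {"positive": 1, "neutral": 0, "negative": -1}

# Hypothetical weight profiles: (tag, polarity) -> multiplier; unlisted pairs default to 1.0.
COMPANY = {("budget", "negative"): 2.0}     # budget criticism weighed more heavily
GOVERNMENT = {("budget", "negative"): 0.5}  # some budget criticism is expected

# Hypothetical tag hierarchy stored as child -> parent links (assumed acyclic).
PARENT = {"budget": "finance", "tax": "finance", "emotion": "public mood"}

def weighted_scores(mentions, profile):
    """mentions: (tag, polarity, count) triples -> {tag: weighted net score}."""
    scores = defaultdict(float)
    for tag, polarity, count in mentions:
        weight = profile.get((tag, polarity), 1.0)
        scores[tag] += SIGN[polarity] * count * weight
    return dict(scores)

def rollup(scores, parent):
    """Add each tag's score to all of its ancestors, so a parent summarizes its subtree."""
    totals = defaultdict(float, scores)
    for tag, score in scores.items():
        node = parent.get(tag)
        while node is not None:
            totals[node] += score
            node = parent.get(node)
    return dict(totals)

mentions = [("budget", "negative", 10), ("emotion", "negative", 10), ("tax", "positive", 4)]
company = rollup(weighted_scores(mentions, COMPANY), PARENT)
government = rollup(weighted_scores(mentions, GOVERNMENT), PARENT)
print(company["finance"], government["finance"])  # -16.0 -1.0
```

The same mentions produce a strongly negative "finance" branch under one profile and a nearly neutral one under the other, which is the kind of context-dependent scoring the paragraph above describes.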

For a medium-to-large organization, the volume of online data associated with its brand can be overwhelming. Traditional methods of employing people to read and respond to information produced by traditional media typically do not scale with the growth of user-generated content, and they are expensive.

In a previous solution [1], a system analyzes and extracts opinions, or sentiment, from a corpus of one or more text documents through Parts of Speech (POS) tagging. That process assumes that documents associated with a specific topic have already been extracted. The disclosed process, by contrast, is more fine-grained and scales to a larger corpus through additional use of tagging. The previous solution may work well for a specific query, such as "suitability of PRODUCT X as a gift for a 5 year old"; however, the disclosed process is effective at analyzing the sentiment associated with the results for PRODUCT X and drilling down into detail, for example associating positive sentiment with 5 year olds but negative sentiment with 7 year olds. Efficacy of the ...

available now on the web.
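The drill-down in the PRODUCT X example above can be illustrated by keeping only the items that carry a focus tag and then averaging sentiment for each tag that co-occurs with it. The records, the scores, and the function name `drill_down` are hypothetical; the disclosure does not describe the exact computation, so this is a sketch of one plausible approach.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tagged records: (tags, sentiment score in [-1, 1]).
records = [
    ({"PRODUCT X", "gift", "5 year old"}, 0.8),
    ({"PRODUCT X", "5 year old"}, 0.6),
    ({"PRODUCT X", "gift", "7 year old"}, -0.7),
    ({"PRODUCT Y", "7 year old"}, 0.9),  # different product: excluded by the filter
]

def drill_down(records, focus):
    """Average sentiment of each tag that co-occurs with the `focus` tag."""
    by_tag = defaultdict(list)
    for tags, score in records:
        if focus in tags:
            for tag in tags - {focus}:
                by_tag[tag].append(score)
    return {tag: round(mean(scores), 2) for tag, scores in sorted(by_tag.items())}

print(drill_down(records, "PRODUCT X"))
# {'5 year old': 0.7, '7 year old': -0.7, 'gift': 0.05}
```

Because the filter is applied before grouping, the positive "7 year old" record for PRODUCT Y does not mask the negative sentiment that 7 year olds express about PRODUCT X.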

In another previous solution [2], a system for associating topics with sentiment in subsections of text documents in a corpus proposes that a user define a topic. This solution represents only a minimal incremental improvement over previous solution [1], merely reducing the corpus. By contrast, corpus reduction is not necessary in a system using the disclosed process, which extracts topics as part of the system rather than requiring a user to know wh...