Algorithm for multimodal emotion detection
Publication Date: 2012-Sep-26
The IP.com Prior Art Database
With the proliferation of mobile devices, e-commerce is set to play a pivotal role in our lives, and with it comes an array of business models never feasible before. Web analytics today is largely a simplistic analysis of web page usage, aimed at improving page usability, timing the launch of new products online, and so on. The metrics, and the source of input, are purely the web page itself. Some of the common metrics collected for analysis are impressions, pages viewed, products viewed, complaints made, purchases made, registrations made, etc. The future of web analytics will be driven by changing web technology and mobile device trends, coupled with market needs such as customer centricity and security. Developments in HTML, together with developments in mobile devices, will dictate the next generation of web analytics. That next generation should factor in the new features of mobile devices (camera, heart rate sensors, context-enriched services, etc.) and HTML technology developments (new parameters related to audio and video content, image content, geolocation data, speech, and biosensors). Web analytics in the future will consider human emotions as one of its parameters. This disclosure details the how and what of including human emotions in web analytics.
Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. For example, the human brain processes the audio and video modalities together, extracting complementary and robust information from them; intelligent systems with audiovisual sensors should be capable of achieving similar goals. The audiovisual information fusion strategy is a key component in designing such systems. As an example, consider the fusion of head dynamics, gestures, and speech. This comprehensive fusion hierarchy, combining audiovisual cues at varying levels of abstraction to achieve a set of tasks together, is an important research direction, and there is a need to develop a formal probabilistic framework to address it. This is an important area for web analytics, e.g. for sentiment analysis of marketing campaigns.
As web analytics works today, it does not consider human emotion, or human emotion transitions, in determining the effectiveness of a web page. It only counts clicks, pages viewed, and the like. For example, from a large number of clicks alone it is not possible to know whether users clicked because the page was interesting, because it was confusing, or simply out of restlessness or anger. In this invention, we add cognitive intelligence to web analytics, so that the web page developer knows the emotion, and hence the reason, behind the clicks.
There are two claims of invention in this disclosure:
1. Multimodal analysis over time (i.e., considering the different emotional cues in totality, along with their timing), which helps detect emotion transitions
2. Based on the detected emotion transition, a method to identify the effectiveness of the web page. No known algorithm exists to detect the effectiveness of a web page based on emotion transitions.
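The second claim can be sketched as follows. This is a minimal illustration, not the disclosure's algorithm: the emotion labels, their numeric scores, and the scoring rule (sum of score changes across transitions) are all illustrative assumptions.

```python
# Hypothetical sketch: scoring page effectiveness from the emotion
# transitions observed while a user viewed the page.
# Labels and weights below are illustrative assumptions only.

# Each observation is (timestamp_seconds, emotion_label).
EMOTION_SCORE = {
    "anger": -2, "confusion": -1, "neutral": 0,
    "interest": 1, "delight": 2,
}

def emotion_transitions(observations):
    """Return (from_emotion, to_emotion) transitions in time order."""
    ordered = sorted(observations)          # sort by timestamp
    labels = [e for _, e in ordered]
    return [(a, b) for a, b in zip(labels, labels[1:]) if a != b]

def page_effectiveness(observations):
    """Positive if emotions trend upward over the visit, negative otherwise."""
    transitions = emotion_transitions(observations)
    return sum(EMOTION_SCORE[b] - EMOTION_SCORE[a] for a, b in transitions)

visit = [(0.0, "neutral"), (2.5, "confusion"), (6.0, "interest"), (9.0, "delight")]
print(emotion_transitions(visit))
print(page_effectiveness(visit))  # net upward drift: the page recovered the user
```

A real system would feed this from the multimodal emotion detector rather than hand-labeled tuples, but the shape of the computation, ordering cues by time and scoring the transitions rather than the raw click counts, is the point of the claim.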
This disclosure gives an algorithm that uses fuzzy logic to analyze multimodal cues to detect emotion. Most researchers choose decision-level fusion, in which the input from each modality is modeled independently and the single-modal recognition results are combined only at the end. Decision-level fusion, also called classifier fusion, is now an active area in the machine learning and pattern recognition fields. Many studies have demonstrated the advantage of classifier fusion over individual classifiers, owing to the uncorrelated errors of the different classifiers. Various classifier fusion methods (fixed rules and trained combiners) have been proposed in the literature, but optimal design methods for classifier fusion are still not available. In addition, since humans employ the tightly coupled audio and visual modalities simultaneously, the multimodal signals cannot be considered mutually independent and should not be combined only at the end, as in decision-level fusion.
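For reference, the decision-level fusion baseline described above can be sketched with a fixed weighted-sum rule. The modalities, emotion labels, probabilities, and weights here are illustrative assumptions; this is the baseline the disclosure argues against, not its fuzzy-logic algorithm.

```python
# Minimal sketch of decision-level (classifier) fusion: each modality's
# classifier emits a probability per emotion, and the per-modality
# results are combined only at the end with a fixed weighted-sum rule.
# All values below are illustrative assumptions.

def fuse_decisions(modality_probs, weights):
    """Weighted-sum fusion of per-modality emotion probability dicts."""
    fused = {}
    for modality, probs in modality_probs.items():
        w = weights[modality]
        for emotion, p in probs.items():
            fused[emotion] = fused.get(emotion, 0.0) + w * p
    total = sum(fused.values())
    return {e: p / total for e, p in fused.items()}  # renormalize

modality_probs = {
    "audio": {"anger": 0.6, "joy": 0.1, "neutral": 0.3},
    "video": {"anger": 0.5, "joy": 0.2, "neutral": 0.3},
}
weights = {"audio": 0.4, "video": 0.6}

fused = fuse_decisions(modality_probs, weights)
print(max(fused, key=fused.get))  # most likely emotion under the fused estimate
```

The limitation the text identifies is visible in the structure: each modality is scored in isolation, so any coupling between, say, a raised voice and a simultaneous facial expression is invisible to the fixed combining rule applied at the end.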
This method also considers time,...