Detect danger based on cognitive analysis of speech
Publication Date: 2015-Oct-26
The IP.com Prior Art Database
This article describes the use of a multi-factor cognitive system which performs enhanced processing of natural language by using additional cues such as verbal cues. The article primarily describes how to use this system in a danger detection scenario, but also describes how this system could be used in other domains.
Problem statement
Today, most emergency detection systems either:
Require the person in need to take an action, such as entering a code, phoning emergency services, or pushing a button. If the person is unable to take that action, emergency services are not contacted.
Require the person in need to wear a sensor-based device that compares the audio signal acquired from a microphone with an audio signal model of the person being protected. Such systems often calculate danger from the speech intervals of the person being protected and nothing else, so they do not account for situational context.
The accuracy and success rate of a detection system could be improved by creating a cognitive system that monitors the speech of the person being protected and calls emergency services based on the words used, their inflection, and their pitch and tone, all scored against the context of the conversation.
The advantage of pairing textual analysis with vocal analysis is that either one is likely to produce many false positives on its own, but in combination false positives are much less likely.
Over time, the system would learn the typical speech patterns of that person and use them to detect aberrant speech, which, combined with textual analysis, gives an accurate emergency alert.
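The false-positive argument above can be made concrete with a small worked calculation. This is a minimal sketch, not part of the described system: it assumes the vocal and textual detectors raise false alarms independently of each other (the article does not state this), and the function name and the example rates are hypothetical.

```python
def combined_false_positive_rate(p_vocal: float, p_text: float) -> float:
    """False-positive rate when an alert requires BOTH detectors to fire.

    Assumption (not from the article): the two detectors err independently,
    so the probability that both fire on a harmless utterance is the product
    of their individual false-positive rates.
    """
    for p in (p_vocal, p_text):
        if not 0.0 <= p <= 1.0:
            raise ValueError("rates must be probabilities in [0, 1]")
    return p_vocal * p_text


# Hypothetical rates: vocal analysis alone misfires on 10% of harmless
# utterances, textual analysis alone on 20%.
rate = combined_false_positive_rate(0.10, 0.20)
print(f"combined false-positive rate: {rate:.2%}")
```

Under the independence assumption, requiring both cues reduces the combined rate to the product of the two individual rates. If the detectors' errors are correlated in practice, the reduction would be smaller.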
Implementation details (emergency detection scenario)
The flow is as follows:
Step 1: Training
The system would listen to the speech patterns of the subject(s) for a set period of time that would be considered the basic learning period.
The system is loaded with a dictionary of home security issues (help, fire, intruder, gun).
The system is loaded with phrases from past security triggers ("someone's in the house!") and false alarms ("Stay out of the cookie jar!").
The system is trained per speaker to learn each voice (supporting art).
During the training period, the system records each speaker's volume, pitch/tone, and talking speed, including their range and distribution.
The system would be tested with example "emergency situation" speech patterns: triggers such as "someone's in the house!" and false alarms such as "Stay out of the cookie jar!" would be acted out to teach the system what an "emergency situation" could be. This would not be an all-inclusive set of emergency situations, just a way to seed the system.
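The per-speaker recording of volume, pitch, and talking speed described in the training steps could be sketched as follows. This is an illustrative assumption rather than the article's implementation: the feature names, the mean and standard deviation summary, and the 3-standard-deviation threshold are all hypothetical choices, and the feature values are taken as already extracted from audio.

```python
from statistics import mean, stdev

# Hypothetical feature names; the article only lists volume, pitch/tone, speed.
FEATURES = ("volume_db", "pitch_hz", "words_per_min")


def learn_baseline(samples):
    """Summarize training utterances as (mean, standard deviation) per feature.

    `samples` is a list of dicts, one per utterance, keyed by FEATURES.
    At least two samples are needed for a standard deviation.
    """
    return {
        name: (mean(s[name] for s in samples), stdev(s[name] for s in samples))
        for name in FEATURES
    }


def is_aberrant(baseline, utterance, threshold=3.0):
    """True if any feature lies more than `threshold` standard deviations
    from the speaker's learned mean (threshold is an assumed value)."""
    for name in FEATURES:
        mu, sigma = baseline[name]
        if sigma > 0 and abs(utterance[name] - mu) / sigma > threshold:
            return True
    return False


# Made-up training data for one speaker during the learning period.
training = [
    {"volume_db": 60, "pitch_hz": 180, "words_per_min": 140},
    {"volume_db": 62, "pitch_hz": 185, "words_per_min": 150},
    {"volume_db": 58, "pitch_hz": 175, "words_per_min": 145},
    {"volume_db": 61, "pitch_hz": 182, "words_per_min": 148},
]
baseline = learn_baseline(training)

calm = {"volume_db": 60, "pitch_hz": 181, "words_per_min": 146}
shout = {"volume_db": 85, "pitch_hz": 260, "words_per_min": 210}
print(is_aberrant(baseline, calm), is_aberrant(baseline, shout))
```

A mean and standard deviation per feature is only one simple way to capture "range and distribution". A real system might keep histograms or percentiles instead, and it would need an audio front end to produce these features.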
Step 2: Daily use
Once the system is trained (based on the result of the Ground Truth setup), it would be turned on to analyze new speech patterns, listening to the inflection, pitch, and tone of the words and scoring them based on the context of the conversation. When the system hears a speaker deviate from their normal volume, pitch, or speed, it engages in a textual analysis:
Does the text contain typical home security keyword problems from the dictionary? By sentence similarity, is the sentenc...