
Detect danger based on cognitive analysis of speech

IP.com Disclosure Number: IPCOM000243890D
Publication Date: 2015-Oct-26
Document File: 3 page(s) / 102K

Publishing Venue

The IP.com Prior Art Database


This article describes a multi-factor cognitive system that enhances natural-language processing with additional cues from the voice, such as the inflection, pitch, and tone of the speech. The article focuses on using this system for danger detection, but also describes how it could be used in other domains.


Problem statement
Today, most emergency detection systems either:

Require the person in need to take an action when they need help, such as entering a code, calling emergency services, or pushing a button. Because such systems depend on the protected person acting, emergency services are not contacted if that person is unable to act.

Require the person in need to wear a sensor-based device that compares the audio signals acquired from a microphone with an audio-signal model of the protected person. Such systems often calculate danger solely from the speech intervals of the protected person and do not account for situational context.

Solution overview

The accuracy and success rate of a detection system could be improved by a cognitive system that monitors the speech of the protected person, scores it using the words spoken, their inflection, and their pitch and tone in the context of the conversation, and calls emergency services based on that score.

The advantage of pairing textual analysis with vocal analysis is that each is likely to produce many false positives on its own, whereas in combination false positives are much less likely.
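The false-positive argument can be made concrete with a small calculation. This is a minimal sketch under an assumption the disclosure does not state: that the textual and vocal detectors make their mistakes independently, so an alert that requires both to fire has a false-positive rate equal to the product of the two individual rates. Under that assumption, illustrative rates of 10% and 5% combine to 0.5%. In practice the two signals are probably correlated (a heated but harmless argument may trip both), so the real reduction would be smaller. The function names are hypothetical.

```python
def combined_false_positive_rate(p_text: float, p_voice: float) -> float:
    """False-positive rate of an alert that fires only when BOTH detectors fire.

    Assumes (hypothetically) that the two detectors' false positives are
    statistically independent, so the joint rate is the product of the rates.
    """
    for p in (p_text, p_voice):
        if not 0.0 <= p <= 1.0:
            raise ValueError("rates must be probabilities in [0, 1]")
    return p_text * p_voice


def should_alert(text_flag: bool, voice_flag: bool) -> bool:
    """AND-fusion: alert only when textual and vocal analysis agree."""
    return text_flag and voice_flag


if __name__ == "__main__":
    # Illustrative rates only; the disclosure gives no figures.
    print(f"{combined_false_positive_rate(0.10, 0.05):.3%}")
```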

Over time, the system would learn the typical speech patterns of the protected person and use them to detect aberrant speech, which, combined with textual analysis, produces an accurate emergency alert.
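The disclosure does not say how the system learns typical speech patterns over time. One plausible sketch, assuming each vocal feature (for example pitch in Hz) is reduced to a single number per utterance, is to keep a running mean and variance with Welford's online algorithm and flag values far from the learned norm. The class name `RunningBaseline` and the 3-standard-deviation cutoff are hypothetical placeholders.

```python
import math


class RunningBaseline:
    """Online mean/variance of one speech feature (Welford's algorithm).

    Hypothetical helper: the source only says the system learns typical
    speech patterns over time; it does not specify how.
    """

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        """Fold one new measurement into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; reported as 0.0 until two samples exist.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_aberrant(self, x: float, z_threshold: float = 3.0) -> bool:
        """True if x lies more than z_threshold standard deviations from the mean."""
        if self.n < 2 or self.std == 0.0:
            return False  # not enough history to judge
        return abs(x - self.mean) / self.std > z_threshold


if __name__ == "__main__":
    pitch = RunningBaseline()
    for hz in [118, 122, 120, 119, 121, 120, 123, 117]:  # made-up pitch samples
        pitch.update(hz)
    print(pitch.is_aberrant(121), pitch.is_aberrant(180))
```

An online update avoids storing every past utterance, which suits a system that keeps learning after the initial training period.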

Implementation details (emergency detection scenario)

The flow is as follows:

Step 1: Training

The system would listen to the speech patterns of the subject(s) for a set period of time that would be considered the basic learning period.

Text training

The system is loaded with a dictionary of home-security keywords (help, fire, intruder, gun).

The system is loaded with phrases from past security triggers ("someone's in the house!") and from false alarms ("Stay out of the cookie jar!").
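The text-training data above could be used roughly as follows. This is a minimal sketch, not the disclosure's method: keyword hits come from the example dictionary, and Python's `difflib.SequenceMatcher` ratio serves as a crude, character-level stand-in for a real sentence-similarity model. The scoring weights (0.5 each) and the function names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Seed data taken from the examples in the text; a real deployment would use
# a much larger, curated dictionary and phrase set.
KEYWORDS = {"help", "fire", "intruder", "gun"}
TRIGGER_PHRASES = ["someone's in the house!"]
FALSE_ALARM_PHRASES = ["stay out of the cookie jar!"]


def tokenize(text: str) -> list[str]:
    # Lower-case words, keeping apostrophes inside words ("someone's").
    return re.findall(r"[a-z']+", text.lower())


def keyword_hits(text: str) -> set[str]:
    """Dictionary keywords that appear as whole words in the text."""
    return KEYWORDS.intersection(tokenize(text))


def best_similarity(text: str, phrases: list[str]) -> float:
    """Highest similarity (0..1) between the text and any stored phrase.

    SequenceMatcher is a crude stand-in for a real sentence-similarity model.
    """
    text = text.lower()
    return max((SequenceMatcher(None, text, p).ratio() for p in phrases), default=0.0)


def text_score(text: str) -> float:
    """Hypothetical score: keyword evidence plus similarity to past triggers,
    minus similarity to known false alarms, clamped to [0, 1]."""
    score = 0.5 * min(len(keyword_hits(text)), 1)
    score += 0.5 * best_similarity(text, TRIGGER_PHRASES)
    score -= 0.5 * best_similarity(text, FALSE_ALARM_PHRASES)
    return max(0.0, min(1.0, score))


if __name__ == "__main__":
    for utterance in ["Someone's in the house!", "Stay out of the cookie jar!"]:
        print(utterance, round(text_score(utterance), 2))
```

Subtracting the false-alarm similarity is one simple way to let the false-alarm phrases actively suppress alerts rather than merely being absent from the trigger list.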

Voice training

The system is trained per speaker to recognize each speaker's voice (see supporting art).

During the training period, the system records the range and distribution of each speaker's volume, pitch/tone, and talking speed.
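A per-speaker profile of range and distribution might be summarized like this. The sketch assumes the audio has already been reduced to numeric measurements per feature (feature extraction from raw audio is out of scope here); the feature names, the `FeatureProfile` type, and the choice of min/max/mean/standard deviation as the summary are all hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass
class FeatureProfile:
    """Range and distribution summary of one vocal feature for one speaker."""
    low: float
    high: float
    avg: float
    std: float


def build_profile(samples: dict[str, list[float]]) -> dict[str, FeatureProfile]:
    """Summarize measurements recorded during the training period.

    `samples` maps a feature name (hypothetical names such as "volume_db",
    "pitch_hz", "words_per_min") to the values measured for one speaker.
    """
    profile: dict[str, FeatureProfile] = {}
    for feature, values in samples.items():
        if len(values) < 2:
            raise ValueError(f"need at least two samples for {feature!r}")
        profile[feature] = FeatureProfile(
            low=min(values),
            high=max(values),
            avg=statistics.mean(values),
            std=statistics.stdev(values),  # sample standard deviation
        )
    return profile


if __name__ == "__main__":
    training = {
        "pitch_hz": [118, 122, 120, 119, 121, 120, 123, 117],  # made-up values
        "words_per_min": [140, 155, 150, 145, 160],
    }
    for name, summary in build_profile(training).items():
        print(name, summary)
```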

Ground Truth:

Example emergency speech (such as "someone's in the house!") and false alarms (such as "Stay out of the cookie jar!") would be acted out to teach the system what an "emergency situation" could sound like. This is not an all-inclusive set of emergency situations, just a way to seed the system.
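One way the acted-out seed examples could be used is to pick an initial alert threshold. This sketch assumes some (hypothetical) detector has already assigned each seed recording a score in [0, 1]; the policy of never missing a seeded emergency, then minimizing false alarms, is an illustrative choice, not one stated in the disclosure. Because the seed set is not all-inclusive, the result is only a starting point.

```python
def pick_threshold(scored: list[tuple[float, bool]]) -> float:
    """Highest alert threshold that still flags every acted-out emergency.

    `scored` holds (score, is_emergency) pairs produced by running a
    (hypothetical) detector over the Ground Truth recordings. Raising the
    threshold can only remove alerts, so the lowest emergency score is the
    threshold that catches every seeded emergency with the fewest false alarms.
    """
    emergency_scores = [score for score, is_emergency in scored if is_emergency]
    if not emergency_scores:
        raise ValueError("seed set needs at least one emergency example")
    return min(emergency_scores)


def count_false_alarms(scored: list[tuple[float, bool]], threshold: float) -> int:
    """Number of non-emergency seed examples that would still raise an alert."""
    return sum(1 for score, is_emergency in scored
               if not is_emergency and score >= threshold)


if __name__ == "__main__":
    seed = [  # made-up scores for acted-out recordings
        (0.92, True),   # e.g. "someone's in the house!"
        (0.81, True),
        (0.85, False),  # e.g. "Stay out of the cookie jar!"
        (0.40, False),
        (0.10, False),
    ]
    t = pick_threshold(seed)
    print(t, count_false_alarms(seed, t))  # prints: 0.81 1
```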

Step 2: Daily use

Once the system is trained (based on the results of the Ground Truth setup), it would be turned on to analyze new speech, listening to the inflection, pitch, and tone of the words and scoring them based on the context of the conversation. When the system hears a speaker deviate from their normal volume, pitch, or speed, it engages in textual analysis.
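The gating described in the daily-use step, where vocal deviation triggers textual analysis, can be sketched as below. The baseline format (feature name mapped to mean and standard deviation), the z-score test with a cutoff of 3, and the pluggable `text_analyzer` callback are assumptions for illustration; any of the text-scoring approaches described in the disclosure could be passed in as the callback.

```python
from typing import Callable

# Hypothetical per-speaker baseline: feature -> (mean, standard deviation),
# as produced by the training period.
Baseline = dict[str, tuple[float, float]]


def deviating_features(features: dict[str, float], baseline: Baseline,
                       z_threshold: float = 3.0) -> list[str]:
    """Names of features more than z_threshold std devs from the speaker's norm."""
    out = []
    for name, value in features.items():
        mu, sigma = baseline[name]
        if sigma > 0 and abs(value - mu) / sigma > z_threshold:
            out.append(name)
    return out


def process_utterance(transcript: str, features: dict[str, float],
                      baseline: Baseline,
                      text_analyzer: Callable[[str], bool]) -> bool:
    """Return True if an emergency alert should be raised.

    Textual analysis runs only when the voice deviates from normal, so an
    alert requires both the vocal and the textual signal.
    """
    if not deviating_features(features, baseline):
        return False  # normal-sounding speech: skip textual analysis
    return text_analyzer(transcript)


if __name__ == "__main__":
    baseline = {"volume_db": (60.0, 5.0), "pitch_hz": (120.0, 2.0),
                "words_per_min": (150.0, 8.0)}  # made-up values

    def has_keyword(text: str) -> bool:
        # Toy analyzer using the example dictionary from the text.
        words = set(text.lower().replace(",", " ").replace("!", " ").split())
        return bool(words & {"help", "fire", "intruder", "gun"})

    shouted = {"volume_db": 85.0, "pitch_hz": 150.0, "words_per_min": 190.0}
    calm = {"volume_db": 61.0, "pitch_hz": 121.0, "words_per_min": 148.0}
    print(process_utterance("Help, there's an intruder!", shouted, baseline, has_keyword))
    print(process_utterance("Help me carry the groceries", calm, baseline, has_keyword))
```

Skipping textual analysis for normal-sounding speech also keeps a word like "help" in everyday conversation from raising an alert on its own.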

Does the text contain home-security keywords from the dictionary? By sentence similarity, is the sentenc...