
Audio Playback Method using Contextual Keyword and Knowledge Indices Derived from Natural Language Processing
Disclosure Number: IPCOM000246325D
Publication Date: 2016-May-30
Document File: 5 page(s) / 72K

Publishing Venue

The Prior Art Database


Disclosed is a method and system for playing back portions of an audio recording based on keywords and other forms of extracted knowledge, such as concepts, tone, and sentiment.




To replay a sentence or set of sentences from an audio recording in the context of one or more keywords, current methods require the user to skip forward or backward through the entire content, or to start from segments of the content, and then listen until finding the segment of the recording that contains the keyword. Identifying concepts, tone, and sentiment in parts of the audio recording is an even greater challenge. Further, for playback, the language of the keywords, concepts, or sentiments must be the same as that of the recorded media.

This system builds on existing bodies of work in machine learning and NLP, such as speech-to-text conversion, language translation, keyword and concept extraction, sentiment analysis, and other NLP methods. Keyword-based search of recorded audio and video media is not new; in many cases it is driven by a title, descriptive text, or annotated chapters for the recorded media, and potentially by other metadata such as genre. Playback of media by chapter and by other manually annotated methods is also not new.

The novel contribution is a method and system that uses natural language processing (NLP) methods with generated contextual indices to identify desired keyword(s) or other forms of rich information in an audio recording and then replay for the user the content (i.e., the sentence or set of sentences) that contains those keywords.

The method first converts audio to text using existing speech-to-text technology. During the conversion, the system identifies, time-stamps, and labels sentences. After the entire audio stream is converted to text, the system passes it through a series of natural language post-processors. These post-processors use existing natural language, speech-to-text, and sentiment analysis technology to generate keyword, concept, and sentiment indices that reference the sentence labels in the transcribed text. These indices make it possible to navigate to the content of interest and to play back audio over a defined interval around the point where the keyword, concept, or sentiment was found.
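The labeling, indexing, and interval playback steps in the preceding paragraph can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the disclosed implementation: it assumes the speech-to-text stage has already produced sentences with hypothetical labels (S1, S2, ...) and start and end times in seconds, it uses plain lower-cased word matching as a stand-in for a real keyword-extraction post-processor, and it assumes the "defined interval" is the matching sentence plus a configurable number of neighboring sentences (the hypothetical context parameter). The names Sentence, build_keyword_index, and find_playback_intervals are illustrative only.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    """One sentence from the speech-to-text stage (hypothetical structure)."""
    label: str    # sentence label, e.g. "S1" (hypothetical labeling scheme)
    start: float  # start time in seconds from the beginning of the recording
    end: float    # end time in seconds
    text: str     # transcribed text


def build_keyword_index(sentences):
    """Map each lower-cased word to the labels of the sentences containing it.

    Plain word matching stands in for a real keyword-extraction post-processor.
    Labels are appended in recording order, once per sentence.
    """
    index = defaultdict(list)
    for s in sentences:
        for word in set(re.findall(r"[a-z']+", s.text.lower())):
            index[word].append(s.label)
    return index


def find_playback_intervals(sentences, index, keyword, context=1):
    """Return (start, end) times covering each matching sentence plus
    `context` neighboring sentences on each side (assumed interval rule)."""
    position = {s.label: i for i, s in enumerate(sentences)}
    intervals = []
    for label in index.get(keyword.lower(), []):
        i = position[label]
        first = max(0, i - context)
        last = min(len(sentences) - 1, i + context)
        intervals.append((sentences[first].start, sentences[last].end))
    return intervals


if __name__ == "__main__":
    transcript = [
        Sentence("S1", 0.0, 4.2, "Welcome to the quarterly review."),
        Sentence("S2", 4.2, 9.8, "Revenue grew in every region."),
        Sentence("S3", 9.8, 15.1, "Costs were flat compared to last year."),
        Sentence("S4", 15.1, 20.0, "Revenue guidance remains unchanged."),
    ]
    idx = build_keyword_index(transcript)
    print(find_playback_intervals(transcript, idx, "Revenue"))
    # [(0.0, 15.1), (9.8, 20.0)]
```

Because the index stores sentence labels rather than raw timestamps, the playback window can be widened or narrowed at query time without rebuilding the index; a player would simply seek to each start time and stop at the matching end time.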

The core novelty of this system is three-fold:

An index generation method that points to sentence boundaries in the transcribed audio and enables search and contextual playback. This makes it possible to navigate to the content of interest based on keyword, concept, and sentiment without the user skipping forward or backward through the audio while listening to it.

The ability to extend playback beyond keywords to other forms of information, such as concepts, sentiment, and tone.

The ability to apply this to keywords, concepts, sentiment, and tone in a language that is not restricted to that of the original audio.
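The second and third points, extending the indices beyond keywords and querying them in another language, might be sketched as below. This Python sketch is illustrative only: the word lists, the word-to-concept table, and the Spanish-to-English dictionary are hypothetical stand-ins for the concept-extraction, sentiment-analysis, and language-translation technology the disclosure builds on, and tone is omitted but would be one more facet built the same way. Each index maps a term to sentence labels, and the query is translated into the language of the recording before lookup, so the returned labels can drive the same interval-based playback.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins: a real system would call concept-extraction,
# sentiment-analysis, and machine-translation services instead of these tables.
POSITIVE_WORDS = {"grew", "strong", "improved"}
NEGATIVE_WORDS = {"fell", "weak", "declined"}
CONCEPT_OF_WORD = {"revenue": "finance", "costs": "finance", "hiring": "workforce"}
SPANISH_TO_ENGLISH = {
    "ingresos": "revenue",
    "finanzas": "finance",
    "positivo": "positive",
    "negativo": "negative",
}


def tokens(text):
    """Lower-cased word tokens of a transcribed sentence."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_of(text):
    """Toy lexicon score standing in for a sentiment-analysis post-processor."""
    words = tokens(text)
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def build_indices(labeled_sentences):
    """Build keyword, concept, and sentiment indices over (label, text) pairs
    given in recording order; every index maps a term to sentence labels."""
    indices = {
        "keyword": defaultdict(list),
        "concept": defaultdict(list),
        "sentiment": defaultdict(list),
    }
    for label, text in labeled_sentences:
        words = set(tokens(text))
        for word in words:
            indices["keyword"][word].append(label)
        for concept in {CONCEPT_OF_WORD[w] for w in words if w in CONCEPT_OF_WORD}:
            indices["concept"][concept].append(label)
        indices["sentiment"][sentiment_of(text)].append(label)
    return indices


def lookup(indices, facet, query, translation=SPANISH_TO_ENGLISH):
    """Translate the query into the recording's language (English here),
    then return the labels of matching sentences for the chosen facet."""
    term = query.lower()
    term = translation.get(term, term)
    return list(indices[facet].get(term, []))
```

For example, given the labeled sentences ("S1", "Revenue grew in every region."), ("S2", "Costs were flat."), and ("S3", "Hiring declined sharply."), a concept query for "finanzas" is translated to "finance" and returns ["S1", "S2"], while a sentiment query for "negativo" returns ["S3"].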




Following are the components and process for implementing the method and system in a pref...