Speech analyser

IP.com Disclosure Number: IPCOM000129855D
Original Publication Date: 2005-Oct-07
Included in the Prior Art Database: 2005-Oct-07
Document File: 2 page(s) / 42K

Publishing Venue

IBM

Abstract

This article describes a method of censoring inappropriate content from an audio presentation in such a way as to minimise disruption to the audio stream. When inappropriate content is identified in a spoken conversation, it is removed from the transmitted stream. Any silences caused by this removal of content are time-shifted to the end of the spoken sentence (or another suitable stage of the conversation) to hide the fact that any censoring took place.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

There are many situations where spoken language needs to be analysed and specific words from a lookup table need to be either replaced or removed from the speech. One example is radio, where bad language or, in the case of the BBC, product names need to be removed from the broadcast.

Current solutions use voice recognition to pick out bad language and replace it with either beeps or silence. The drawbacks are that the listener can tell from the beep that there was bad language, or the sound is broken up by silence.

The proposed invention follows the idea of the current systems:

It will place a delay on the sound before it is transmitted.
It will analyse the sound using voice recognition to check for specific words.

Additionally, it will:

Continue to play the rest of the sentence after the specific word without a gap (reducing the delay).
Ensure there is no audible 'click' by matching up the sound waves before and after the removed word.
Re-introduce the full delay at the end of the sentence.
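
The delay bookkeeping described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Segment type, the schedule_output function and the "sentence_end" marker are hypothetical names, and the sketch assumes the recogniser has already labelled each stretch of audio and measured its duration.

```python
from dataclasses import dataclass

FULL_DELAY = 10.0  # seconds; the figure used in the radio station example


@dataclass
class Segment:
    kind: str        # "speech", "censored" or "sentence_end" (hypothetical labels)
    duration: float  # seconds of input audio; 0 for "sentence_end"


def schedule_output(segments, full_delay=FULL_DELAY):
    """Return (events, trace): what is transmitted, and the delay after each segment."""
    delay = full_delay
    events = []
    trace = []
    for seg in segments:
        if seg.kind == "speech":
            # Ordinary speech is transmitted unchanged; the delay is unaffected.
            events.append(("play", seg.duration))
        elif seg.kind == "censored":
            # The word is dropped and the following audio is played at once,
            # so the output catches up with the input by the word's length.
            if seg.duration > delay:
                raise ValueError("cannot remove more audio than is currently buffered")
            delay -= seg.duration
        elif seg.kind == "sentence_end":
            # The time saved is paid back as silence at the end of the sentence,
            # which restores the full delay.
            pad = full_delay - delay
            if pad > 0:
                events.append(("silence", pad))
            delay = full_delay
        else:
            raise ValueError(f"unknown segment kind: {seg.kind!r}")
        trace.append(delay)
    return events, trace


example = [
    Segment("speech", 2.0),
    Segment("censored", 0.5),
    Segment("speech", 1.5),
    Segment("sentence_end", 0.0),
]
events, trace = schedule_output(example)
```

In this example the half-second word is never transmitted, the delay drops from 10 to 9.5 seconds for the remainder of the sentence, and half a second of silence after the sentence brings it back to 10 seconds.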

Using the example of a radio station: all sound from the radio station is played through the analyser, which will attempt to pick out specific unacceptable words using voice recognition technology. The analyser introduces a 10 second delay between the presenter speaking and the speech being transmitted.
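
A fixed delay of this kind is commonly built as a first-in, first-out buffer of samples. The sketch below is a minimal illustration, not the disclosed implementation; the DelayLine class and the sample rate are assumptions, since the text gives only the 10 second figure.

```python
from collections import deque

SAMPLE_RATE = 44100   # samples per second; an assumed value, not from the text
DELAY_SECONDS = 10    # the delay quoted in the radio station example


class DelayLine:
    """Hypothetical fixed delay: each pushed sample comes back out delay_samples pushes later."""

    def __init__(self, delay_samples):
        # Pre-fill with silence so that the first outputs are zeros.
        self._buffer = deque([0.0] * delay_samples)

    def push(self, sample):
        self._buffer.append(sample)
        return self._buffer.popleft()


# A full-size line would be DelayLine(SAMPLE_RATE * DELAY_SECONDS);
# a three-sample line is used here to keep the example small.
line = DelayLine(3)
delayed = [line.push(x) for x in [1.0, 2.0, 3.0, 4.0, 5.0]]
```

While audio sits in the buffer, the analyser has the length of the delay in which to recognise a word and decide whether to remove it before it reaches the transmitter.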

    Should a specific word be recognised by the analyser, the analyser will scan to the end of the specific word and then match up the waveform of the sentence before and after the word so that there is no audible click.

    This could be achieved by taking a point in the waveform just before the word, where the waveform has a zero-crossing. The waveform would then be connected to a point just after th...
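
A zero-crossing splice of this general kind might look like the sketch below. Because the description is cut short in this abbreviated text, the choice of the second join point is an assumption: the sketch takes the first zero-crossing after the word that goes in the same direction (negative to non-negative) as the one before it. The function names are hypothetical.

```python
import math


def zero_crossing_before(samples, index):
    """Latest position i <= index where the signal rises through zero."""
    for i in range(min(index, len(samples) - 1), 0, -1):
        if samples[i - 1] < 0 <= samples[i]:
            return i
    return 0  # no crossing found: cut from the very start


def zero_crossing_after(samples, index):
    """Earliest position i >= index where the signal rises through zero."""
    for i in range(max(index, 1), len(samples)):
        if samples[i - 1] < 0 <= samples[i]:
            return i
    return len(samples)  # no crossing found: cut to the very end


def remove_span(samples, start, end):
    """Drop the word occupying samples[start:end], widening the cut to zero-crossings."""
    cut_from = zero_crossing_before(samples, start)
    cut_to = zero_crossing_after(samples, end)
    # The sample before the join is negative and the sample after it is
    # non-negative, as at any rising crossing in the original signal.
    return samples[:cut_from] + samples[cut_to:]


# A test tone with a period of 20 samples; the half-sample offset keeps
# every sample clearly away from zero.
tone = [math.sin(2 * math.pi * (n + 0.5) / 20) for n in range(200)]
spliced = remove_span(tone, 47, 95)
```

Here the requested cut (samples 47 to 95) is widened to the rising crossings at samples 40 and 100, so the tone continues across the join with no abrupt jump in level. Real speech is not periodic, so a practical system would probably also need to compare slope or apply a short crossfade; that goes beyond what the text states.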