Method for Controlling Time-Varying Playback of Audio Signals

IP.com Disclosure Number: IPCOM000114585D
Original Publication Date: 1995-Jan-01
Included in the Prior Art Database: 2005-Mar-29
Document File: 2 page(s) / 47K

Publishing Venue

IBM

Related People

De Gennaro, SV: AUTHOR [+3]

Abstract

Disclosed is a method of using the output of a speech recognizer to non-linearly vary the timescale of an audio signal during playback, allowing larger timescaling factors with reduced degradation.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 65% of the total text.

      There are often reasons to modify the timescale of recorded
audio material during playback.  Audio playback is inherently
serial: the user must either listen to a section or skip over it
with the equivalent of fast forward or reverse.  When reviewing
large amounts of recorded material, it is advantageous to speed up
playback as much as possible while still maintaining good
intelligibility.  First-order modifications can be made by removing
"silence" regions and by mapping time linearly, that is, applying a
single uniform speed-up factor to the whole signal.  However, linear
timescale modification is not optimal: it degrades transitory speech
sounds, such as stop consonants, more than steady-state sounds, such
as vowels.
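
      As a rough illustration of the first-order approach, the sketch
below computes the playback time that remains after dropping silence
regions and applying one uniform speed-up factor.  The segment format,
the "sil" label, and the function name are assumptions made for this
example; they do not come from the disclosure.

```python
from typing import List, Tuple

# (label, start_sec, end_sec); the label "sil" marking silence is hypothetical.
Segment = Tuple[str, float, float]


def linear_playback_duration(segments: List[Segment], speedup: float) -> float:
    """Playback time after removing silence and applying one uniform speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # Silence removal: only non-silent regions contribute to playback.
    speech = sum(end - start for label, start, end in segments if label != "sil")
    # Linear mapping of time: every remaining region is scaled identically.
    return speech / speedup


recording = [
    ("sil", 0.0, 0.5),
    ("speech", 0.5, 2.5),
    ("sil", 2.5, 3.0),
    ("speech", 3.0, 4.0),
]
print(linear_playback_duration(recording, 2.0))  # 1.5
```

      Because every speech region is scaled by the same factor, a
short stop consonant is shortened just as aggressively as a long
vowel, which is the weakness described above.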

      In other applications it may be advantageous to slow down
playback so that particular sections can be understood more clearly.
Again, linear timescaling is not optimal.

      This disclosure presents a method of using the output of a
speech recognizer to non-linearly vary the timescale during playback,
allowing larger timescale variation with reduced degradation.
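
      The idea of non-linear timescaling can be sketched as follows,
assuming the recognizer alignment has already been reduced to a list
of (sound class, duration) pairs.  The class names, the
compressibility weights, and the allocation rule are hypothetical
choices made for illustration; the abbreviated text does not give the
disclosure's actual mapping.  Each segment is shortened in proportion
to its weight, so stops are left nearly intact while vowels and
silence absorb most of the speed-up, and the total still meets the
requested factor.

```python
from typing import Dict, List, Tuple

# (sound class, duration in seconds), e.g. derived from a recognizer alignment.
Segment = Tuple[str, float]

# Hypothetical weights: 0.0 = never shorten, 1.0 = shorten the most.
COMPRESSIBILITY: Dict[str, float] = {
    "silence": 1.0,
    "vowel": 0.8,
    "fricative": 0.5,
    "stop": 0.1,
}


def nonlinear_durations(segments: List[Segment], speedup: float) -> List[float]:
    """Return new per-segment durations whose sum is total / speedup."""
    if speedup < 1.0:
        raise ValueError("this sketch only handles speedup >= 1")
    total = sum(duration for _, duration in segments)
    to_remove = total - total / speedup
    if to_remove == 0.0:
        return [duration for _, duration in segments]
    weights = [COMPRESSIBILITY[sound_class] for sound_class, _ in segments]
    weighted = sum(w * d for w, (_, d) in zip(weights, segments))
    if weighted == 0.0:
        raise ValueError("no compressible segments")
    # Shared scale chosen so the time removed adds up to to_remove.
    alpha = to_remove / weighted
    if alpha * max(weights) >= 1.0:
        raise ValueError("speedup too large for these weights")
    return [d * (1.0 - alpha * w) for w, (_, d) in zip(weights, segments)]


aligned = [("stop", 0.05), ("vowel", 0.20), ("fricative", 0.10), ("silence", 0.15)]
for (sound_class, old), new in zip(aligned, nonlinear_durations(aligned, 2.0)):
    print(f"{sound_class:9s} {old:.3f} -> {new:.3f}")
```

      The function only computes target durations; actually
resynthesizing audio at those durations would need a separate
timescale-modification step, which this sketch does not cover.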

      In speech recognition systems, an alignment of time segments
against phonetic units is typically produced,...