
Producing Digitized Voice Segments

IP.com Disclosure Number: IPCOM000115597D
Original Publication Date: 1995-May-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 2 page(s) / 92K

Publishing Venue

IBM

Related People

Bowater, RJ: AUTHOR [+2]

Abstract

Voice application programs for voice response units consist of:
  o  Voice Segments - digitized, stored representations of voice.
  o  Prompts - programmed procedures that link voice segments together
      to form, for example, a complete spoken date, number or currency
      amount.
  o  State Tables - the logic of a voice application.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 53% of the total text.

      Voice application programs for voice response units consist of:
  o  Voice Segments - digitized, stored representations of voice.
  o  Prompts - programmed procedures that link voice segments together
      to form, for example, a complete spoken date, number or currency
      amount.
  o  State Tables - the logic of a voice application.

      Prompts and State Tables can be defined using either
conventional programming languages (such as 'C' or REXX), or by means
of a Graphical User Interface.  However, Voice Segments cannot be
defined in the same way and are typically created by converting
recorded speech into digitized voice segments.  This can be an
extremely laborious process for the following reasons:
  1.  An application may require a large number of voice segments
       (1000+ in some cases).
  2.  In order to achieve the highest possible speech quality with a
       minimum level of background noise, it is usual to record voice
       segments in a studio, perhaps using a professional actor or
       actress.
  3.  Assuming that the voice segments are recorded in a continuous
       stream on, for example, a Digital Audio Tape (DAT), each voice
       segment must be located on the tape and then selected for
       import as a voice segment using some form of Voice Segment
       Editor.
  4.  Each segment may need to be processed (e.g., filtered or
       gain-adjusted), and a fixed amount of silence may need to be
       appended before and after each segment.
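
      The per-segment processing mentioned in item 4 can be sketched as
follows.  This is only an illustration, not the method used by any
particular Voice Segment Editor: it assumes 16-bit signed PCM samples
held in a Python list, and the function name, the gain value and the
padding length are all hypothetical.  Filtering is omitted.

```python
def adjust_and_pad(samples, sample_rate, gain=1.0, pad_seconds=0.25):
    """Scale a segment by `gain` and add silence before and after it.

    Assumptions (not from the original text): samples are 16-bit
    signed PCM integers; silence is represented by the value 0.
    """
    # Number of zero-valued samples to place on each side of the segment.
    pad = [0] * int(pad_seconds * sample_rate)
    # Apply the gain, clipping to the 16-bit signed range.
    scaled = [max(-32768, min(32767, int(round(s * gain)))) for s in samples]
    return pad + scaled + pad


# Example: double the level and add 0.25 s of silence at 8 samples/s.
padded = adjust_and_pad([100, -200, 30000], 8, gain=2.0, pad_seconds=0.25)
```

      Doing this by hand for every segment is what makes the manual
process slow; once expressed as a function it can be applied to every
segment in turn.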

      The solution described here automates the entire process and
can reduce the time needed to import, say, 100 segments from hours
down to a few minutes.  The key to this is to record the complete set
of input segments as a continuous stream of audio with a delimiter
separating the segments.  This delimiter could be one of a number of
possibilities:
  1.  A tone (e.g., a telephone DTMF key)
  2.  A particular spoken utterance (e.g., the word 'next')
  3.  A period of silence

      The following example utilizes the last of these possibilities,
i.e. a period of silence between segments.  This silence period must
be long enough to allow silence gaps between segments to be
distinguished from natural silence gaps within segments.  A silence
period of five seconds between segments has been found to be more
than adequate.
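
      A minimal sketch of silence-delimited splitting is given below.
It is not the utility described in this disclosure.  It assumes that
the digitized stream is available as a list of signed integer samples
and that a simple amplitude threshold is enough to tell silence from
speech; the function name and the threshold value are hypothetical.
Only the five-second gap comes from the text above.

```python
def split_on_silence(samples, sample_rate, min_gap_seconds=5.0, threshold=500):
    """Return (start, end) sample-index pairs, one per voice segment.

    A segment ends when at least `min_gap_seconds` of consecutive
    samples stay at or below `threshold` in absolute value.  Shorter
    pauses are treated as natural gaps within a segment.  `end` is
    exclusive.  The threshold is an assumed value for illustration.
    """
    min_gap = int(min_gap_seconds * sample_rate)
    segments = []
    seg_start = None   # index of the first loud sample of the current segment
    last_loud = None   # index of the most recent loud sample
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            if seg_start is None:
                seg_start = i
            elif i - last_loud - 1 >= min_gap:
                # The silent run just ended was long enough to be a delimiter.
                segments.append((seg_start, last_loud + 1))
                seg_start = i
            last_loud = i
    if seg_start is not None:
        # Close the final segment, dropping any trailing silence.
        segments.append((seg_start, last_loud + 1))
    return segments


# Example at 10 samples/s: a 1 s pause stays inside the first segment,
# while a 5 s pause separates it from the second.
stream = [1000] * 3 + [0] * 10 + [1000] * 2 + [0] * 50 + [1000] * 4
print(split_on_silence(stream, 10))
```

      Leading and trailing silence is dropped from each segment here,
so that a fixed amount can be added back afterwards if required.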

      A utility is provided to allow automatic import of voice
segments.  It provides the following functions:
  1.  Record - The entire audio stream (including delimiters) is
       digitized and stored in a file as a continuous stream of digital
       data.  This will be sampled...