Improving Voice Recognition Accuracy

IP.com Disclosure Number: IPCOM000123095D
Original Publication Date: 1998-May-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 2 page(s) / 79K

Publishing Venue

IBM

Related People

Farrell, AT: AUTHOR

Abstract

When a user of a voice recognition system is anonymous, that is, not known to the system when the recognition session is initiated, the system has no information on the user and so cannot employ a voice-trained personalized vocabulary to aid recognition accuracy. Therefore, the anonymous user recognition session can use only non-personalized vocabularies. In this context a vocabulary is defined as all the resources needed to perform voice recognition for a given set of utterances (words or phrases).

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

   When a user of a voice recognition system is anonymous,
that is, not known to the system when the recognition session is
initiated, the system has no information on the user and so cannot
employ a voice-trained personalized vocabulary to aid recognition
accuracy.  Therefore, the anonymous user recognition session can use
only non-personalized vocabularies.  In this context a vocabulary is
defined as all the resources needed to perform voice recognition for
a given set of utterances (words or phrases).

   The improved system described here uses non-personalized
vocabularies in the most effective way for the anonymous user.  The
system obtains a sample of the anonymous user's voice and analyzes it
to find the best match between the anonymous user and a vocabulary
from its own set of standard vocabularies.  The system asks the
anonymous user to speak a predetermined utterance, so that, for the
purposes of the sample, the system knows what the anonymous user is
going to say.  Analysis of the predetermined utterance yields the
best match vocabulary for the anonymous user for that specific
recognition session.
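
   The selection step can be sketched in code.  The following is a
minimal illustration only, not the disclosed implementation: the
Vocabulary class, the feature vectors, and the distance-based
match_score function are all assumptions made for this example, since
the analysis method itself is not specified here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Vocabulary:
    """One standard (non-personalized) vocabulary held by the system."""
    name: str
    # Hypothetical stand-in for the vocabulary's acoustic resources: the
    # feature vector it expects for the predetermined utterance.
    expected_features: Sequence[float]


def match_score(sample: Sequence[float], vocab: Vocabulary) -> float:
    """Toy score: negated squared distance, so higher means a closer match."""
    if len(sample) != len(vocab.expected_features):
        raise ValueError("sample and vocabulary features differ in length")
    return -sum((s - e) ** 2 for s, e in zip(sample, vocab.expected_features))


def select_best_vocabulary(
    sample: Sequence[float], vocabularies: Sequence[Vocabulary]
) -> Vocabulary:
    """Return the standard vocabulary that best matches the voice sample."""
    if not vocabularies:
        raise ValueError("at least one standard vocabulary is required")
    return max(vocabularies, key=lambda v: match_score(sample, v))


# Illustrative data only: two standard vocabularies and one voice sample
# taken from the predetermined utterance.
standard_set = [
    Vocabulary("vocab_a", [0.2, 0.9, 0.4]),
    Vocabulary("vocab_b", [0.7, 0.3, 0.5]),
]
sample = [0.65, 0.35, 0.45]

best = select_best_vocabulary(sample, standard_set)
print(best.name)
```

   Because the utterance is known in advance, every vocabulary is
scored against the same expected content, and the result is used only
for the current recognition session.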

   The best predetermined utterance to request from the
anonymous user is largely the type of utterance that contains the
most information about the user's voice characteristics in the
context of the particular application.

   Thus the predetermined utterance will vary according to
the particular scenario and factors unique to the recognition
session.  For example, if the session is required to recognize only
discrete digits, an utterance which contains the most information
about how the user speaks digits is most appropriate.
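
   As a rough sketch of this choice, the example below picks, from a
list of candidate prompts, the one that covers the most words the
session must recognize.  Word coverage is only an assumed proxy for
"most information", and the candidate prompts, the DIGITS set, and
the function names are hypothetical.

```python
from typing import Sequence, Set

# Words a digits-only session must recognize (illustrative).
DIGITS: Set[str] = {
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
}


def coverage(prompt: str, target_words: Set[str]) -> int:
    """Count how many distinct target words appear in the prompt."""
    return len(set(prompt.lower().split()) & target_words)


def choose_prompt(candidates: Sequence[str], target_words: Set[str]) -> str:
    """Pick the candidate prompt covering the most target words."""
    if not candidates:
        raise ValueError("at least one candidate prompt is required")
    return max(candidates, key=lambda p: coverage(p, target_words))


candidates = [
    "please say your name",
    "one two three",
    "zero one two three four five six seven eight nine",
]

print(choose_prompt(candidates, DIGITS))
```

   For a session with a different target set, the same function
would be called with that set in place of DIGITS.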

   In a live recognition session (for example, over the phone)
it is not feasible to ask an anonymous user to create a
voice-trained personalized vocabulary.  Instead, the system requires
only a single short utterance, which is used to pseudo-train the
system to make the best use of its own resources, that is, to select
the best match vocabulary from its own vocabulary set.  The principle
is demonstrated in the EXAMPLE RECOGNITION SESSION.

   The system described allows recognition accuracy to be
improved for the anonymous user, since the system learns something of
the user from their voice sample.

   Since the best match vocabulary applies to a specific
recognition session, a number of variable factors are eliminated
compared with a fully voice-trained system:
  (a) Variations in the user's voice between sessions
       are acceptable.  These may be caused by illness
       (e.g. colds and flu), loudness, or even the disposition of
       some users...