Method to Improve Statistical Speech Recognition in Performance and Response Time

IP.com Disclosure Number: IPCOM000113457D
Original Publication Date: 1994-Aug-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 43K

Publishing Venue

IBM

Related People

Okochi, M: AUTHOR

Abstract

Disclosed is a technology whereby a speech recognition system based on a statistical method can improve the response time on a slow machine without degrading the recognition accuracy. The frame interval of speech analysis windows is made short in the training session to get enough statistics, and it is made long in the recognition session to make the process fast even on a slow processor.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 74% of the total text.

      Conventional speech recognition systems based on statistical
methods, such as the Hidden Markov Model (HMM) and the Label
Histogram Method, use a common frame interval (typically 10 ms) in
both the training and the recognition sessions.  A long frame
interval degrades recognition accuracy because of the phase
difference between the analysis windows in the training and
recognition sessions; a short frame interval is good for accuracy but
makes processing slow.  On a slow processor, a common frame interval
therefore creates a dilemma between response time and accuracy.
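
      The cost side of this dilemma can be illustrated with a small
sketch.  It assumes that processing time grows roughly in proportion
to the number of analysis frames, and it uses a hypothetical 20 ms
analysis window and a 1-second utterance, neither of which is stated
in the disclosure.  The function name is illustrative only.

```python
def frame_count(utterance_ms: int, window_ms: int, interval_ms: int) -> int:
    """Count the analysis windows that fit entirely inside an utterance."""
    if utterance_ms < window_ms:
        return 0
    # The first window starts at 0; each later one starts interval_ms after
    # the previous one, as long as the whole window still fits.
    return (utterance_ms - window_ms) // interval_ms + 1


# Hypothetical values (assumptions): 1000 ms utterance, 20 ms window.
frames_common = frame_count(1000, 20, 10)  # 10 ms common interval from the text
frames_short = frame_count(1000, 20, 2)    # a short 2 ms interval
print(frames_common, frames_short)
```

      Under these assumptions the short interval produces about five
times as many frames to process, which is why a short common interval
is costly on a slow processor.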

      The proposed method solves the dilemma.  In the training
session, a short frame interval (e.g., 2 ms) yields enough
statistics and keeps the possible phase difference of the analysis
windows small (up to half of the interval; 1 ms in this case).  The
only impact is that the response to each training utterance becomes
slower (e.g., from 0.3 sec to 1.5 sec), but this can remain within an
acceptable range in the training session...
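
      The figures in this paragraph can be checked with a short
sketch.  It assumes that a window in the recognition session may
start at an arbitrary offset, so that its phase difference is the
distance to the nearest training window start, and that response time
scales inversely with the frame interval.  Both are simplifying
assumptions, and the function names are hypothetical.

```python
def phase_difference_us(offset_us: int, training_interval_us: int) -> int:
    """Distance (in microseconds) from a window start to the nearest
    training window start, where training windows begin at multiples of
    the training interval."""
    remainder = offset_us % training_interval_us
    return min(remainder, training_interval_us - remainder)


def worst_case_phase_difference_us(training_interval_us: int) -> int:
    """Largest phase difference over all offsets within one interval."""
    return max(
        phase_difference_us(offset, training_interval_us)
        for offset in range(training_interval_us)
    )


def scaled_response_sec(base_sec: float, base_interval_ms: float,
                        new_interval_ms: float) -> float:
    """Response time if cost is proportional to frame count (assumption)."""
    return base_sec * base_interval_ms / new_interval_ms


print(worst_case_phase_difference_us(2_000))   # 2 ms training interval
print(worst_case_phase_difference_us(10_000))  # 10 ms common interval
print(scaled_response_sec(0.3, 10, 2))         # 10 ms -> 2 ms in training
```

      The worst case is half of the training interval, i.e. 1 ms for
a 2 ms interval versus 5 ms for a 10 ms interval.  Under the linear
scaling assumption, going from 10 ms to 2 ms turns 0.3 sec into
1.5 sec, which is consistent with the example figures in the text,
although the disclosure does not state how they were obtained.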