
Speech Recognition Method Using Multiple Fenemic Baseforms of HMM

IP.com Disclosure Number: IPCOM000040386D
Original Publication Date: 1987-Nov-01
Included in the Prior Art Database: 2005-Feb-02
Document File: 2 page(s) / 84K

Publishing Venue

IBM

Related People

Nishimura, M: AUTHOR [+2]

Abstract

This article describes a modified fenemic-baseform speech recognition method, which can reflect context-dependent speech fluctuation at each frame of a word with only a small increase in computation time. The basic idea of this approach is to prepare multiple time-aligned fenemic baseforms and combine them at decoding time. To combine the multiple baseforms, one of the following two methods is applied at each frame: 1) use the averaged parameters of the corresponding fenemic phone machines, or 2) use the maximum probability for the observed label. The proposed system consists of a training session and a decoding session. In the training session, as shown in Fig. 1, multiple baseforms are constructed from utterances of each word.
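The two frame-level combination rules can be sketched as follows. This is an illustrative fragment, not the original IBM implementation; the function name and the array shape are assumptions.

```python
import numpy as np

def combine_output_probs(pr_out, method="average"):
    """Combine per-baseform label-output probabilities at one frame.

    pr_out: array of shape (K,), where pr_out[k] is the probability
    that the aligned state of baseform k outputs the observed label Ot.
    """
    if method == "average":
        # Method 1: average the parameters of the corresponding
        # fenemic phone machines.
        return float(np.mean(pr_out))
    if method == "max":
        # Method 2: take the maximum probability for the observed label.
        return float(np.max(pr_out))
    raise ValueError(f"unknown method: {method}")
```

Averaging smooths over the pronunciation variants captured by the multiple baseforms, while taking the maximum lets the best-matching variant dominate at each frame.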




This article describes a modified fenemic-baseform speech recognition method, which can reflect context-dependent speech fluctuation at each frame of a word with only a small increase in computation time. The basic idea of this approach is to prepare multiple time-aligned fenemic baseforms and combine them at decoding time. To combine the multiple baseforms, one of the following two methods is applied at each frame: 1) use the averaged parameters of the corresponding fenemic phone machines, or 2) use the maximum probability for the observed label.

The proposed system consists of a training session and a decoding session. In the training session, as shown in Fig. 1, multiple baseforms are constructed from utterances of each word. One of them is selected as the central baseform, and the remaining baseforms are time-aligned against it. HMM (Hidden Markov Model) parameters are calculated, for example, by applying the forward/backward algorithm to the central baseform, using the remaining baseforms' utterances as training data.

In the decoding session, as shown in Fig. 2, the time-aligned multiple baseforms are used in parallel. In Fig. 2, Pr_out(k, Sj, Ot) is the probability of outputting label Ot at state Sj of baseform k, and Pr_trn(k, Sj, Si) is the probability of the transition from state Si to state Sj. The probabilities Pr_out(k, Sj, Ot) are averaged for all...
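The decoding session can be sketched as a forward pass over the shared topology, where the emission probability at each (state, frame) pair is first combined across the K time-aligned baseforms. The array shapes, the shared left-to-right topology, and the assumption that the word starts in the first state are illustrative choices, not details from the original disclosure.

```python
import numpy as np

def forward_score(pr_out, pr_trn, method="average"):
    """Forward probability of an observed label string under combined baseforms.

    pr_out: shape (K, S, T) -- pr_out[k, j, t] is the probability that
            state Sj of baseform k outputs the label observed at frame t.
    pr_trn: shape (S, S)    -- pr_trn[j, i] is the probability of the
            transition from state Si to state Sj (topology shared by the
            time-aligned baseforms).
    """
    K, S, T = pr_out.shape
    # Combine the K baseforms' output probabilities at every (state, frame).
    if method == "average":
        emit = pr_out.mean(axis=0)   # (S, T)
    else:
        emit = pr_out.max(axis=0)    # (S, T)
    alpha = np.zeros((S, T))
    alpha[0, 0] = emit[0, 0]         # assume decoding starts in the first state
    for t in range(1, T):
        # alpha[j, t] = sum_i alpha[i, t-1] * Pr_trn(Sj, Si) * emit[j, t]
        alpha[:, t] = (pr_trn @ alpha[:, t - 1]) * emit[:, t]
    return float(alpha[:, -1].sum())
```

Because the baseforms are pre-aligned in the training session, the per-frame combination adds only a K-way mean or max to each emission lookup, which is the "small increase of computation time" the article claims.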