Automatic Selection of Context-Dependent Phonetic Units for Automatic Speech Recognition

IP.com Disclosure Number: IPCOM000036372D
Original Publication Date: 1989-Sep-01
Included in the Prior Art Database: 2005-Jan-29
Document File: 2 page(s) / 14K

Publishing Venue

IBM

Related People

Ferretti, M: AUTHOR

Abstract

This article describes an algorithm that, starting from a phonetic alphabet built from basic phonetic knowledge, automatically produces a larger and more accurate phonetic alphabet. The new phonetic units are selected by analyzing the decoding process and are chosen to improve decoder performance.

This is the abbreviated version, containing approximately 52% of the total text.

Background

Systems that use Markov word models for automatic speech recognition usually employ two kinds of models: fenemic and phonetic. Fenemic word models, called fenemic baseforms, are built automatically in one of two ways: from multiple utterances of the word, or by synthesizing them from utterances of different words. Phonetic word models, called phonetic baseforms, are built by concatenating the phonetic units that form the phonetic transcription of the word. The set of phonetic units is called the phonetic alphabet. The phonetic alphabet must describe the basic sounds of the language to be modeled, and its definition is a key factor in the performance of a speech recognition system. The alphabet must describe the acoustics of the language accurately, yet it must stay small enough that the user can train the system acoustically by providing only a small sample of his or her voice.
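
As an illustration of how a phonetic baseform is obtained by concatenation, the following Python sketch looks up the transcription of a word and chains the per-unit models. The lexicon entry, the unit names, and the figure of three states per unit are hypothetical placeholders chosen for the example; none of them comes from this disclosure.

```python
from typing import Dict, List

# Hypothetical phonetic alphabet: unit name -> number of Markov states
# in that unit's model. Names and state counts are assumptions.
PHONE_MODEL_STATES: Dict[str, int] = {"K": 3, "AE": 3, "T": 3}

# Hypothetical lexicon: word -> phonetic transcription over the alphabet.
LEXICON: Dict[str, List[str]] = {"cat": ["K", "AE", "T"]}


def phonetic_baseform(word: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Return the sequence of phonetic units that forms the word model."""
    return list(lexicon[word])


def baseform_state_count(
    word: str,
    lexicon: Dict[str, List[str]],
    phone_states: Dict[str, int],
) -> int:
    """Total states of the word model when the unit models are concatenated."""
    return sum(phone_states[unit] for unit in phonetic_baseform(word, lexicon))
```

In this toy setting, enlarging the alphabet adds entries to `PHONE_MODEL_STATES`, which is why the size of the alphabet bears directly on the amount of training data needed.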

The usual way to define a phonetic alphabet is to use phonetic knowledge to select the acoustic events that are worth modeling with a specific unit. This procedure gives no assurance that the resulting set of phonetic units is optimal for decoder performance.

Algorithm

It is assumed that a previous version of the phonetic alphabet already exists and that all the words of the decoder vocabulary have been transcribed using this alphabet. A large set of data for analyzing decoder performance is also assumed to exist. These data consist of words belonging to the decoder vocabulary, uttered by a set of speakers. The data are assumed to have already been signal processed and labelled with both speaker-dependent and speaker-independent prototypes. Data labelled with speaker-independent prototypes are time-aligned to the phonetic baseforms, and for each phonetic unit all the aligned strings of labels are stored together with information about the phonetic context.
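
The stored alignment data described above might be organized as in the following Python sketch. The segment-boundary representation of the alignment, the use of the immediate left and right neighbours as the phonetic context, the "#" word-boundary marker, and all names are assumptions made for illustration; the disclosure does not specify them.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Assumed context: (left neighbour, right neighbour); "#" marks a word boundary.
Context = Tuple[str, str]
# unit -> context -> list of aligned label strings
Store = DefaultDict[str, DefaultDict[Context, List[List[int]]]]


def new_store() -> Store:
    """Create an empty unit -> context -> label-strings store."""
    return defaultdict(lambda: defaultdict(list))


def collect_aligned_strings(
    baseform: List[str],
    labels: List[int],
    boundaries: List[int],
    store: Store,
) -> None:
    """File each unit's aligned label string under its phonetic context.

    boundaries[i] is the index in `labels` where unit i starts; the last
    unit runs to the end of `labels`. The alignment itself is taken as given.
    """
    ends = boundaries[1:] + [len(labels)]
    for i, (unit, start, end) in enumerate(zip(baseform, boundaries, ends)):
        left = baseform[i - 1] if i > 0 else "#"
        right = baseform[i + 1] if i + 1 < len(baseform) else "#"
        store[unit][(left, right)].append(labels[start:end])
```

Keying the store by unit and then by context keeps all label strings for one unit together while still allowing them to be compared across contexts.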

The algorithm is the following:

a. Perform acoustic match between the acoustic labels of each utterance of word W and the phonetic model for W using phonetic speaker-dependent stati...