Flavoured Acoustic Models, an approach to Automatic Speech Recognition in an asymmetric multilingual context

IP.com Disclosure Number: IPCOM000127544D
Original Publication Date: 2005-Aug-31
Included in the Prior Art Database: 2005-Aug-31
Document File: 2 page(s) / 80K

Publishing Venue

IBM

Abstract

The problem addressed is Automatic Speech Recognition (ASR) in an asymmetric multilingual context: in such a cultural environment, speakers of one language (hereafter called the main language) often use words of another language (the influencing language) and pronounce them with a near-native accent. A common example is French as spoken in Canada (Fr_CA), which is strongly influenced by English. We describe here a new approach, Flavoured Acoustic Models, which provides acoustic modelling suited to this context, with the aim of improving ASR accuracy.

At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

Flavoured Acoustic Model Overview

The Flavoured Acoustic Model (FAM) approach involves the
following steps:

1. Enhancing the phonology with extra phones to cover the
phones specific to the influencing language.
2. Bootstrapping the training of influencing language phones
with foreign data.
3. Boosting the weight of utterances in which influencing
language words/phrases are pronounced with a near-native
accent by legitimate speakers.
4. Tailoring the vocabulary to offer influencing language
phonetization only in relevant cases.

The result of this process is an acoustic model associated
with a vocabulary (phonetic transcriptions/baseforms).
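
As an illustration only, steps 3 and 4 might be sketched in Python as
follows. The available text does not say how utterances are annotated,
what boost factor is used, or how "relevant cases" are selected, so the
Utterance fields, the boost value of 3.0 and the relevant_words set are
all hypothetical placeholders, not the method of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    has_influencing_words: bool  # contains influencing language words/phrases
    near_native_accent: bool     # hypothetical annotation of the accent
    legitimate_speaker: bool     # hypothetical flag; criterion not given in the text


def utterance_weight(utt, boost=3.0):
    """Step 3 sketch: boosted training weight for qualifying utterances, else 1.0.

    The boost value is an arbitrary illustrative choice.
    """
    if utt.has_influencing_words and utt.near_native_accent and utt.legitimate_speaker:
        return boost
    return 1.0


def tailor_vocabulary(main_baseforms, influencing_baseforms, relevant_words):
    """Step 4 sketch: add influencing language baseforms only for relevant words."""
    vocab = {word: list(forms) for word, forms in main_baseforms.items()}
    for word in relevant_words:
        entry = vocab.setdefault(word, [])
        for form in influencing_baseforms.get(word, []):
            if form not in entry:
                entry.append(form)
    return vocab


# Toy data with made-up phone strings.
main_lex = {"table": ["t a b l"], "weekend": ["w i k E n d"]}
infl_lex = {"table": ["t eI b l"], "weekend": ["w i: k E n d"]}
vocab = tailor_vocabulary(main_lex, infl_lex, relevant_words={"weekend"})
```

In this toy example only "weekend" receives an extra influencing
language baseform; "table" keeps its main language pronunciation alone.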

Detailed process description

STEP 1: Enhancing phonology with extra phones to cover the
phones specific to the influencing language

The FAM approach requires the use of a phonology (phone
alphabet) that describes the main language as well as the
influencing language. Enhancing the main language phone set in
order to support influencing language baseforms typically
involves overlaps, as shown in figure 1.

Let us define:

Class M: phones exclusively used in the Main language
Class I: phones exclusively used in the Influencing language
Class C: Common phones (used in both languages)

Figure 1: Example of phonology common to French and English
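
The three classes correspond to plain set operations on the two phone
inventories. A minimal Python sketch, assuming each inventory is given
as a set of phone labels; the labels below are toy placeholders and not
the phone sets used in the disclosure:

```python
def classify_phones(main_phones, influencing_phones):
    """Split two phone inventories into classes M, I and C."""
    main = set(main_phones)
    influencing = set(influencing_phones)
    return {
        "M": main - influencing,  # exclusive to the main language
        "I": influencing - main,  # exclusive to the influencing language
        "C": main & influencing,  # common to both languages
    }


# Toy inventories, purely illustrative.
french = {"a", "i", "u", "y", "R", "p", "t", "k"}
english = {"a", "i", "u", "T", "D", "r", "p", "t", "k"}

classes = classify_phones(french, english)

# The enhanced phonology is the main language phone set plus class I.
enhanced_phonology = french | classes["I"]
```

Here the enhanced phonology is simply the union of both inventories,
with the class C phones shared between the two languages.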

STEP 2: Bootstrapping influencing language phones with foreign
data.

We need to produce a first acoustic model covering the
influencing language phones and to have the common phones
correctly trained in the phonetic context tree.

We achieve this goal by including in our training data some
foreign data, close enough to the influencing language. The
amount of foreign data has to be well balanced against the rest
of the training data, in order to avoid corrupting the common
phones when they are used in main language words.
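
One simple way to keep the foreign data in proportion is to cap it
relative to the main language data. The Python sketch below assumes
that utterances are counted rather than measured by duration; the
function name, the max_foreign_ratio parameter and its default value
are hypothetical, since the available text gives no figure for a good
balance.

```python
import random


def mix_training_data(main_utts, foreign_utts, max_foreign_ratio=0.25, seed=0):
    """Return the main data plus a capped random sample of foreign data.

    max_foreign_ratio is the largest allowed ratio of foreign to main
    utterances (an illustrative assumption, not a value from the text).
    """
    if max_foreign_ratio < 0:
        raise ValueError("max_foreign_ratio must be non-negative")
    limit = int(max_foreign_ratio * len(main_utts))
    count = min(limit, len(foreign_utts))
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    sampled = rng.sample(list(foreign_utts), count)
    return list(main_utts) + sampled


# Toy utterance identifiers.
main = [f"fr_ca_{i}" for i in range(80)]
foreign = [f"en_{i}" for i in range(100)]
mixed = mix_training_data(main, foreign, max_foreign_ratio=0.25)
```

With 80 main language utterances and a ratio of 0.25, at most 20 foreign
utterances are added, however much foreign data is available.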

The specificity of FAM approach is that t...