Automated Linguistic Knowledge-Based Procedure for Cross-Language Construction of Acoustic Models for a Language without Native Training Data

IP.com Disclosure Number: IPCOM000124579D
Original Publication Date: 2005-Apr-28
Included in the Prior Art Database: 2005-Apr-28
Document File: 6 page(s) / 211K

Publishing Venue

Motorola

Related People

Chen Liu: AUTHOR [+2]

Abstract

We developed an automated linguistic knowledge-based procedure that constructs a set of acoustic models (HMMs) for a new language without using native training data. The candidate models are native, well-trained models from a set of different languages. The procedure comprises a series of processes employing phonetic and phonological analyses that ultimately find the best candidates for each target phoneme. A series of phonetic and phonological distance metrics that we developed is fundamental to automating the process and proved effective. In particular, a combined phonetic-phonological (CPP) metric is shown to provide the highest performance. Although no native data is assumed to be available, our results matched the best performance of a set of reference models obtained through a data-driven method that uses native data. In other words, our knowledge-based method has come close to its upper bound in model selection.

This is the abbreviated version, containing approximately 12% of the total text.

Automated Linguistic Knowledge-Based Procedure for Cross-Language Construction of Acoustic Models for a Language without Native Training Data

Chen Liu, Lynette Melnar

Introduction

The speech industry has invested several years of effort in building acoustic models for a target language without using native training data. Various linguistic knowledge-based approaches have been proposed to find, for each phoneme in a target language, the closest acoustic models from existing languages. However, no satisfactory results have been reported, and most knowledge-based methods to date remain crude. Specifically, they judge the similarity of phonemes across languages largely by whether the phonemes share the same transcription symbol (Frasca et al., 2004) in a convention such as the IPA (IPA, 1993), SAMPA (Wells, 1989), or Worldbet (Hieronymus, 1993). Because phoneme labels do not provide precise phonetic descriptions, this type of approach is limited to providing a seed model for bootstrapping. No other more sophisticated knowledge-based methods have been published.
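To make the limitation of symbol matching concrete, the following is a minimal sketch under stated assumptions, not the procedure or the metrics developed in this work (their definitions are not given in this excerpt). The inventories, the feature table, and the function names `symbol_match`, `feature_distance`, and `nearest_by_features` are all hypothetical. The sketch shows that label matching returns nothing for a target phoneme absent from every source inventory, whereas even a toy articulatory-feature distance can still rank candidates.

```python
# Illustrative toy data only; nothing here is taken from the paper.
# Source-language inventories: language name -> set of IPA-style phoneme labels.
SOURCE_INVENTORIES = {
    "lang_a": {"p", "t", "k", "a", "i"},
    "lang_b": {"p", "b", "s", "a", "u"},
}

# Hypothetical articulatory descriptions: (voicing, place, manner).
FEATURES = {
    "p": ("voiceless", "bilabial", "plosive"),
    "b": ("voiced", "bilabial", "plosive"),
    "t": ("voiceless", "alveolar", "plosive"),
    "d": ("voiced", "alveolar", "plosive"),
    "k": ("voiceless", "velar", "plosive"),
    "s": ("voiceless", "alveolar", "fricative"),
}


def symbol_match(target, inventories):
    """Return (language, phoneme) pairs whose label is identical to the target label."""
    return sorted(
        (lang, target) for lang, phones in inventories.items() if target in phones
    )


def feature_distance(a, b):
    """Count the articulatory features on which two phonemes differ (toy metric)."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))


def nearest_by_features(target, inventories):
    """Return the source phonemes with the smallest toy feature distance to the target."""
    candidates = [
        (feature_distance(target, ph), lang, ph)
        for lang, phones in inventories.items()
        for ph in phones
        if ph in FEATURES  # skip phonemes the toy table does not describe
    ]
    best = min(dist for dist, _, _ in candidates)
    return sorted((lang, ph) for dist, lang, ph in candidates if dist == best)


if __name__ == "__main__":
    print(symbol_match("p", SOURCE_INVENTORIES))         # label exists in both sources
    print(symbol_match("d", SOURCE_INVENTORIES))         # no source uses this label
    print(nearest_by_features("d", SOURCE_INVENTORIES))  # closest described phonemes
```

In this toy setup, the target phoneme "d" has no identically labeled source phoneme, so symbol matching offers no candidate at all, while the feature count still identifies "t" and "b" as the nearest one-feature neighbors.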

A more sophisticated linguistic knowledge-based approach is to employ an experienced linguist who hand-maps each phoneme in the target language to phonemes in the source languages. No native data are needed in the process, but it requires expertise in phonetics, phonology, and related areas, which is usually costly. The work becomes increasingly difficult and error-prone as the number of source and target languages grows. The outcome is limited by the linguist's experience with and knowledge of each language, and the resulting performance is normally lower than that of the data-driven method.

Researchers then relaxed the rigid constraint on the availability of target-language data by assuming that a small amount was available at the development stage. Some researchers classify the adaptation approaches as data-driven methods for cross-language transfer. In this work, however, we are concerned only with the...