Statistical Measurement of Phonological Similarity

IP.com Disclosure Number: IPCOM000124576D
Original Publication Date: 2005-Apr-28
Included in the Prior Art Database: 2005-Apr-28
Document File: 6 page(s) / 143K

Publishing Venue

Motorola

Related People

Chen Liu: AUTHOR [+2]

Abstract

We propose two phonological distance metrics: the monophoneme distribution distance and the biphoneme distribution distance. Both are objective measures defined statistically. They characterize phonological similarity between languages, capturing typological and phonotactic information, among others. An example application is selecting the closest phoneme matches for cross-language transfer, that is, building acoustic models without native training data.

This is the abbreviated version, containing approximately 15% of the total text.

Statistical Measurement of Phonological Similarity

Chen Liu, Lynette Melnar

Introduction

It is well known to linguists that phones transcribed with the same IPA symbol can differ across languages, because the IPA uses a limited number of symbols. This drawback can be addressed by using phonetic features to represent a phoneme in detail [Melnar and Liu, 2005]. However, other facets of a phoneme cannot be fully characterized by phonetic features. For example, the actual realization of a phoneme in a language is influenced by the inventory, or presence, of other phonemes in the same language. An instance of a phoneme is also affected by its neighboring phonemes in the same utterance. All of these behaviors are the subject of phonology.

The study of phonological similarity has been used mostly for linguistic investigation, such as comparing inventories or groups of phonemes (e.g., vowels, stops) across languages [Ladefoged and Maddieson, 1996; Greenberg, 1978]. Recently, however, researchers in other areas have found phonological similarity useful. For example, we, as speech technologists, need an objective measurement of phonological similarity for phoneme mapping and cross-language transfer [Liu and Melnar, 2005]. The latter has great business value for porting voice-enabled devices to a new language market where native speech data is unavailable for model training. For this purpose, we developed two phonological distance metrics: the monophoneme distribution distance and the biphoneme distribution distance.

Terminology

Phones

A narrow Maipa phoneme [Melnar, 2002] is denoted by [symbol not preserved in this extraction], where l = 1, …, L is the language the phone belongs to, and i = 1, …, I_l is the index of the phoneme in language l. That is, the phoneme inventory of language l is

[equation not preserved in this extraction]    (1)
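The indexing scheme above can be sketched in code. This is only an illustration: the language names and phoneme symbols are made-up placeholders rather than Maipa symbols from the paper, and the names `inventories`, `inventory_sizes`, and `phoneme` are hypothetical.

```python
# Hypothetical toy inventories; the symbols are placeholders, not real Maipa data.
inventories = {
    "lang_1": ["p", "t", "k", "a", "i"],
    "lang_2": ["p", "b", "a", "u"],
}

# L is the number of languages; I_l is the inventory size of language l.
L = len(inventories)
inventory_sizes = {lang: len(phones) for lang, phones in inventories.items()}


def phoneme(lang: str, i: int) -> str:
    """Return the i-th phoneme (1-based, as in the text) of language `lang`."""
    phones = inventories[lang]
    if not 1 <= i <= len(phones):
        raise IndexError(f"index {i} outside 1..{len(phones)} for {lang}")
    return phones[i - 1]
```

Keying each phoneme by both its language and its index keeps identically written symbols in different languages distinct, in line with the point made in the Introduction.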

Categories

Three typical categorization strategies are considered. In the first strategy, all the phones are classified into one group, a single global category, i.e., the number of categories under this strategy is G = 1. The category can be represented by a list of all the member phonemes...
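As a rough illustration of the first strategy, a single global category can be built by listing every phoneme of every language. The inventories below are hypothetical placeholders, and the function name `global_category` and the (language, phoneme) pair representation are assumptions made for this sketch, not the paper's notation.

```python
# Hypothetical toy inventories; the symbols are placeholders, not real Maipa data.
inventories = {
    "lang_1": ["p", "t", "k", "a", "i"],
    "lang_2": ["p", "b", "a", "u"],
}


def global_category(inventories: dict[str, list[str]]) -> dict[str, list[tuple[str, str]]]:
    """Place every (language, phoneme) pair into one category, so G = 1."""
    members = [
        (lang, ph)
        for lang, phones in inventories.items()
        for ph in phones
    ]
    return {"global": members}


categories = global_category(inventories)
G = len(categories)  # number of categories under this strategy
```

Members are stored as (language, phoneme) pairs so that the "p" of one language stays distinct from the "p" of another.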