Maximally Informative Reduction of the Dimension of Speech Parameters

IP.com Disclosure Number: IPCOM000039397D
Original Publication Date: 1987-Jun-01
Included in the Prior Art Database: 2005-Feb-01
Document File: 3 page(s) / 22K

Publishing Venue

IBM

Related People

Mercer, RL: AUTHOR [+3]

Abstract

A frame of (speech) information characterized by an M-dimensional vector x is replaced by an m-dimensional vector y, where m is less than M, by a transformation y = Ax. A, an m × M matrix, is selected to maximize the mutual information between the reduced vector y and a vector-quantized label of y. In a speech recognition environment, an acoustic processor assigns an integer value i_t = j to a vector y_t if the jth prototype among K available prototypes is closest (by some defined measure) to the spectral vector y_t. The present invention involves deriving a vector y_t of 20-30 dimensions from a vector x_t of about 200 dimensions. In determining the matrix A, let I be a random variable whose values are the labels 1, 2, ..., K, and let X be a random M-dimensional speech vector.

This is the abbreviated version, containing approximately 52% of the total text.

A frame of (speech) information characterized by an M-dimensional vector x is replaced by an m-dimensional vector y, where m is less than M, by a transformation y = Ax. A, an m × M matrix, is selected to maximize the mutual information between the reduced vector y and a vector-quantized label of y. In a speech recognition environment, an acoustic processor assigns an integer value i_t = j to a vector y_t if the jth prototype among K available prototypes is closest (by some defined measure) to the spectral vector y_t. The present invention involves deriving a vector y_t of 20-30 dimensions from a vector x_t of about 200 dimensions.

In determining the matrix A, let I be a random variable whose values are the labels 1, 2, ..., K, and let X be a random M-dimensional speech vector. The joint distribution of (I, X) is modelled to have a probability element p_i f_i(x), where p_i = Prob(I = i) and f_i is the probability density function of X given the prototype index I = i. The densities f_i are defined for the long vectors in M-dimensional space.

Let r = r(A) denote the rank of A. If the matrix has full rank, i.e., r = m, then the short random vector Y = AX also has well-defined conditional densities g_i(y) (for y given the label i), which are the prototypes in the low-dimensional space and which are obtained as appropriate integrals of the corresponding high-dimensional prototypes f_i(x). In this case, the mutual information between I and Y, INFO(I; Y), is defined in the standard way. If r < m, then Y has a singular distribution on R^m (from which y_t is selected), so that it has no density and one cannot define INFO(I; AX) as usual. In that case, mutual information is defined in terms of an arbitrary submatrix A_r of maximal rank r as

    INFO(I; Y = AX) = INFO(I; A_r X).    (1)

Having a definition of information INFO(I; AX) for A of any rank r, the present...
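
The quantities defined so far (the projection y = Ax, the nearest-prototype label, and the rank-deficient convention of equation (1)) can be illustrated with a minimal NumPy sketch. It is not the disclosure's procedure. It rests on several assumptions that do not come from the text: the densities f_i are taken to be Gaussian, "closest" is taken to mean Euclidean distance, the submatrix A_r is chosen by a greedy row scan, and the information is estimated by Monte Carlo sampling. All function names are hypothetical.

```python
import numpy as np

def reduce_and_label(x, A, prototypes):
    """Return y = A x and the index of the nearest prototype to y."""
    y = A @ x
    # Assumption: Euclidean distance; the text only says "some defined measure".
    dists = np.linalg.norm(prototypes - y, axis=1)
    return y, int(np.argmin(dists))

def maximal_rank_rows(A):
    """Greedily keep rows of A that are linearly independent, giving A_r."""
    rows = []
    for k in range(A.shape[0]):
        candidate = rows + [k]
        if np.linalg.matrix_rank(A[candidate]) == len(candidate):
            rows = candidate
    if not rows:
        raise ValueError("A has rank 0; no submatrix A_r exists")
    return A[rows]

def gaussian_logpdf(y, mean, cov):
    """Log density of a multivariate normal at y."""
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + maha)

def info_estimate(A, priors, means, covs, n_samples=2000, seed=0):
    """Monte Carlo estimate of INFO(I; AX) in nats, assuming Gaussian f_i."""
    rng = np.random.default_rng(seed)
    priors = np.asarray(priors, dtype=float)
    A_r = maximal_rank_rows(A)  # equation (1): use a maximal-rank submatrix
    # Under the Gaussian assumption, g_i is Gaussian with projected moments.
    proj_means = [A_r @ mu for mu in means]
    proj_covs = [A_r @ S @ A_r.T for S in covs]
    total = 0.0
    for _ in range(n_samples):
        i = rng.choice(len(priors), p=priors)
        y = rng.multivariate_normal(proj_means[i], proj_covs[i])
        log_g = np.array([gaussian_logpdf(y, m, c)
                          for m, c in zip(proj_means, proj_covs)])
        # log of the mixture density g(y) = sum_j p_j g_j(y)
        log_mix = np.logaddexp.reduce(log_g + np.log(priors))
        total += log_g[i] - log_mix
    return total / n_samples

# Toy setup (illustrative values only): M = 4, m = 3, K = 2.
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0]])  # third row = row 1 + row 2, so r = 2 < m
priors = [0.5, 0.5]
means = [np.zeros(4), np.array([3.0, 0.0, 0.0, 0.0])]
covs = [np.eye(4), np.eye(4)]

A_r = maximal_rank_rows(A)
mi_deficient = info_estimate(A, priors, means, covs)
mi_submatrix = info_estimate(A_r, priors, means, covs)
y, label = reduce_and_label(np.array([3.1, 0.0, 0.0, 0.0]), A_r,
                            np.array([A_r @ mu for mu in means]))
```

In this toy setup the rank-deficient A and its submatrix A_r yield the same estimate by construction, which mirrors the convention of equation (1). The estimate cannot exceed log K nats, since the label I carries at most that much information when the priors are uniform.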