
Reduction of Acoustic Model Training Time and Required Data Passes via Stochastic Approaches to Maximum Likelihood and Discriminative Training

IP.com Disclosure Number: IPCOM000245273D
Publication Date: 2016-Feb-24
Document File: 4 page(s) / 38K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a stochastic modification of the traditional iterative training approach, which leads to the same or sometimes even better accuracy of acoustic models while reducing the cost of processing large data sets by requiring fewer passes through the training data. The goal is to optimize the model parameters just on a subset of available data while making the most of existing state-of-the-art training algorithms.



Building acoustic models from large data sets benefits the accuracy of speech recognition systems. The recent boom in the use of speech recognition technology makes access to very large quantities of training data (several thousand hours of audio) easier, but it also poses a challenge: processing a large and continuously growing amount of information.

As the amount of available training data has grown in recent years, the processing times of the traditional training approach have become prohibitively long. The need has shifted from developing algorithms capable of training acoustic models from a small available training set to exploiting large amounts of available data efficiently.

The traditional training procedure for a Gaussian Mixture Model (GMM) acoustic model consists of finding optimal parameters of the GMM for each Hidden Markov Model (HMM) state. This can be done in several steps. The typical approach used in state-of-the-art acoustic model training first sets Maximum Likelihood (ML) as the training criterion and iteratively looks for an optimal solution for each of the HMM states [1]. The training procedure continues with discriminative training (DT) in the feature space, followed by DT in the model domain. In both cases, a Minimum Phone Error (MPE) objective function is used as the training criterion for larger training data sets [2]. Both the traditional ML and DT procedures require multiple passes through the entire training data set, which takes several days to complete.
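
To make the ML step concrete, the following Python sketch shows one EM re-estimation pass for a diagonal-covariance GMM attached to a single HMM state. The disclosure contains no code, so the function name ml_update and all variable names are illustrative assumptions; the sketch only stands in for the standard ML re-estimation that the traditional procedure applies after accumulating statistics over the whole training set.

    import numpy as np

    def ml_update(features, weights, means, variances):
        """One EM (ML) re-estimation pass for a diagonal-covariance GMM.

        features:  (N, D) feature vectors aligned to one HMM state
        weights:   (K,)   mixture weights
        means:     (K, D) component means
        variances: (K, D) diagonal covariances
        """
        n, d = features.shape
        k = means.shape[0]

        # E-step: occupation (posterior) probabilities gamma[n, j]
        log_prob = np.zeros((n, k))
        for j in range(k):
            diff = features - means[j]
            log_prob[:, j] = (np.log(weights[j])
                              - 0.5 * np.sum(np.log(2.0 * np.pi * variances[j]))
                              - 0.5 * np.sum(diff ** 2 / variances[j], axis=1))
        log_prob -= log_prob.max(axis=1, keepdims=True)
        gamma = np.exp(log_prob)
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: accumulate zeroth-, first- and second-order statistics,
        # then re-estimate the GMM parameters from them.
        occ = gamma.sum(axis=0)              # (K,)  occupation counts
        first = gamma.T @ features           # (K, D) first-order statistics
        second = gamma.T @ (features ** 2)   # (K, D) second-order statistics

        new_weights = occ / occ.sum()
        new_means = first / occ[:, None]
        new_variances = second / occ[:, None] - new_means ** 2
        return new_weights, new_means, np.maximum(new_variances, 1e-6)

In the traditional schedule, such statistics are accumulated over every frame of the training corpus before a single parameter update is applied, which is what makes each iteration a full pass through the data.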

The novel contribution is a stochastic modification of the traditional iterative training approach, which leads to the same or sometimes even better accuracy of acoustic models while reducing the cost of processing large data sets by requiring fewer passes through the training data. The goal is to optimize the model parameters just on a subset of available data while making the most of existing state-of-the-art training algorithms.

Existing solutions to accelerate training include variants of the expectation-maximization (EM) algorithm, such as incremental EM [3], which have been used to achieve faster convergence of ML training. Similar efforts have been made for DT, but mainly with Minimum Classification Error as the objective function rather than the MPE criterion referred to herein. That work focuses either on better generalization of models [4, 5, 6], potentially allowing training on a subset of the training data, or on faster convergence of the training algorithms, employing various online and batch probabilistic techniques [7, 8, 9].

The novel stochastic training approach follows from a simple modification of the traditional training method. Rather than gathering statistics on the entire training data set and then carrying out a model parameter update, the novel approach is to gather statistics on a randomly selected subset of the training data and update...
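
A minimal sketch of this stochastic schedule is given below, reusing the illustrative ml_update() from the earlier sketch. The subset fraction and number of updates are placeholder values rather than parameters taken from the disclosure, and the sketch draws random frames for simplicity where a real system would more likely draw whole utterances.

    import numpy as np

    def traditional_pass(features, params):
        """Traditional schedule: statistics over ALL data, then one update."""
        return ml_update(features, *params)

    def stochastic_updates(features, params, subset_frac=0.1, n_updates=10, seed=0):
        """Stochastic schedule: each update uses statistics from a random subset."""
        rng = np.random.default_rng(seed)
        n = features.shape[0]
        subset_size = max(1, int(subset_frac * n))
        for _ in range(n_updates):
            # Randomly select a subset of the training data for this update.
            idx = rng.choice(n, size=subset_size, replace=False)
            # Accumulate statistics on the subset only and update the parameters,
            # rather than completing a full pass through the data first.
            params = ml_update(features[idx], *params)
        return params

For example, with the placeholder values above, ten updates at a 10% subset each touch roughly as many frames as one traditional full pass, while applying ten parameter updates instead of one.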