A System and Method for Using Multiple Language Models in Parallel in Automatic Speech Recognition to Service Cross-Domain Queries

IP.com Disclosure Number: IPCOM000243489D
Publication Date: 2015-Sep-24
Document File: 3 page(s) / 147K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed are a system and method to resolve language conflicts when an Automatic Speech Recognition system uses a specific language model mixed with a general English model. The system runs the speech through two or more language models in parallel and uses scoring to decide the optimal result or presents the user with a set of possible results from which to choose the desired word.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.

Automatic Speech Recognition (ASR) commonly uses broad language models to recognize the majority of spoken English; however, specific domains require highly specific language models to recognize certain speech vocabularies. An example of a specific language model might be all the names of employees in a company. This model consists of names that cannot necessarily be found or recognized by a general language model.

These specific language models work well in isolation, but problems occur when terms from the specific language model must be mixed with general English. One option is to combine the two language models into one big language model. The issue is that the accuracy of the specific domain language model is reduced by the possible presence of similar-sounding words in the larger general model, which increases the error rate on these specific words.

The novel contribution is a system and method that runs the speech through two or more language models in parallel and uses scoring to decide the optimal result or presents the user with a set of possible results from which to choose the desired word. This is simpler than the current method, which combines a number of language models into a single large and complex language model.
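
The choice between returning one result and offering the user a list can be pictured with a minimal Python sketch. The `Hypothesis` type, the `decide` function, the 0-to-1 score scale, and the `margin` threshold are all hypothetical; the disclosure does not say how scores are computed or compared, and the sketch assumes scores from different models are directly comparable.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Hypothesis:
    model: str    # hypothetical label for the language model that produced it
    text: str     # transcript returned under that model
    score: float  # assumed confidence in [0, 1]; higher is better


def decide(hyps: List[Hypothesis], margin: float = 0.15) -> Union[str, List[str]]:
    """Return one transcript when there is a clear winner, else a ranked list for the user."""
    if not hyps:
        raise ValueError("at least one hypothesis is required")
    ranked = sorted(hyps, key=lambda h: h.score, reverse=True)

    # Collapse identical transcripts, keeping best-first order.
    distinct: List[str] = []
    for h in ranked:
        if h.text not in distinct:
            distinct.append(h.text)

    # A single distinct transcript, or a top score that leads by the margin, wins outright.
    if len(distinct) == 1 or ranked[0].score - ranked[1].score >= margin:
        return ranked[0].text

    # Scores are too close to call: let the user choose.
    return distinct
```

With a names model scoring 0.92 and a general model scoring 0.40, `decide` returns the names transcript alone; with scores of 0.61 and 0.58 it returns both candidates, best first.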

The system comprises a mobile client that sends audio of a spoken phrase or sentence to an Arbiter process, which then passes the audio to several Automatic Speech Recognition engines and receives transcripts back. The Arbiter then processes the transcripts, identifies the best combination of transcription results, and returns the results to the user.
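
The fan-out from the Arbiter to several engines might look roughly like the following Python sketch. The `arbitrate` function, the `Engine` callable type, and the two stub engines with their fixed outputs are assumptions made for illustration; real ASR engines would be remote services, and picking the single highest score is only a stand-in for the disclosure's combination step, which this abbreviated text does not describe.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

# An engine is modeled as a callable: audio bytes -> (transcript, score).
Engine = Callable[[bytes], Tuple[str, float]]


def arbitrate(audio: bytes, engines: Dict[str, Engine]) -> Dict[str, Tuple[str, float]]:
    """Send the same audio to every engine in parallel and collect results by engine name."""
    if not engines:
        raise ValueError("at least one engine is required")
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(engine, audio) for name, engine in engines.items()}
        # result() blocks until that engine has answered.
        return {name: fut.result() for name, fut in futures.items()}


# Stub engines standing in for real ASR services (hypothetical fixed outputs).
def general_engine(audio: bytes) -> Tuple[str, float]:
    return ("call John Smith", 0.58)


def names_engine(audio: bytes) -> Tuple[str, float]:
    return ("call Jon Smyth", 0.61)


results = arbitrate(b"\x00\x01", {"general": general_engine, "names": names_engine})
# Simplest possible selection: the engine with the highest score.
best_engine = max(results, key=lambda name: results[name][1])
```

Threads suit this sketch because each engine call would mostly wait on I/O; the total latency is then close to that of the slowest engine rather than the sum of all of them.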

Figure 1: Components and overall process

Figure 2: The Arbiter process

Referring to Figure 2, the Arbiter process, this example...