System, Method and Architecture for utterance suffix recognition with limited computational resources

IP.com Disclosure Number: IPCOM000198119D
Publication Date: 2010-Jul-26
Document File: 6 page(s) / 419K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method is proposed for recognizing utterance suffixes (parts of utterances) that is suitable for automatic speech recognition systems with low computational resources. The possible utterance suffixes are assumed to be specified in the usual way (e.g. by grammars), as in common speech recognition systems.

This is the abbreviated version, containing approximately 37% of the total text.

FIELD OF THE APPROACH:

A speech recognition method for a defined set of utterance suffixes is proposed. The method allows recognizing only a defined part of the utterance. The duration of the utterance suffix does not need to be explicitly specified.

BACKGROUND:

Speech recognition systems have been developed intensively over the past several decades. Many commercial applications use some method or system for speech recognition. Special voice recognition systems (VRS) are being developed for embedded devices. These devices usually have limited computational resources, such as memory and processing power, which makes reliable voice recognition challenging. Embedded voice recognition applications are typically used in a command-and-control manner. Examples include automotive voice-based applications for phone dialing, air-conditioning control, audio device control, navigation management, etc.

For some applications it is desirable to decode only a specific portion of the given utterance (e.g. the suffix). Suffix recognition can be useful for navigation control during destination specification, when the set of all possible destinations is too large to be decoded at once. Because of limited memory, some multi-pass mechanism has to be used. In such cases it is useful to decode first only the state and city of the fully specified destination (in US addresses these come at the end), which limits the set of possible destinations to a manageable number. A fast and accurate decoder able to decode only a specific portion of the utterance is the key requirement for the commercial usability of such multi-pass systems. Such a decoding method is the subject of this proposal.
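The narrowing step of such a multi-pass scheme can be illustrated with a minimal sketch. It assumes the first pass has already produced the decoded suffix (city and state) as text. The function name `narrow_destinations` and the address list are hypothetical and serve only as illustration; they are not taken from the disclosure.

```python
def narrow_destinations(destinations, decoded_suffix):
    """Keep only destinations whose trailing fields match the decoded suffix.

    Each destination is a tuple of fields in spoken order, e.g.
    (number, street, city, state). The suffix is a tuple of trailing fields.
    """
    suffix = tuple(field.lower() for field in decoded_suffix)
    if not suffix:
        # Nothing was decoded, so nothing can be ruled out.
        return list(destinations)
    kept = []
    for dest in destinations:
        fields = tuple(field.lower() for field in dest)
        if fields[-len(suffix):] == suffix:
            kept.append(dest)
    return kept


# Hypothetical destination list (illustrative data only).
DESTINATIONS = [
    ("12", "Main Street", "Springfield", "Illinois"),
    ("48", "Oak Avenue", "Portland", "Oregon"),
    ("7", "Lake Road", "Springfield", "Illinois"),
    ("300", "Main Street", "Springfield", "Missouri"),
]

# Assume the first pass decoded the suffix "Springfield, Illinois".
second_pass_candidates = narrow_destinations(DESTINATIONS, ("Springfield", "Illinois"))
print(second_pass_candidates)
```

Only the two Springfield, Illinois entries remain, so the second pass would need to cover a much smaller search space than the full destination list.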

PRIOR ART:

Most commercial Automatic Speech Recognition (ASR) systems are based on the Hidden Markov Model (HMM) in combination with a decoder using the Viterbi algorithm [1]. Viterbi search can be viewed as a breadth-first search with dynamic programming. Instead of expanding the search paths along a tree, it merges multiple paths that lead to the same search state and keeps only the best path. The standard Viterbi search is guaranteed to find the globally optimal path because it searches through the whole search space. The search space is supposed to cover the set of all possible utterances. The result of the decoding is the most likely phrase.
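The path-merging behaviour of standard Viterbi search can be sketched as follows. This is a minimal textbook version for a small discrete HMM, not the decoder proposed in this disclosure. The model (states `s1`, `s2`, symbols `x`, `y`, `z`, and all probabilities) is a made-up toy example, and the function signature is an assumption chosen for illustration.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the best path through a discrete HMM.

    All probabilities used here must be non-zero, because they are
    converted to logarithms to avoid numerical underflow.
    """
    # Score of the best path ending in each state after the first observation.
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
            for s in states}
    backpointers = []  # one dict per later time step: state -> best predecessor

    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Merge every path entering state s and keep only the best one.
            prev = max(states, key=lambda p: best[p] + math.log(trans_p[p][s]))
            new_best[s] = (best[prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)

    # Trace the best final state back to the start.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


# Toy two-state model with made-up numbers (illustrative only).
STATES = ("s1", "s2")
START_P = {"s1": 0.6, "s2": 0.4}
TRANS_P = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
EMIT_P = {"s1": {"x": 0.5, "y": 0.4, "z": 0.1},
          "s2": {"x": 0.1, "y": 0.3, "z": 0.6}}

log_prob, best_path = viterbi(("x", "y", "z"), STATES, START_P, TRANS_P, EMIT_P)
print(best_path, math.exp(log_prob))
```

For this toy model the best path is `s1, s1, s2` with probability of about 0.01512. At every time step only one score and one back-pointer per state are stored, which is what keeps memory proportional to the number of states rather than the number of paths.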

As mentioned above, for some applications it is desired to search only in a subspace of the possible utterances. For instance, one may want to find out only the state during the first pass when the address is said in the following format: Number, Street name, City name and State name. Once the state is decoded, the search space can be significantly reduced for the subsequent pass. Only a few approaches deal with the topic of utterance subsequence recognition. A recent approach can be found in [2]. Two methods mentioned there provide similar...