Method for simultaneously using distributed speech recognition engines for grammars and dictation

IP.com Disclosure Number: IPCOM000029511D
Original Publication Date: 2004-Jul-02
Included in the Prior Art Database: 2004-Jul-02
Document File: 1 page(s) / 29K

Publishing Venue

IBM

Abstract

A distributed speech recognition system in which speech is streamed simultaneously to a finite state grammar ASR engine built into a device and to a large vocabulary ASR engine running remotely on the network.

This is the abbreviated version, containing approximately 53% of the total text.

Many small computing devices (PDAs, embedded processors, etc.) have the necessary resources to run a speech recognition engine that supports grammars. Such devices are capable of running a voice-enabled form-filling application, such as a multimodal browser application. But some fields naturally call for free-form text, rather than a fixed grammar. In that case, a dictation engine would be needed.

Many of these devices do not, however, have the necessary resources to run a dictation engine. But if a device has a built-in microphone and a network connection, it may be able to support dictation with the aid of a server. Such a device would collect audio and send it to a server that is running a dictation engine, and the server would then decode the audio into text and send the text back to the device.
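The round trip described above can be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: the chunk size, the function names, and the `fake_dictation_server` stand-in are all assumptions introduced here, and a real device would stream the chunks over its network connection instead of calling a local function.

```python
from typing import Callable, Iterable, Iterator

# Assumed chunk size: 100 ms of 16 kHz, 16-bit mono audio (not from the source).
CHUNK_BYTES = 3200


def chunk_audio(pcm: bytes, chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Split captured audio into fixed-size chunks suitable for streaming."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def dictate_remotely(pcm: bytes,
                     send_to_server: Callable[[Iterable[bytes]], str]) -> str:
    """Stream audio to a server-side dictation engine and return its text."""
    return send_to_server(chunk_audio(pcm))


def fake_dictation_server(chunks: Iterable[bytes]) -> str:
    """Hypothetical stand-in for the remote engine: reports the audio size."""
    total = sum(len(chunk) for chunk in chunks)
    return f"<decoded {total} bytes of audio>"
```

Passing the transport in as a callable keeps the device-side logic independent of how the audio actually reaches the server.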

This process would be triggered whenever the user selects a dictation field in the form-filling application. Anything the user says at that point would be decoded into text to fill in the dictation field. However, the user may wish to issue a command from some grammar, rather than filling in the selected dictation field. For example, the form may have a permanently-active grammar that includes "submit", or the underlying multimodal browser may have a permanently-active grammar that includes "help".
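To make the idea of permanently active grammars concrete, here is a small sketch that merges a form-level grammar and a browser-level grammar into one set of active commands. The two commands come from the examples above; the set-based representation and the function names are assumptions for illustration, since a real finite state grammar engine would match against audio, not text.

```python
from typing import Optional, Set

# Hypothetical grammars holding only the two commands named in the text.
FORM_GRAMMAR: Set[str] = {"submit"}
BROWSER_GRAMMAR: Set[str] = {"help"}


def active_commands(*grammars: Set[str]) -> Set[str]:
    """Union of every grammar that is active at the moment."""
    return set().union(*grammars)


def match_command(utterance: str, commands: Set[str]) -> Optional[str]:
    """Return the command if the whole utterance is in the grammar, else None."""
    normalized = utterance.strip().lower()
    return normalized if normalized in commands else None
```

With both grammars active, `match_command("Submit", active_commands(FORM_GRAMMAR, BROWSER_GRAMMAR))` yields the command, while free-form speech yields `None`.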

If the user issues one of these commands while in a dictation field, the dictation engine will decode it as text. Therefore, the command will end up as text in the dictation field, rather...
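The abstract describes streaming the same speech to both engines at once. A minimal sketch of that arrangement follows. The arbitration rule used here (a grammar match takes precedence over the dictation result) is an assumption made for illustration, because this abbreviated text does not state how the two results are reconciled. The toy engines simply treat the "audio" bytes as UTF-8 text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Tuple

GrammarEngine = Callable[[bytes], Optional[str]]  # local: command or None
DictationEngine = Callable[[bytes], str]          # remote: free-form text


def recognize(audio: bytes,
              grammar_engine: GrammarEngine,
              dictation_engine: DictationEngine) -> Tuple[str, str]:
    """Send the same audio to both engines concurrently and pick one result.

    Assumed rule (not from the source): a grammar match wins over dictation.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        command_future = pool.submit(grammar_engine, audio)
        text_future = pool.submit(dictation_engine, audio)
        command = command_future.result()
        text = text_future.result()
    if command is not None:
        return ("command", command)
    return ("dictation", text)


# Toy engines for demonstration only; real engines would decode audio.
def toy_grammar_engine(audio: bytes) -> Optional[str]:
    word = audio.decode("utf-8").strip().lower()
    return word if word in {"submit", "help"} else None


def toy_dictation_engine(audio: bytes) -> str:
    return audio.decode("utf-8")
```

Under this assumed rule, an utterance such as "submit" is handled as a command even while a dictation field is selected, and anything the grammar does not cover falls through to the dictation text.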