Smart Voice Control Framework

IP.com Disclosure Number: IPCOM000198631D
Publication Date: 2010-Aug-11
Document File: 9 page(s) / 242K

Publishing Venue

The IP.com Prior Art Database

Abstract

Voice technology is attractive for human-machine interaction; however, application developers must invest great effort to master it. We provide a method that lets them voice-enable their applications without any knowledge of voice technology, while continuing to use their familiar development environment. The method achieves this by isolating development from the voice SDK environment and by providing an easy programming interface for developers.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 23% of the total text.

Voice technology has been very attractive for human-machine interaction since the 1960s. Yet, to date, it remains quite hard for most application developers to develop voice-enabled applications. Why?

To make this invention easier to understand, we begin with a glossary of some terms.

Glossary of this invention:

Voice technology: mainly includes automatic speech recognition (ASR) and text-to-speech (TTS) products, which consist of signal processing, pattern recognition, stochastic processes and many mathematical algorithms.

Voice Grammar: word lists that specify what "text" can be recognized by ASR. It is for command-and-control, not dictation.

Voice SDK: voice software development toolkit provided by different companies for their proprietary voice products. Voice SDKs provide plenty of APIs that access the core functions of voice products for ASR and TTS support.

Voice-enabled Application: an application provided to end users. Besides its basic functions, it also supports "voice" input/output. These applications could be web applications or stand-alone desktop software running on PCs.

Voice developers: developers who master voice technology and voice SDKs, and know clearly how to add voice support to normal applications.

Application developers: developers who have enough experience in normal application development to implement the basic functions of applications, but who very likely do not understand voice technology.
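
To make the "Voice Grammar" entry above more concrete, the following is a minimal sketch of a command-and-control grammar expressed as a plain word list. It is only an illustration: the phrases, the action names and the helper match_command are hypothetical and are not taken from this disclosure or from any particular voice SDK.

```python
# Hypothetical command-and-control grammar: the only phrases an ASR
# engine would be allowed to recognize, each mapped to an action name.
# All phrases and action names here are illustrative assumptions.
COMMAND_GRAMMAR = {
    "open file": "OPEN",
    "save file": "SAVE",
    "close window": "CLOSE",
}


def match_command(recognized_text, grammar=COMMAND_GRAMMAR):
    """Return the action for an in-grammar phrase, or None otherwise."""
    # Normalize case and whitespace before looking the phrase up.
    normalized = " ".join(recognized_text.lower().split())
    return grammar.get(normalized)


if __name__ == "__main__":
    print(match_command("Open  File"))             # in-grammar phrase
    print(match_command("please write a letter"))  # free dictation, not in grammar
```

Because the grammar is a closed list, free-form dictation falls outside it, which matches the glossary's note that a voice grammar is for command-and-control rather than dictation.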

Pains:

1. Voice technology is very hard for application developers to master. It spans many knowledge areas, such as ASR models, TTS models, grammar definitions, audio card control, signal processing and so on, so it is a heavy burden for application developers to learn, let alone in a short time.

2. Although a voice SDK is a common solution, application developers must also make great efforts to master the voice SDK in detail, in addition to mastering voice technology. Moreover, no voice SDK can support all applications, so a developer may have to master more than one voice SDK.

3. Voice technology requires dynamic speech input and different grammar definitions for different application targets, so there cannot be a unified solution for every application. Implementations of voice-enabled applications vary even with the same voice SDK.

4. Voice technology requires huge computing resources for real-time communication, so it is usually implemented in the C/C++ programming languages on specific OS platforms. As a result, voice SDKs are limited to native runtime environments, and voice-enabled applications developed on a voice SDK are bound to those specific environments. This limitation may conflict with the overall design of the application, so there is a barrier between the application development environment and the environment supported by the voice SDK. As an example, some voice SDKs cannot be used in web applications at all.

Current Solutio...