Client-Server Model for Speech Recognition

IP.com Disclosure Number: IPCOM000104048D
Original Publication Date: 1993-Mar-01
Included in the Prior Art Database: 2005-Mar-18
Document File: 2 page(s) / 93K

Publishing Venue

IBM

Related People

Daggett, G: AUTHOR [+4]

Abstract

Disclosed is a client-server method for the cost-effective implementation of speech recognition systems.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Disclosed is a client-server method for the cost-effective
implementation of speech recognition systems.

      Large-vocabulary Automatic Speech Recognition (ASR) is expected
to become a widely used input method for computer workstations.  To
achieve the performance required for productive use, however,
large-vocabulary ASR systems will tend, in the near future, to be
high-priced relative to existing workstations.  The
Client-Server model of speech recognition can significantly lower the
effective cost of ASR for many users by sharing an ASR system among
multiple users.

      A related problem is providing ASR capability to the
multiplicity of computer systems in use today and in the future.  A
common server can provide ASR capability to many different types of
computer systems, vastly reducing the number of different ASR designs
and implementations required.  Another related problem is the use of
speaker-specific ASR in multiple locations.  For example, a single
user may want to utilize ASR in many different offices or at many
different workstations.  A further problem is the short- or long-term
collection of speaker data that could be used to adapt a recognition
system to single speakers or groups of speakers.  A server can have
access to the voice and language information generated by many
speakers and combine this information to further adapt or customize
the ASR server for the user population.

      The Client-Server method is separated into components, some of
which can be shared by multiple users.  The major components are:

1.  ASR Server
2.  Client Application Programming Interface (API)
3.  Control Channel
4.  Audio Channel

      ASR Server - The ASR Server provides the speech recognition
capability which converts audio information into text.  Nominally,
this component is at a physical location remote from the ASR user,
and is independent of the hardware and software system used to
implement the user application.  This component can also provide all
ASR-related functions, such as enrollment of new users for
speaker-dependent recognition algorithms, or adaptation of ASR
characteristics to improve performance over time for individuals or
groups of users.
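
      The server role described above can be illustrated with a
minimal sketch.  It is not the disclosed implementation: the names
SpeakerProfile, ASRServer, enroll and recognize are hypothetical, and
the recognizer itself is replaced by a placeholder that simply decodes
the bytes it is given.  The sketch only shows one server instance
holding per-user enrollment state on behalf of several clients.

```python
from dataclasses import dataclass, field


@dataclass
class SpeakerProfile:
    """Per-user state kept on the server (hypothetical structure)."""
    user_id: str
    enrolled: bool = False
    utterances_seen: int = 0


@dataclass
class ASRServer:
    """One recognizer shared by many clients; recognition is stubbed."""
    profiles: dict = field(default_factory=dict)

    def enroll(self, user_id: str) -> SpeakerProfile:
        # Enrollment of a new user for speaker-dependent recognition.
        profile = self.profiles.setdefault(user_id, SpeakerProfile(user_id))
        profile.enrolled = True
        return profile

    def recognize(self, user_id: str, audio: bytes) -> str:
        # Assumption: only enrolled users may submit audio.
        profile = self.profiles.get(user_id)
        if profile is None or not profile.enrolled:
            raise KeyError(f"user {user_id!r} is not enrolled")
        # Count utterances as a stand-in for data used in later adaptation.
        profile.utterances_seen += 1
        return self._decode(audio)

    def _decode(self, audio: bytes) -> str:
        # Placeholder only: a real system would run its recognition
        # algorithms here instead of decoding the bytes as text.
        return audio.decode("ascii", errors="replace")
```

      Because the profiles live on the server, a user enrolled once
could, under these assumptions, be served from any workstation that
can reach the server.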

      Client Application Programming Interface (API) - The Client API
makes all of the ASR functions available on the system local to
the ASR user.  Commands such as "enable microphone" and "receive
recognized word" are implemented by the Client API.  In effect, the
Client API provides the ASR capability to the user application.
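
      A sketch of how such a Client API might look to an application
follows.  Only the two command names, "enable microphone" and "receive
recognized word", come from the text; the class name ClientAPI, the
send_audio helper, the command string, and the use of in-memory queues
to stand in for the Control Channel and Audio Channel are all
assumptions made for illustration.

```python
import queue


class ClientAPI:
    """Local face of a remote recognizer (hypothetical names throughout)."""

    def __init__(self, control: queue.Queue, audio: queue.Queue,
                 results: queue.Queue):
        self.control = control    # stand-in for the Control Channel
        self.audio = audio        # stand-in for the Audio Channel
        self.results = results    # recognized words coming back
        self.microphone_enabled = False

    def enable_microphone(self) -> None:
        # "enable microphone": record the state and notify the server.
        self.microphone_enabled = True
        self.control.put("ENABLE_MICROPHONE")

    def send_audio(self, chunk: bytes) -> None:
        # Assumed helper: forward captured audio once the microphone is on.
        if not self.microphone_enabled:
            raise RuntimeError("microphone is not enabled")
        self.audio.put(chunk)

    def receive_recognized_word(self, timeout: float = 1.0):
        # "receive recognized word": wait briefly for the next result,
        # returning None if nothing arrives within the timeout.
        try:
            return self.results.get(timeout=timeout)
        except queue.Empty:
            return None
```

      An application would call enable_microphone once and then poll
receive_recognized_word, without needing to know where recognition
is actually performed.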

   ...