Method for mapping speech input as a response to an HTTP request
Disclosure Number: IPCOM000202297D
Publication Date: 2010-Dec-13
Document File: 2 page(s) / 47K

Publishing Venue

The Prior Art Database


The core idea of the invention is a method for integrating voice input into a web application by means of an external device. Because browsers do not have access to devices plugged into the PC, including a microphone, voice input is currently provided to web applications through browser plugins or Java applets, which have specific hardware and software requirements and can sometimes cause security issues. It would be useful if the web server hosting the web application could call the user on a phone number and start recording immediately, without anything needing to be installed on the client machine. Depending on the web application, the voice input can then be stored as is, translated into other languages, or converted to text.

This is the abbreviated version, containing approximately 51% of the total text.

Method for mapping speech input as a response to an HTTP request

The main idea is that the web application initiates an HTTP request to the server and passes the user's phone number as a parameter. The server then calls the user, records the speech input, and processes it once the user ends the call. Processing can create an audio file, translate that audio into a specific language, or convert the speech into text (in a specific language). The initial HTTP request remains open while the user is speaking and while the voice input is being processed. After processing, the resulting audio file or text is sent back to the browser as the response to the initial HTTP request.
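
The three processing outcomes named above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the names OutputMode, ProcessedSpeech and process_recording are hypothetical, the speech-to-text and translation engines are passed in as plain callables because the disclosure does not name any, and the "audio/wav" content type is an assumed format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class OutputMode(Enum):
    AUDIO = "audio"                        # return the recording as an audio file
    TRANSLATED_AUDIO = "translated_audio"  # audio translated into another language
    TEXT = "text"                          # speech converted to text


@dataclass
class ProcessedSpeech:
    content_type: str  # value for the HTTP Content-Type header
    body: bytes        # payload of the HTTP response


def process_recording(
    recording: bytes,
    mode: OutputMode,
    language: str,
    transcribe: Callable[[bytes, str], str],
    translate_audio: Callable[[bytes, str], bytes],
) -> ProcessedSpeech:
    """Turn a finished recording into the body of the pending HTTP response.

    `transcribe` and `translate_audio` are hypothetical stand-ins for whatever
    speech engines the server provides. "audio/wav" is an assumed format.
    """
    if mode is OutputMode.AUDIO:
        return ProcessedSpeech("audio/wav", recording)
    if mode is OutputMode.TRANSLATED_AUDIO:
        return ProcessedSpeech("audio/wav", translate_audio(recording, language))
    if mode is OutputMode.TEXT:
        text = transcribe(recording, language)
        return ProcessedSpeech("text/plain; charset=utf-8", text.encode("utf-8"))
    raise ValueError(f"unsupported output mode: {mode!r}")
```

Whichever branch runs, its result becomes the body of the response to the request that is still open.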

Infrastructure needed:

    For the client side of the web application, only a common browser is needed. On the server side, depending on the type of application, a specific server type or specific server software is needed for initiating calls, recording speech, converting speech to text, and translating speech into different languages.
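
One way to picture the server-side requirements is as an interface that lists the four capabilities mentioned above. The sketch below is an assumption made for illustration: the SpeechBackend protocol, its method names and the toy InMemoryBackend are hypothetical and do not correspond to any particular telephony or speech product.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SpeechBackend(Protocol):
    """Server-side capabilities named in the text (method names are hypothetical)."""

    def place_call(self, phone_number: str) -> str:
        """Initiate a call to the user and return an identifier for it."""
        ...

    def record_until_hangup(self, call_id: str) -> bytes:
        """Record the speech input until the user ends the call."""
        ...

    def speech_to_text(self, audio: bytes, language: str) -> str:
        """Convert recorded speech to text in the given language."""
        ...

    def translate_speech(self, audio: bytes, target_language: str) -> bytes:
        """Translate recorded speech into audio in another language."""
        ...


class InMemoryBackend:
    """Toy backend for local experiments; it performs no real telephony."""

    def place_call(self, phone_number: str) -> str:
        return f"call-{phone_number}"

    def record_until_hangup(self, call_id: str) -> bytes:
        return f"audio for {call_id}".encode("utf-8")

    def speech_to_text(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8")

    def translate_speech(self, audio: bytes, target_language: str) -> bytes:
        return audio + f" [{target_language}]".encode("utf-8")
```

An application that only needs dictation would use the first three methods; translation is needed only for the corresponding scenarios.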

Steps needed for technical implementation:
1. An HTTP request is initiated to the server with the user's phone number passed as a request parameter, e.g. an XMLHttpRequest to the server's URL. The phone number could, for instance, be entered at login in addition to the username and password, be stored in the "Settings" part of the application, or be requested from the user whenever the service is needed.

2. Server retrieves the phone number from the request parameter and calls the user. The initial HTTP request remains open during this step.

3. While the user is speaking the server records the speech input. The initial HTTP request remains open during this step.

4. User ends the call. The initial HTTP request remains open during this step.

5. Server processes the speech input depending on the purpose of the HTTP request, e.g. creating a (translated) audio file in a specific format or converting speech to text (in a specific language). The initial HTTP request remains open during this step.

6. Server sends processed data (text or audio file) as the response to the initial HTTP request and the request is closed.
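
The six steps above can be sketched end to end with Python's standard library. This is a minimal illustration, not the disclosed implementation: the "/speech" path, the "phone" parameter name, the JSON response shape and the FakeTelephony class are all assumptions, and the fake backend returns immediately where a real one would block for the length of the call. The Python client stands in for the browser's XMLHttpRequest.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlencode, urlparse


class FakeTelephony:
    """Hypothetical stand-in for the call, recording and speech-to-text backend."""

    def call_and_record(self, phone_number: str) -> bytes:
        # Steps 2-4: call the user, record, return once the user hangs up.
        # A real backend would block here for the duration of the call.
        return f"recording from {phone_number}".encode("utf-8")

    def speech_to_text(self, recording: bytes) -> str:
        # Step 5: convert speech to text (here the bytes are simply decoded).
        return recording.decode("utf-8")


class SpeechRequestHandler(BaseHTTPRequestHandler):
    telephony = FakeTelephony()

    def do_GET(self) -> None:
        # Step 1: read the phone number from the request parameter.
        query = parse_qs(urlparse(self.path).query)
        phone = query.get("phone", [""])[0]
        if not phone:
            self.send_error(400, "missing 'phone' parameter")
            return
        # Steps 2-5 run inside the handler, so the request stays open throughout.
        recording = self.telephony.call_and_record(phone)
        text = self.telephony.speech_to_text(recording)
        # Step 6: send the processed data as the response, which closes the request.
        body = json.dumps({"text": text}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the example quiet


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), SpeechRequestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_transcript(host: str, port: int, phone_number: str,
                     timeout: float = 600.0) -> str:
    """Client side: one request that waits until the call has been processed."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/speech?" + urlencode({"phone": phone_number}))
        response = conn.getresponse()
        if response.status != 200:
            raise RuntimeError(f"server returned HTTP {response.status}")
        return json.loads(response.read())["text"]
    finally:
        conn.close()
```

The generous client timeout reflects that the request stays open for the whole call. A threaded server is used so that one long call does not block requests from other users.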

Possible scenarios to integrate this method:
1. Integration in web applications to enter large amounts of data by speech:
Integrate speech input into web applications that need text input. Instead of typing (longer) text, the user could simply speak into the phone, which is mapped to the web application by its phone number, and the speech input could be used to automatically populate text fields, text areas, etc. Examples are:
a. Email web application where a user does not type, but just speaks into the phone and the text is automatically added to the email body. Text could be transl...