
Text message word construction in voice applications

IP.com Disclosure Number: IPCOM000114549D
Original Publication Date: 2005-Mar-29
Included in the Prior Art Database: 2005-Mar-29
Document File: 2 page(s) / 42K

Publishing Venue

IBM

Abstract

This publication defines an idea for a voice application which will enable a user to enter a response to a prompt by the application as a string composed using their phone keypad, rather than as a voice reply. The voice application would accept this response as it would a normal voice response, and the user would not have to specify which of the input modes they were using before they entered the response. The application would be especially useful in situations where voice recognition may not work perfectly, such as on mobile phones.

This is the abbreviated version, containing approximately 52% of the total text.



This publication addresses the problem encountered when interacting with voice recognition software, particularly from a mobile phone, when the software is unable to distinguish properly what is said. A common response in such a situation is to ask the user to repeat the word (for example, if the user is trying to buy cinema tickets and the recognition software gets the town wrong, the user is frequently asked to repeat the town).

    The currently known solution to this problem is for the caller to repeat the word either until it is recognised or until a certain number of attempts have been made, at which point the caller is passed on to an operator. This typically involves queuing and does not lead to a responsive user experience.

    This publication proposes to solve this problem by offering an unambiguous way of interacting with voice recognition software using Dual Tone Multi-Frequency (DTMF) tones.

    The key inventive idea behind this publication is that an application enabled for voice recognition should alternatively be able to accept DTMF tones representing a string data entry, and be able to use nearest match algorithms to decipher the user entry against a limited database of possible answers.
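    The nearest-match step described above can be illustrated with a minimal Python sketch. It assumes the standard letter-to-key layout found on phone keypads, one key press per letter, and uses the standard library's difflib as a stand-in similarity measure; the publication does not name a specific algorithm. The function names, the cutoff value and the list of towns are hypothetical.

```python
import difflib

# Standard phone keypad layout (assumed here; the publication does not spell it out).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def word_to_digits(word):
    """Encode a word as the key presses a caller would make, one press per letter."""
    return "".join(
        LETTER_TO_DIGIT[ch] for ch in word.lower() if ch in LETTER_TO_DIGIT
    )


def nearest_answer(digits, answers, cutoff=0.8):
    """Return the answer whose key sequence is closest to `digits`, or None.

    `answers` is the limited database of possible replies to the prompt.
    Two answers that share a key sequence would collide in this sketch;
    a real application would need to disambiguate them with the caller.
    """
    encoded = {word_to_digits(answer): answer for answer in answers}
    matches = difflib.get_close_matches(digits, list(encoded), n=1, cutoff=cutoff)
    return encoded[matches[0]] if matches else None


towns = ["Southampton", "Portsmouth", "Winchester"]  # hypothetical answer set
print(nearest_answer("76884267866", towns))  # exact key sequence for "Southampton"
print(nearest_answer("7688426786", towns))   # last key press missing
```

    Because the set of valid answers is small, a slightly mistyped key sequence can still resolve to a single entry, which is what makes the keypad route less ambiguous than repeated speech attempts.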

    When the caller is asked to enter voice data to be interpreted by the voice application, the user can input data using their phone keypad in the same way SMS messages are constructed. Thus, once the voice recognition software has failed the first time to recognise the word "Southampton", the caller could type the word on their phone keypad, as though writing a normal SMS message. The application would interpret the subsequent DTMF tones using predictive text, and would treat this input as its answer in place of the expected voice response.
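    One way to picture the predictive-text interpretation is to narrow the set of possible answers as each DTMF tone arrives. The Python sketch below assumes one key press per letter and the standard keypad layout; the class name, method names and town list are hypothetical and are not taken from the publication.

```python
# Standard phone keypad layout (an assumption; not specified in the publication).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def encode(word):
    """Key sequence for a word, one press per letter."""
    return "".join(
        LETTER_TO_DIGIT[ch] for ch in word.lower() if ch in LETTER_TO_DIGIT
    )


class PredictiveEntry:
    """Tracks the DTMF digits received so far and the answers still possible."""

    def __init__(self, answers):
        self._codes = {answer: encode(answer) for answer in answers}
        self._digits = ""

    def press(self, digit):
        """Record one DTMF digit and return the remaining candidates."""
        self._digits += digit
        return self.candidates()

    def candidates(self):
        """Answers whose key sequence starts with the digits entered so far."""
        return sorted(
            answer
            for answer, code in self._codes.items()
            if code.startswith(self._digits)
        )


entry = PredictiveEntry(["Southampton", "Southend", "Portsmouth"])
for tone in "768842":
    remaining = entry.press(tone)
print(remaining)  # only one town is left after six key presses
```

    In this sketch the caller would not need to finish the word: the application could accept the answer as soon as a single candidate remains.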

The advantages over other known solutions are:

    - Quicker response than having to repeatedly give unrecognised responses.

    - Customers who have experience with predictive text may prefer this method of data entry, as it will speed up their interaction with the application.

    - Possibly lower overhead on machines running the voice application, compared with speech recognition software.

    - Increased accessibility of voice applications for parties who may otherwise find them difficult or impossible to use.

    Initially the customer dials the number for the voice application and is connected as usual. If they are prompted to enter non-numerical data by speaking to the application, the customer can instead enter his response using the keypad. The application should accept alphanumeric data entered either by speech or by DTMF tones, and the customer should not have to make any indication of which he is going t...