Voice Interactions with Applications
Publication Date: 2014-Jul-31
The IP.com Prior Art Database
Techniques for voice interaction between applications on a mobile device and an off-device voice interaction platform are disclosed. A Voice Interaction API can be used to establish a trust relationship between the voice interaction platform (with which a user interacts by voice) and an application operating on the mobile device. The application interacts with the voice interaction platform through an on-device voice interaction service, which acts as a proxy for the voice interaction platform. The application can then use the voice interaction service to enable a user to use their voice to issue commands to the application and to make requests of it, such as confirmation requests, selection requests, and cancellation requests. The trust relationship can involve requiring differing levels of confirmation for different levels of risk associated with commands/requests to the application: relatively low-risk commands/requests can involve little or no user confirmation, while relatively high-risk commands/requests can involve one or more confirmations from the user.
Increasing the coverage and depth of actions that users can perform by voice, to the point of parity with other input methods (e.g., touch screen, keyboard, mouse), is a desirable goal for mobile devices. Voice interactions, however, differ from a standard input method in that the inputs are described at a semantic level (e.g., take a photo) rather than at a physical level (e.g., a touch event at x0, y0, which corresponds to the camera button). Mapping what a user says to an intended action, and determining how that action should be completed, is a more complex and ambiguous process involving speech recognition, contextual clues, and query understanding.
Ideally, a user should interact with mobile devices (e.g., confirm actions, pick options, disambiguate applications to use for an action) using voice commands to the same extent as enabled by standard input methods. For example, the user should be able to perform complex multi-stage workflows (e.g., booking a table at a restaurant) completely through voice interactions with the device.
Mapping an utterance to a specific semantic action can be done by an off-device voice interaction system rather than on the mobile device. Trust relationships for off-device voice interactions can be more complicated than trust relationships for on-device input interactions; e.g., between a touch screen and an application. For voice interactions, a mobile-device application and the voice interaction system can establish a trust relationship for converting utterances (i.e., spoken natural language) to semantic actions. Additionally, an expressive system for describing these actions and a subsequent conversational follow-up can be used to specify the scope of a task to be completed.
The voice interaction platform can establish a trust relationship between a voice interaction service used for user interaction on the mobile device and on-device applications. For example, in Figure 1 an on-device voice interaction service can be designated. The voice interaction platform can rely on the on-device voice interaction service as a proxy for the user's voice. Also, on-device applications implicitly trust the voice interaction platform by using the voice interaction service.
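The designation step can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the `TrustRegistry` class, its method names, and the service identifiers are hypothetical, and a real platform would presumably rely on operating-system identity checks rather than string comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrustRegistry:
    """Hypothetical record of which on-device service is the designated voice proxy."""

    designated_service: Optional[str] = None

    def designate(self, service_id: str) -> None:
        # The platform names one on-device service as its proxy (assumed mechanism).
        self.designated_service = service_id

    def is_trusted(self, sender_id: str) -> bool:
        # An application accepts voice requests only from the designated service.
        return (
            self.designated_service is not None
            and sender_id == self.designated_service
        )


registry = TrustRegistry()
registry.designate("voice_interaction_service")  # hypothetical identifier
print(registry.is_trusted("voice_interaction_service"))  # True
print(registry.is_trusted("some_other_app"))  # False
```

Before any service is designated, `is_trusted` returns False for every sender, which reflects the idea that trust exists only after the relationship has been established.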
After the trust relationship is established, the on-device voice interaction service can trigger and complete new activities requested via an utterance (e.g., "book me a cab for the airport tonight," "check into my flight," etc.) based on information about and context of the speaker (e.g., location, previous queries, etc.).
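The disclosure states that low-risk commands/requests can involve little or no confirmation while high-risk ones can involve one or more. The following Python sketch shows one way such a policy might be expressed. The `Risk` enum, the `CONFIRMATIONS_REQUIRED` table, the specific counts, and the `may_proceed` function are all assumptions made for illustration, not part of the disclosed API.

```python
from enum import Enum


class Risk(Enum):
    """Hypothetical risk levels an application might assign to a request."""

    LOW = 0
    HIGH = 1


# Assumed policy: number of spoken confirmations required per risk level.
CONFIRMATIONS_REQUIRED = {Risk.LOW: 0, Risk.HIGH: 1}


def may_proceed(risk: Risk, confirmations_given: int) -> bool:
    """Return True once the user has confirmed often enough for this risk level."""
    return confirmations_given >= CONFIRMATIONS_REQUIRED[risk]
```

Under this assumed policy, `may_proceed(Risk.LOW, 0)` is true immediately, whereas `may_proceed(Risk.HIGH, 0)` stays false until the user has given one confirmation. Which requests count as low or high risk would be up to the application.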
Applications can then differentiate requests from the voice interaction service, from another application, and from the voice interaction platform. The applications can access activity/voice interactors that enable user requests conveyed via the voice interaction service. For example, an application can condu...