
Apparatus to provide multimodal interaction with non-multimodal Web applications Disclosure Number: IPCOM000013119D
Original Publication Date: 2003-Jun-13
Included in the Prior Art Database: 2003-Jun-13
Document File: 2 page(s) / 88K

Publishing Venue




At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.


  Apparatus to provide multimodal interaction with non-multimodal Web applications

Handheld, in-car, and in-home mobile devices, together with wireless networks, have created demand for web access from non-traditional computers. These devices typically have sufficient processor and memory resources to provide web access but lack the traditional keyboard and mouse for browsing the web. The absence of an efficient means of input drives demand for speech technology. Applications that support both visual and speech interaction are called multimodal applications.

Today, web applications can be created using visual interaction markup languages such as HTML, xHTML, or WML. Voice-only versions of these applications, accessible with only a telephone, can be created with a voice interaction markup language such as VoiceXML. A new markup language for creating multimodal web applications was needed and is being developed by the Web standards committees. IBM has submitted a proposal named xHTML + Voice (X+V) that shows how to combine three existing standards (xHTML, XML Events, and VoiceXML) to build multimodal web applications.

The scarcity of X+V applications inhibits industry creation of mobile device browsers that support X+V. This invention provides a mechanism and apparatus to convert existing HTML and xHTML applications to X+V applications on the fly, thereby expanding the number of applications available to devices with X+V support.
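As a rough illustration of how the three standards combine, the sketch below layers a VoiceXML dialog and an XML Events link onto an ordinary xHTML form. It is schematic only: the `voice_city` identifier and the exact attribute placement are illustrative, not taken from the X+V specification.

```xml
<!-- Schematic X+V page: namespaces are real, details are simplified. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <vxml:form id="voice_city">             <!-- VoiceXML: the voice dialog -->
      <vxml:field name="city">
        <vxml:prompt>Which city?</vxml:prompt>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <form>                                  <!-- xHTML: the visual form -->
      <input type="text" name="city"
             ev:event="focus" ev:handler="#voice_city"/> <!-- XML Events link -->
    </form>
  </body>
</html>
```

Focusing the text field activates the associated voice dialog, so the same field can be filled by keyboard or by speech.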

A novel mechanism transcodes HTML pages to X+V; the apparatus is shown in Figure 1.

1. Parse HTML into a DOM

2. Annotation step: Insert grammars into the DOM, attached to certain elements (e.g., text entry input elements). The grammar references can use any of the forms by which grammars are referenced in VoiceXML documents; presumably this includes ways to specify how to create grammars from backend DBs, ...

Optionally, annotation can also specify alternative ways to render selected elements when there are better ways to present them aurally. Annotation allows conditional includes and replacements -- there may well be cases where what you see is not what you want to hear.

3. Transcoding step: (If the DOM is an HTML DOM, transcode it to XHTML first -- this is a prerequisite, not part of the invention.) Create voice snippets to accompany selected XHTML elements and insert them into the DOM. Some can be created automatically, given the labels and other information in the HTML. Oth...
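The three steps above can be sketched in Python using the standard-library DOM. This is a minimal illustration, not the disclosed apparatus: the `xv:grammar` attribute name, the `.grxml` file-naming convention, and the empty `vxml:form` voice snippet are hypothetical placeholders standing in for the real grammar references and voice dialogs.

```python
# Sketch of the HTML -> X+V pipeline: parse, annotate, transcode.
# Assumes the input is already well-formed XHTML (step 3's prerequisite).
from xml.dom.minidom import parseString

def annotate(dom):
    """Step 2: attach a grammar reference to each text-entry <input>."""
    for inp in dom.getElementsByTagName("input"):
        if inp.getAttribute("type") in ("", "text"):
            # Hypothetical attribute naming a grammar file for this field.
            inp.setAttribute("xv:grammar", inp.getAttribute("name") + ".grxml")
    return dom

def transcode(xhtml_source):
    """Steps 1-3: parse into a DOM, annotate, insert voice snippets."""
    dom = annotate(parseString(xhtml_source))        # steps 1 and 2
    head = dom.getElementsByTagName("head")[0]
    for inp in dom.getElementsByTagName("input"):
        if inp.hasAttribute("xv:grammar"):
            # Step 3: placeholder voice snippet generated from the field name.
            form = dom.createElement("vxml:form")
            form.setAttribute("id", "voice_" + inp.getAttribute("name"))
            head.appendChild(form)
    return dom.toxml()

page = ('<html><head><title>Demo</title></head>'
        '<body><form><input type="text" name="city"/></form></body></html>')
print(transcode(page))
```

In a real implementation the voice snippet would be a full VoiceXML dialog built from the field's label and surrounding text, which is what allows much of the conversion to happen automatically.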