Run-time Enhancement of Hyperlinks for Enabling Speech Assisted Web Browsing
Disclosure Number: IPCOM000029046D
Original Publication Date: 2004-Jun-14
Included in the Prior Art Database: 2004-Jun-14
Document File: 10 page(s) / 55K

Publishing Venue



Disclosed is a method of enhancing the hyperlinks on a web page to make them distinct from each other, so that when a user speaks the hypertext rather than clicking on the hyperlink, the user is redirected to the intended URL. The hyperlinks are parsed from the web page content. The page structure is determined in a standard format, the context of each hyperlink is determined, and the hyperlinks are redefined. The web page with enhanced hyperlinks is then displayed to the user. The user is assumed to have access to a visual, graphical interface but not to a mouse or other pointing device: the user can see the hyperlinks but cannot select any of them without voice input. For the speech recognition system, corresponding grammars are generated dynamically for the enhanced hyperlinks. These grammars specify the list of allowed sentences and phrases that a user can speak, and they cover all the hyperlinks in the web page. In addition, browsing commands (such as go back, next page, scroll up, etc.) are handled by a parallel grammar that can be compiled and stored prior to the browsing session. Generating grammars based on the content and structure of a web page leads to easier identification of hyperlinks by the user and better recognition by the speech recognition system. The speech recognition system recognizes the user's utterance and associates it with the closest match among the hyperlinks represented by the grammar. It then redirects its output to the browser, which loads the web page referred to by the hyperlink.
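The grammar and matching steps described above can be illustrated with a minimal sketch. It assumes the hyperlinks have already been made distinct and are supplied as (hypertext, URL) pairs. The names `COMMAND_GRAMMAR`, `build_link_grammar` and `resolve`, the action labels and the 0.6 cutoff are hypothetical, and Python's `difflib` string similarity merely stands in for the closest-match step of a real speech recognition system, which the disclosure does not specify.

```python
import difflib

# Browsing commands named in the text. This grammar does not depend on the
# page, so it can be built once before the browsing session (assumed labels).
COMMAND_GRAMMAR = {
    "go back": "CMD_BACK",
    "next page": "CMD_NEXT_PAGE",
    "scroll up": "CMD_SCROLL_UP",
}


def normalize(phrase):
    """Lower-case a phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


def build_link_grammar(enhanced_links):
    """Build the per-page grammar: one allowed phrase per hyperlink.

    enhanced_links is an iterable of (hypertext, url) pairs whose
    hypertexts are assumed to be distinct already.
    """
    grammar = {}
    for text, url in enhanced_links:
        phrase = normalize(text)
        if phrase in grammar:
            raise ValueError(f"hypertext is not distinct: {phrase!r}")
        grammar[phrase] = url
    return grammar


def resolve(utterance, link_grammar, cutoff=0.6):
    """Map recognized text to ("command", action) or ("link", url).

    Returns None when no allowed phrase is similar enough. Commands take
    precedence if a phrase appears in both grammars.
    """
    candidates = list(COMMAND_GRAMMAR) + list(link_grammar)
    best = difflib.get_close_matches(normalize(utterance), candidates,
                                     n=1, cutoff=cutoff)
    if not best:
        return None
    if best[0] in COMMAND_GRAMMAR:
        return ("command", COMMAND_GRAMMAR[best[0]])
    return ("link", link_grammar[best[0]])
```

In this sketch the command grammar and the link grammar are kept separate and only combined at matching time, mirroring the parallel grammar described in the text; the browser would then load the returned URL or execute the returned command.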

This is the abbreviated version, containing approximately 13% of the total text.

Run-time Enhancement of Hyperlinks for Enabling Speech Assisted Web Browsing

1.0 Background

While the keyboard and mouse are the most common interfaces between computers and their users, the search for a better and more natural interface has continued. Speech is one of the most natural ways of communicating, and a number of researchers have attempted to use it as a user interface for computers. Speech recognition engines exist for recording user input and storing the recognized text to create documents, but the problem becomes more complex in a browsing environment. Connecting a speech recognition system to applications enables the user to control the flow and execution of those applications, while a speech synthesis engine enables text to be read to users. As applications grow more complex, so does their command and control using speech.

The World Wide Web has grown manifold and has become inherently complex. Voice enabling a web-site requires understanding the structure of the web-site and the links to and from each page, and identifying the different areas in a page as the context to which user input is directed. The problem becomes more acute if speech enablement is attempted for any web-site, without prior knowledge of the sites that a user will visit. The main contribution of this invention is a method to determine the structure and content of a web-page on the fly at run-time and to use it along with speech recognition to enable browsing of any arbitrary web-site.

For example, in a typical web-page of a news site, there may be several news items, each listed with a 1/2 line description followed by the hyperlinks "more", "Audio/Video" and "Mobile". Each hyperlink points to a separate URL. This web-site is not designed for voice input: if a user speaks the input "Audio", the speech enabled browsing system cannot decide which URL the user should be redirected to.
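The ambiguity in this example can be made concrete with a small sketch that parses the hyperlinks out of a page and makes repeated hypertexts distinct. The disclosure says only that hyperlinks are redefined using their context; appending a running number to each repeated hypertext is an assumption made here for illustration, and the names `LinkCollector` and `enhance_links` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (hypertext, href) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def enhance_links(html):
    """Return (hypertext, href) pairs with repeated hypertexts numbered.

    Assumption: a running number is the disambiguating context. A fuller
    system could use the headline of the enclosing news item instead.
    """
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    totals = Counter(text.lower() for text, _ in collector.links)
    seen = Counter()
    enhanced = []
    for text, href in collector.links:
        key = text.lower()
        if totals[key] > 1:          # hypertext occurs more than once
            seen[key] += 1
            text = f"{text} {seen[key]}"
        enhanced.append((text, href))
    return enhanced
```

For a page with two news items that each carry a "more" link, the sketch yields "more 1" and "more 2", so each spoken phrase maps to exactly one URL, while hypertexts that occur only once are left unchanged.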

The W3 Consortium has been working on standard mechanisms for adding voice-specific media descriptors to HTML4 in order to voice enable the web. The aural style sheet feature in CSS is one such mechanism, used to specify text-to-speech (TTS) parameters. A grammar attribute is provided to specify the grammar ["Voice Browsers", by Dave Raggett and Or Ben-Natan, 1998]. Though such features can be used to build web-pages that allow browsing through voice, they are not useful for legacy web-pages authored using earlier versions of the markup language, such as HTML 1.0, and it is unlikely that all new web-sites will use this standard.

2.0 Prior Art

The Internet is the most utilized information resource today. The number of pages on the Web exceeds 2 billion, and this number is still increasing exponentially. The most common way of accessing these Web pages is by using a visual Web browser that runs on a client machine. A conventional We...