Produce Full Book-Style Indexing for HTML Information

IP.com Disclosure Number: IPCOM000014958D
Original Publication Date: 2002-Oct-04
Included in the Prior Art Database: 2003-Jun-20

Publishing Venue

IBM

Abstract

A program is disclosed that produces full book-style indexes for information in HTML. The Hypertext Markup Language (HTML) provides no markup for generating an index. Although more and more information is presented in HTML format, both on the Web and with products (help, online documentation, manuals), the retrievability of that information is deficient compared with other formats (e.g., BookManager, WinHelp) because of this lack of an index facility. Studies have shown that readers rely on tables of contents and indexes to retrieve information, even more than on searches. The disclosed program, an indexing tool known as indexgen, takes HTML files as input, either as "zipped" files or as hierarchical directory structures, and generates book-style (hierarchical) indexes for HTML information webs and for books created from those webs through transforms. It outputs one or more index files in HTML. The input HTML files contain embedded HTML comments of a specific form, which the tool recognizes as index entries, and HTML anchors, which mark the corresponding locations in the file. The tool collects these index entries, sorts them alphabetically, and writes them to the output files. These files can be used directly as an index for the HTML information or transformed together with the original input files to create a book index. Index entries consist of groupings of HTML comments in a specific format and accompanying bookmarks (anchor tags). The format of each of these entries is specific to the tool:
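
The entry format itself is not reproduced in this abstract. Purely as an illustration, the collect, sort, and emit flow described above might be sketched in Python as follows. Everything specific here is an assumption made for the sketch and not indexgen's actual format or interface: the comment syntax (`<!-- index: primary; secondary -->` immediately followed by `<a name="...">`), the function names `collect_entries` and `build_index_html`, and the shape of the generated HTML list.

```python
import html
import re
from collections import defaultdict

# ASSUMED entry format (hypothetical, not the tool's real syntax):
#   <!-- index: primary term; secondary term --><a name="anchor-id"></a>
ENTRY_RE = re.compile(
    r'<!--\s*index:\s*(?P<terms>[^<>]*?)\s*-->\s*<a\s+name="(?P<anchor>[^"]+)"',
    re.IGNORECASE,
)


def collect_entries(files):
    """Return (primary, secondary, href) tuples from a {filename: html_text} dict."""
    entries = []
    for filename, text in files.items():
        for match in ENTRY_RE.finditer(text):
            parts = [p.strip() for p in match.group("terms").split(";")]
            primary = parts[0]
            secondary = parts[1] if len(parts) > 1 else ""
            href = f"{filename}#{match.group('anchor')}"
            entries.append((primary, secondary, href))
    return entries


def build_index_html(entries):
    """Sort entries alphabetically (case-insensitive) and emit a two-level HTML list."""
    grouped = defaultdict(list)
    for primary, secondary, href in entries:
        grouped[primary].append((secondary, href))

    lines = ["<ul>"]
    for primary in sorted(grouped, key=str.casefold):
        lines.append(f"<li>{html.escape(primary)}")
        lines.append("<ul>")
        # Entries without a secondary term sort first within their group.
        for secondary, href in sorted(grouped[primary], key=lambda e: e[0].casefold()):
            label = html.escape(secondary) if secondary else "(main entry)"
            lines.append(f'<li><a href="{html.escape(href)}">{label}</a></li>')
        lines.append("</ul>")
        lines.append("</li>")
    lines.append("</ul>")
    return "\n".join(lines)
```

Calling `build_index_html(collect_entries(files))` on a dictionary that maps file names to HTML text returns one nested list in which each primary term links back to the anchors where it was marked. Reading the input from zipped files or directory trees, and splitting the output across several index files, are left out of this sketch.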