
Produce Full Book-Style Indexing for HTML Information

IP.com Disclosure Number: IPCOM000014958D
Original Publication Date: 2002-Oct-04
Included in the Prior Art Database: 2003-Jun-20
Document File: 2 page(s) / 59K

Publishing Venue

IBM

Abstract

A program is disclosed that produces full book-style indexes for information in HTML. The Hypertext Markup Language (HTML) provides no markup for generating an index. Although more and more information is presented in HTML format, both on the Web and with products (help, online documentation, manuals), the retrievability of that information is deficient compared with other formats (e.g., BookManager, WinHelp) because of this lack of an index facility. Studies have shown that readers rely on tables of contents and indexes to retrieve information, even more than on searches.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.



The disclosed program, known as the indexing tool indexgen, takes as input HTML files, either in the form of "zipped" files or hierarchical directory structures, and generates book-style (hierarchical) indexes for HTML information webs and for books created from those webs through transforms. It outputs one or more index files in HTML.
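The two input forms mentioned above (a "zipped" file or a directory hierarchy) can be read into a common in-memory form before any indexing is done. The following is only a minimal sketch under stated assumptions, not the disclosed tool's implementation: the function name read_html_inputs, the set of file extensions treated as HTML, and the UTF-8 decoding are all hypothetical choices made for illustration.

```python
import zipfile
from pathlib import Path

# Assumption: which file extensions count as HTML input.
HTML_SUFFIXES = (".html", ".htm")


def read_html_inputs(source):
    """Return {relative_path: text} for each HTML file in a directory tree or zip."""
    source = Path(source)
    pages = {}
    if source.is_dir():
        # Walk the hierarchical directory structure.
        for path in sorted(source.rglob("*")):
            if path.is_file() and path.suffix.lower() in HTML_SUFFIXES:
                rel = path.relative_to(source).as_posix()
                pages[rel] = path.read_text(encoding="utf-8", errors="replace")
    elif zipfile.is_zipfile(source):
        # Read the same kind of hierarchy out of a zip archive.
        with zipfile.ZipFile(source) as archive:
            for name in sorted(archive.namelist()):
                if name.lower().endswith(HTML_SUFFIXES):
                    pages[name] = archive.read(name).decode("utf-8", errors="replace")
    else:
        raise ValueError(f"not a directory or zip archive: {source}")
    return pages
```

Keying the result by relative path keeps enough information to build links from an index file back to the page that contains each entry.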

The input HTML files contain embedded HTML comments of a specific form, which the tool recognizes as index entries, and HTML anchors, which mark the corresponding locations in the file. The tool collects these index entries, sorts them alphabetically, and creates one or more HTML files as output. These files can be used directly as an index for the HTML information or transformed together with the original input files to create a book index.
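The sort-and-emit step described above might look roughly like the sketch below. It is an illustration only: it assumes the entries have already been collected as (text, file name, anchor name) tuples, it ignores the hierarchy of headings and subentries, and the function name render_flat_index and the list markup of the output are hypothetical, since the disclosure does not specify the layout of the generated index files.

```python
from html import escape


def render_flat_index(entries):
    """Render (text, file_name, anchor_name) tuples as a sorted HTML list.

    Assumption: a flat, case-insensitive alphabetical order; the real tool
    produces a hierarchical index whose exact markup is not known here.
    """
    ordered = sorted(entries, key=lambda e: (e[0].casefold(), e[1], e[2]))
    lines = ["<ul>"]
    for text, file_name, anchor in ordered:
        # Each entry links to the anchor that marks its spot in the page.
        href = escape(f"{file_name}#{anchor}", quote=True)
        lines.append(f'  <li><a href="{href}">{escape(text)}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)
```

Sorting on the case-folded text keeps entries such as "anchors" and "Anchors" adjacent, which is the usual expectation for a book-style index.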

Index entries consist of groupings of HTML comments in a specific format and accompanying bookmarks (anchor tags). The format of each of these entries is specific to the tool:

Index entry

<!-- STRING i[h]n [attribute].text of index entries -->

Anchor

   <a name="IDXxxxx"></a>

where items in square brackets indicate optional text. Several bits of this format are crucial to an index entry being recognized by the tool:

o A properly formed HTML comment with

o The predefined text string "STRING"

o "i" followed by, optionally "h", and a number "1", "2", or "3".

  If "h" is present, the index entry is a heading of the level indicated by the associated number and has no associated anchor. It appears in the index as an entry with no associated reference but with subentries below it. This allows the formation of complex, hierarchical indexing. If "h" is not present, the entry has an associated anchor. It can be a primary entry or a subentry. "attribute" is one of several accepted attributes as described below.

o "." to separate the tag and the text

o A properly formed bookmark (anchor) tag with "IDX" as the first characters of the name. The other characters in the nam...
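The recognition rules listed above could be approximated with two regular expressions, as in the following sketch. It is not the tool's actual parser, and several points are assumptions made for illustration: the whitespace allowed inside the comment, the attribute being a single token without spaces or periods, and the pairing rule (each non-heading entry is matched with the first "IDX" anchor that follows its comment). The names IndexEntry and parse_index_entries are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# <!-- STRING i[h]n [attribute].text of index entry -->
ENTRY_RE = re.compile(
    r"<!--\s*STRING\s+i(?P<heading>h?)(?P<level>[123])"
    r"(?:\s+(?P<attribute>[^.\s]+))?\s*\.(?P<text>.*?)\s*-->",
    re.DOTALL,
)
# <a name="IDXxxxx"></a>
ANCHOR_RE = re.compile(r'<a\s+name="(?P<name>IDX[^"]*)"\s*>\s*</a>')


class IndexEntry(NamedTuple):
    level: int                 # 1, 2, or 3
    is_heading: bool           # True when "h" is present
    attribute: Optional[str]   # optional attribute token, if any
    text: str                  # text of the index entry
    anchor: Optional[str]      # anchor name; None for heading entries


def parse_index_entries(html):
    """Collect index entries from one page of HTML text."""
    entries = []
    for match in ENTRY_RE.finditer(html):
        is_heading = bool(match.group("heading"))
        anchor = None
        if not is_heading:
            # Assumption: the entry's anchor is the next IDX anchor in the page.
            found = ANCHOR_RE.search(html, match.end())
            anchor = found.group("name") if found else None
        entries.append(IndexEntry(
            level=int(match.group("level")),
            is_heading=is_heading,
            attribute=match.group("attribute"),
            text=match.group("text").strip(),
            anchor=anchor,
        ))
    return entries
```

Ordinary HTML comments that lack the "STRING" marker are skipped, so pages can mix index entries with unrelated comments.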