Local Saves of Complete User-Relevant HTML Content

IP.com Disclosure Number: IPCOM000012957D
Original Publication Date: 1999-Oct-01
Included in the Prior Art Database: 2003-Jun-11

Publishing Venue

IBM

Related People

Authors:
Carl Binding, Francois Dolivo

Abstract

This technical disclosure describes how the mechanisms described in [1] can be adopted to enable Hypertext Markup Language (HTML) browsers to provide complete local saves of HTML content.

When the "Save As" command of commonly deployed World-Wide Web (WWW) browsers is used, the HTML root source page is saved to a file in the local machine's file system under a Uniform Resource Locator (URL) of the form "file:". However, relative URLs to data embedded by reference within the HTML source (for example, an image file in GIF format) are not resolved, and the referenced data is not saved locally. As a consequence, when the saved HTML root page is re-accessed, the browser cannot retrieve the embedded data, because the absolute URL to which each relative URL resolves has changed. The embedded data cannot be retrieved by concatenating the saved root page's local file path name with the relative path name of the GIF data because a) the data is not even available on the local file system, and b) there would be a name conflict between the root page's local file path name and the directory expected to contain the image data. For example, a relative URL of the form "/image.gif" becomes the absolute "file:/image.gif" instead of "http://www.some.host/image.gif".

The agent mechanism described in [1] can be incorporated into the "Save As" command of the browser. Instead of periodic monitoring of Web-based HTML pages, it is the user's explicit action that triggers retrieval of an HTML page and the resolution of its embedded relative URLs: the HTML document is scanned, the embedded URLs are retrieved and their content is stored, and each embedded URL is re-labelled to point to the locally saved content. The local save does not create a single file, but a directory in which the HTML root page and all the embedded image data are stored. Care must be taken to avoid naming conflicts between the HTML root file and the created directory.
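
The save procedure outlined above (scan the page, resolve each embedded URL against the page's original URL, store the content, re-label the reference) can be illustrated with a minimal Python sketch. It is not the mechanism of [1], only one possible reading of the steps under stated assumptions: only <img> tags with quoted src attributes are handled, the page is assumed to be UTF-8, and the names save_complete, http_fetch, and ROOT_NAME are hypothetical. The root page is stored inside the created directory under a fixed name, which is one simple way to sidestep the naming conflict between the root file and the directory.

```python
import os
import re
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Matches src="..." or src='...' inside an <img> tag (unquoted values are not handled).
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(.*?)\2',
                     re.IGNORECASE | re.DOTALL)

ROOT_NAME = "index.html"  # hypothetical name for the saved root page


def http_fetch(url):
    """Default retrieval: return the raw bytes behind a URL."""
    with urlopen(url) as response:
        return response.read()


def save_complete(page_url, save_dir, fetch=http_fetch):
    """Save page_url and its embedded images into save_dir; return the root path."""
    html = fetch(page_url).decode("utf-8", errors="replace")  # assumes UTF-8
    # Raises FileExistsError if save_dir already names a regular file.
    os.makedirs(save_dir, exist_ok=True)

    local_names = {}     # absolute URL -> local file name
    used = {ROOT_NAME}   # file names already taken inside save_dir

    def relabel(match):
        prefix, quote, src = match.group(1), match.group(2), match.group(3)
        absolute = urljoin(page_url, src)  # resolve against the original URL
        if absolute not in local_names:
            base = os.path.basename(urlparse(absolute).path) or "resource"
            stem, ext = os.path.splitext(base)
            name, n = base, 1
            while name in used:            # avoid clashes between saved files
                name = f"{stem}_{n}{ext}"
                n += 1
            used.add(name)
            with open(os.path.join(save_dir, name), "wb") as out:
                out.write(fetch(absolute))
            local_names[absolute] = name
        # Re-label the embedded URL so it points at the local copy.
        return f"{prefix}{quote}{local_names[absolute]}{quote}"

    rewritten = IMG_SRC.sub(relabel, html)
    root_path = os.path.join(save_dir, ROOT_NAME)
    with open(root_path, "w", encoding="utf-8") as out:
        out.write(rewritten)
    return root_path
```

Because every reference is resolved with urljoin against the page's original URL before the page is written to disk, "/image.gif" on a page fetched from "http://www.some.host/dir/page.html" is retrieved from "http://www.some.host/image.gif" rather than from "file:/image.gif". The fetch parameter exists only so that retrieval can be replaced, for example by a stub when no network is available.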