
Automatic Discovery of Missing HTML Documents

IP.com Disclosure Number: IPCOM000021181D
Original Publication Date: 2003-Dec-31
Included in the Prior Art Database: 2003-Dec-31
Document File: 1 page(s) / 39K

Publishing Venue

IBM

Abstract

Everyone who has browsed the Internet has encountered a broken link that results in an HTTP 404 - Not Found error. Unfortunately, few recovery solutions are currently implemented for a document that is missing because of a broken link. For example, Internet Explorer will search the Internet via the user's preferred search engine if the DNS server cannot resolve the string entered in the address bar as a domain name. This does not, however, address the case where a user follows a broken link within a domain. Upon encountering an HTTP 404 - Not Found error, a web browser can be given functionality to attempt to locate the missing document within the domain as well. Web servers can also be given mechanisms to assist the user in this regard.

     One solution enhances the web browser: upon receiving an HTTP 404 - Not Found error from the web server, the browser performs limited "web bot" functionality, scouring other pages for correct links to the document. If the browser starts, for example, from the root of the domain or from successive parent directories and crawls through linked pages, it may locate the exact document when other pages point to it at a different location on the server. Further, as it searches, the browser can collect links to documents with similar file names and propose them to the user as potential "hits" as they are discovered. This would help locate documents that cannot be found because of typos in the link.
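The browser-side search described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the disclosure's implementation: the names `find_missing`, `start_points`, and `LinkExtractor`, the page limit, and the 0.8 similarity threshold are all hypothetical, and `fetch` stands in for whatever the browser uses to retrieve a page (it should return the HTML text, or None on failure).

```python
from collections import deque
from difflib import SequenceMatcher
from html.parser import HTMLParser
import posixpath
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def start_points(missing_url):
    """Yield successive parent directories of the URL, ending at the domain root."""
    parts = urlparse(missing_url)
    directory = posixpath.dirname(parts.path)
    while True:
        path = directory if directory.endswith("/") else directory + "/"
        yield f"{parts.scheme}://{parts.netloc}{path}"
        if directory in ("", "/"):
            break
        directory = posixpath.dirname(directory)


def find_missing(missing_url, fetch, max_pages=50, threshold=0.8):
    """Crawl same-domain links; return (exact, similar) candidate URLs.

    max_pages and threshold are assumed tuning values, not from the source.
    """
    host = urlparse(missing_url).netloc
    target = posixpath.basename(urlparse(missing_url).path)
    queue = deque(start_points(missing_url))
    seen = set(queue) | {missing_url}
    exact, similar = [], []
    pages = 0
    while queue and pages < max_pages:
        page = queue.popleft()
        text = fetch(page)
        pages += 1
        if text is None:
            continue
        parser = LinkExtractor()
        parser.feed(text)
        for href in parser.hrefs:
            url = urldefrag(urljoin(page, href)).url
            if urlparse(url).netloc != host or url in seen:
                continue
            seen.add(url)
            name = posixpath.basename(urlparse(url).path)
            if name == target:
                exact.append(url)  # same file name at a different location
            elif name and SequenceMatcher(None, name, target).ratio() >= threshold:
                similar.append(url)  # possible typo in the original link
            queue.append(url)
    return exact, similar
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; a real browser would also need politeness limits and could report each candidate to the user as soon as it is found rather than at the end.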

     A server-based solution would have the web server perform the same search algorithm when it is about to send an HTTP 404 - Not Found error back to the user. Because not every server can be relied upon to implement this, however, it would be advantageous to build the functionality into the browser, so that the ability to recover from 404 errors is potentially available for any website visited.
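A server-side variant might look something like the sketch below. Note the substitution: the text above has the server run the same link-crawling search, whereas this sketch assumes the server can simply walk its own document root on disk and compare file names, which is a simplification and not something the disclosure specifies. The function names, the `doc_root` parameter, and the 0.8 cutoff are hypothetical.

```python
import difflib
import html
import os
import posixpath


def suggest_paths(doc_root, requested_path, cutoff=0.8, limit=5):
    """Return site-relative paths whose file name equals or resembles the missing one.

    Assumption: documents are plain files under doc_root (not from the source).
    """
    target = posixpath.basename(requested_path)
    by_name = {}
    for folder, _dirs, files in os.walk(doc_root):
        for name in files:
            rel = os.path.relpath(os.path.join(folder, name), doc_root)
            by_name.setdefault(name, []).append("/" + rel.replace(os.sep, "/"))
    # get_close_matches returns the best-scoring names first, so an exact
    # file-name match (at a different location) precedes likely typos.
    names = difflib.get_close_matches(target, list(by_name), n=limit, cutoff=cutoff)
    return [path for name in names for path in sorted(by_name[name])]


def not_found_body(doc_root, requested_path):
    """Build the HTML body of a 404 response, listing candidates if any exist."""
    items = "".join(
        f'<li><a href="{html.escape(p)}">{html.escape(p)}</a></li>'
        for p in suggest_paths(doc_root, requested_path)
    )
    hint = f"<p>Did you mean:</p><ul>{items}</ul>" if items else ""
    return f"<h1>404 - Not Found</h1>{hint}"
```

The response still carries the 404 status; only its body changes, so clients that ignore the suggestions behave exactly as before.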
