
World Wide Web Search Architecture

IP.com Disclosure Number: IPCOM000016019D
Original Publication Date: 2002-Aug-15
Included in the Prior Art Database: 2003-Jun-21
Document File: 3 page(s) / 62K

Publishing Venue

IBM

Abstract

Problem Statement: One of the most common methods to locate information on the World Wide Web is to use a search engine. [Other methods include those based on pre-organised and often human-maintained classification into catalogues or directories.] Search engines typically use a brute force web crawling technique: identifying web sites, traversing hyperlinks, retrieving web pages, and generating index meta data. However, the effectiveness of search engines is decreasing and will continue to decrease over time due to the following two key factors: The volume of information (new and modified) to be indexed is increasing rapidly - this increases the delay between new web pages being made available on the web and their availability via search engines, increases network bandwidth requirements, and increases the storage and processing power requirements of search engines.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 48% of the total text.


Problem Statement

One of the most common methods to locate information on the World Wide Web is to use a search engine. [Other methods include those based on pre-organised and often human-maintained classification into catalogues or directories.] Search engines typically use a brute force web crawling technique: identifying web sites, traversing hyperlinks, retrieving web pages, and generating index meta data. However, the effectiveness of search engines is decreasing and will continue to decrease over time due to the following two key factors:

1. The volume of information (new and modified) to be indexed is increasing rapidly - this increases the delay between new web pages being made available on the web and their availability via search engines, increases network bandwidth requirements, and increases the storage and processing power requirements of search engines.

2. Information is being increasingly made available through dynamically generated or non-textual web pages that do not lend themselves to unintelligent hyperlink traversal, document retrieval, and textual index generation.

The current web search architecture is depicted in Figure 1. Note that web sites are N-fold, numbering in the millions.

[Figure 1 diagram omitted. Labelled components: Web Site (Web Page Space); Search Engine (Hyperlink Traversal, Web Page Retrieval, Index Meta Data Generation, Index Meta Data Repository, Query Engine).]

Figure 1 - Current Web Search Architecture
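
The crawling loop of the current architecture can be sketched roughly as follows. This is an illustrative sketch only, not the implementation of any particular search engine. It assumes an in-memory dictionary (PAGES) in place of real HTTP retrieval, and it treats a simple word-to-URL map as the "index meta data"; the names crawl_site, LinkAndTextParser, and PAGES are hypothetical.

```python
import re
from collections import deque
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collects href targets and visible text from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl_site(pages, start):
    """Breadth-first hyperlink traversal from `start`; returns word -> set of URLs."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        html = pages.get(url)  # stands in for an HTTP GET (assumption)
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        text = " ".join(parser.text_parts).lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index.setdefault(word, set()).add(url)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical four-page site; "/orphan" has no inbound hyperlink.
PAGES = {
    "/": '<a href="/a">Products</a> <a href="/b">Contact</a>',
    "/a": '<p>Widgets and gadgets</p> <a href="/">Home</a>',
    "/b": "<p>Contact the widgets team</p>",
    "/orphan": "<p>Unlinked page</p>",
}

index = crawl_site(PAGES, "/")
```

In this sketch every page costs the search engine one retrieval, and the page with no inbound hyperlink never enters the index, which mirrors the two factors listed above.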

Invention Statement

The invention is to reassign the responsibility for hyperlink traversal, web page retrieval, and index meta data generation within a site from the search engine to the respective web site.

In other words, rather than each search engine traversing links within a web site, retrieving pages, and generating index meta data, each web site operator would take responsibility for traversing their own web page space and generating index meta data. Index meta data would then be made available at the web site in an open standard format, for collection by search engines from a well-known URL at the web site (e.g. "http://hostname.domain/index.mdml", where MDML might be an acronym for Meta Data Markup Language).
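
A minimal sketch of the site-side half of this idea is given below. The disclosure names MDML only as a possible acronym and does not define its format, so the element names used here (mdml, page, term) are invented for illustration, as are the function names and the sample pages. The sketch assumes the site can enumerate its own pages as URL-to-text pairs.

```python
import re
import xml.etree.ElementTree as ET


def build_site_index(pages):
    """Return {url: sorted unique terms} for pages the site itself owns."""
    return {
        url: sorted(set(re.findall(r"[a-z0-9]+", text.lower())))
        for url, text in pages.items()
    }


def to_mdml(site_index):
    """Serialise the index using made-up element names (not a real standard)."""
    root = ET.Element("mdml")
    for url in sorted(site_index):
        page = ET.SubElement(root, "page", {"url": url})
        for term in site_index[url]:
            ET.SubElement(page, "term").text = term
    return ET.tostring(root, encoding="unicode")


# Hypothetical site content, including one dynamically generated URL.
SITE_PAGES = {
    "/catalogue?id=7": "Blue widget, 7 in stock",
    "/about": "About the widget shop",
}

# The resulting string is what the site would publish at its well-known URL.
mdml_document = to_mdml(build_site_index(SITE_PAGES))
```

Because the site enumerates its own pages directly in this sketch, the dynamically generated URL is indexed without any hyperlink pointing to it.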

The proposed web search architecture (subject of this invention disclosure) is depicted in Figure 2. Note that web sites are still N-fold; however, a significant proportion of the work is distributed to be performed locally at the web site, and the interaction between web sites and search engines is dramatically simplified.


[Figure 2 diagram omitted. Labelled components: Web Site (Web Page Space, Hyperlink Traversal, Web Page Retrieval, Index Meta Data Generation); Search Engine (Index Meta Data Retrieval, Index Meta Data Repository, Query Engine).]

Figure 2 - Proposed Web Search Architecture
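
The search engine side of the proposed architecture might look something like the sketch below: one retrieval per site from the well-known URL, a merge into the repository, and a lookup by the query engine. The "/index.mdml" path comes from the example URL in the disclosure; everything else is an assumption made for illustration, including the XML shape (mdml, page, and term elements), the function names, the example hosts, and the dictionary that stands in for the network.

```python
import xml.etree.ElementTree as ET

WELL_KNOWN_PATH = "/index.mdml"  # path taken from the disclosure's example URL


def collect(hosts, fetch):
    """Merge each site's published index into one term -> set of URLs map."""
    repository = {}
    for host in hosts:
        document = fetch("http://" + host + WELL_KNOWN_PATH)
        if document is None:  # site publishes no index; skipped in this sketch
            continue
        for page in ET.fromstring(document).iter("page"):
            url = "http://" + host + page.get("url")
            for term in page.iter("term"):
                repository.setdefault(term.text, set()).add(url)
    return repository


def query(repository, term):
    """Query engine: look up one term and return matches in a stable order."""
    return sorted(repository.get(term.lower(), ()))


# Stand-in for the network: URL -> published index document (hypothetical format).
FAKE_WEB = {
    "http://shop.example/index.mdml":
        '<mdml><page url="/about"><term>widget</term><term>shop</term></page></mdml>',
    "http://blog.example/index.mdml":
        '<mdml><page url="/post/1"><term>widget</term></page></mdml>',
}

repository = collect(["shop.example", "blog.example", "none.example"], FAKE_WEB.get)
results = query(repository, "Widget")
```

Here the search engine issues a single request per site rather than one per page, which is one way to read the simplified interaction shown in Figure 2.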

The scope of the invention disclosure is as follows:

Each web site is responsible for hyperlink traversal, web page retrieval, and index meta data generation within its own web page space.

Index meta data is generat...