System and mechanism for updating the search engines with latest WWW pages

IP.com Disclosure Number: IPCOM000125342D
Original Publication Date: 2005-May-27
Included in the Prior Art Database: 2005-May-27
Document File: 3 page(s) / 40K

Publishing Venue

IBM

Abstract

Most Internet search engines use a crawling mechanism to search billions of web pages on the World Wide Web (WWW). The crawling algorithm traverses the WWW directory and every web page at each internet site in the directory, then recursively follows all the web links on each page. Given the nature of current internet search, this crawling mechanism is too expensive, for the following known reasons:

1. It searches the World Wide Web sites repeatedly to get up-to-date information about the pages, which heavily increases internet traffic.

2. It visits the same web sites again and again even when they have no updates, which slows down commercial web servers.

3. It might index the same web pages again in every crawling session even though those pages have no updates.

4. The search results are not up to date, or might be obsolete, because the crawl can miss the latest updates on a web site.

5. Search results are inconsistent across search engines because some engines get the latest updates and some do not.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

This invention proposes a new, elegant search mechanism for the internet that addresses the above issues and provides more accurate, up-to-date information about web sites, with consistent search results across search engines.

This invention introduces a small add-on component designed to run on every web server that hosts web sites. The add-on component can be configured with the interfaces of different search engines, through which it sends updated pages to those search engines whenever the web site changes. In essence, the add-on component monitors the entire web site and notifies the subscribed search engines whenever pages are added, updated, or deleted. This ensures that every subscribed search engine receives the latest data available on the web site immediately, so any change on the web site is immediately reflected in all of them. Because it uses a push mechanism to send web site updates, the search engines do not have to request all the pages from the web site. The invention thereby addresses the above-mentioned problems of the traditional crawling mechanism.
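The monitor-and-push behaviour described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function names (take_snapshot, diff_snapshots, push_changes), the use of content hashes to detect updates, and the modelling of a subscribed search engine as a plain callable are all assumptions made for illustration, since the disclosure does not specify how changes are detected or what the search engine interfaces look like.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# All names below are hypothetical; the disclosure does not define an API.
Snapshot = Dict[str, str]                    # relative page path -> content hash
Change = Tuple[str, str]                     # (kind, relative page path)
Subscriber = Callable[[List[Change]], None]  # stand-in for a search engine interface


def take_snapshot(site_root: Path) -> Snapshot:
    """Hash every file under site_root so two snapshots can be compared."""
    snapshot: Snapshot = {}
    for path in sorted(site_root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(site_root).as_posix()
            snapshot[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return snapshot


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[Change]:
    """List pages that were added, updated, or deleted between two snapshots."""
    changes: List[Change] = []
    for page in sorted(new.keys() - old.keys()):
        changes.append(("added", page))
    for page in sorted(old.keys() & new.keys()):
        if old[page] != new[page]:
            changes.append(("updated", page))
    for page in sorted(old.keys() - new.keys()):
        changes.append(("deleted", page))
    return changes


def push_changes(changes: List[Change], subscribers: List[Subscriber]) -> None:
    """Push the change list to every subscriber; send nothing if nothing changed."""
    if not changes:
        return
    for notify in subscribers:
        notify(changes)
```

In a deployment along these lines, each subscriber would presumably wrap a network call to one search engine's update interface, and the snapshot comparison would run on a schedule or in response to file-system events; both details are left open here because the disclosure does not describe them.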

Novelties

1. It reduces internet network traffic because crawling is performed only when it is needed, and it can be invoked only for the requested pages rather than for the entire Web.

2. It provides consistent search results across search engines because it pushes the latest data to every registered search engine as soon as the data is available on the web site.

3. The search results would be up to date.

4. It avoids visiting the commercial web sites which do not have the...