Selection Mechanism for Informative Data Sources Based on Their Histories

IP.com Disclosure Number: IPCOM000013997D
Original Publication Date: 1999-Nov-01
Included in the Prior Art Database: 2003-Jun-19
Document File: 3 page(s) / 66K

Publishing Venue

IBM

Related People

Hiroshi Nomiyama: AUTHOR [+2]

Abstract

A program is disclosed by which informative data sources on the Internet/Intranet can be extracted by observing their changes.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 54% of the total text.

Although a huge amount of information is accessible from networks (Internet/Intranet),
it is often difficult to find valuable information in them. The disclosed program
focuses on information extraction from data sources that continuously provide information,
such as the home pages of newspaper companies, magazines, and computer manufacturers, or active
Intranet information sources.

The configuration of the system is shown in Fig. 1.

[Fig. 1 System Configuration. Components shown: Internet/Intranet, Collector, Feature Extraction,
DBMS+Version Control Mechanism, Metadata DB, Metadata Access Method,
Measurement Calculation Module, Taxonomy]

Collector : Collects registered URLs from the networks.
Feature Extraction : Analyzes the contents of URLs and extracts pairs of keywords and their features
from them (e.g., IBM-Organization).

DBMS+Version Control Mechanism : Collects the contents of the specified URL according to
the pre-defined schedule by using the collector. If the contents of the URL
differ from its previously saved version, the system creates a
new version for it and extracts features, along with other information acquired
from Web servers, such as LASTMODIFIEDDATE, for the new version.
This mechanism can also keep only the versions within a limited duration, such as
the last three months, so that old versions are automatically deleted from the DB.

Metadata DB : Keeps information on the URLs themselves (such as URL addresses and titles)
and on all of their versions (such as version ids, LASTMODIFIEDDATE,
dates saved by this system, contents, and results of the feature extraction).
Metadata Access Method : Provides an API for accessing the metadata DB.

Measurement Calculation Module : Calculates measurements for the nodes and URLs of a taxonomy.
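
The behavior described for the DBMS+Version Control Mechanism (create a new version only when the fetched contents differ from the last saved version, then discard versions older than a limited duration) can be illustrated with a minimal sketch. It is not the disclosed implementation: the names Version, UrlRecord, extract_features and record_fetch, the in-memory storage, the 90-day retention default, and the toy keyword dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Version:
    version_id: int
    saved_at: datetime                 # date saved by this system
    last_modified: Optional[datetime]  # LASTMODIFIEDDATE from the Web server, if any
    content: str
    features: list                     # (keyword, feature) pairs


@dataclass
class UrlRecord:
    url: str
    title: str
    versions: list = field(default_factory=list)
    next_version_id: int = 1           # hypothetical counter so ids are never reused


def extract_features(content):
    """Stand-in for Feature Extraction (toy dictionary, illustration only)."""
    known = {"IBM": "Organization"}
    return [(kw, feat) for kw, feat in known.items() if kw in content]


def record_fetch(record, content, fetched_at, last_modified=None,
                 retention=timedelta(days=90)):
    """Save a new version if the content changed, then drop expired versions.

    Returns True when a new version was created.
    """
    changed = not record.versions or record.versions[-1].content != content
    if changed:
        record.versions.append(
            Version(record.next_version_id, fetched_at, last_modified,
                    content, extract_features(content))
        )
        record.next_version_id += 1
    # Keep only versions saved within the retention window.
    cutoff = fetched_at - retention
    record.versions = [v for v in record.versions if v.saved_at >= cutoff]
    return changed


rec = UrlRecord("http://example.com/", "Example")
t0 = datetime(1999, 11, 1)
record_fetch(rec, "IBM news", t0)                       # first fetch: version 1
record_fetch(rec, "IBM news", t0 + timedelta(days=1))   # unchanged: no new version
record_fetch(rec, "other news", t0 + timedelta(days=2)) # changed: version 2
```

In this sketch the retention rule is applied literally to the saved date, so a version that has not changed for longer than the retention period is also dropped; whether the disclosed system treats the current version specially is not stated in the text.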

The disclosed mechanism includes:
(1) Measuring the importance of data sources by observing changes to a URL from the following
two viewpoints:
Time Coverage

The length of time, within the specified period, during which a URL matches the specified query.
The greater the time coverage, the more important a URL...
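
As a rough illustration of the Time Coverage viewpoint, the sketch below sums the time within an observation period during which the version in effect matches a query. The function name time_coverage, the (saved date, contents) tuple layout, the substring query, and the assumption that a version stays in effect until the next version is saved are all hypothetical; the text available here does not specify how the measurement is computed.

```python
from datetime import datetime, timedelta


def time_coverage(versions, matches, start, end):
    """Total time within [start, end] during which the version in effect matches.

    versions: list of (saved_at, content) tuples sorted by saved_at.
    matches:  predicate deciding whether a version's content matches the query.
    Assumption: a version is in effect from its saved date until the next
    version's saved date (or until `end` for the latest version).
    """
    total = timedelta(0)
    for i, (saved_at, content) in enumerate(versions):
        valid_until = versions[i + 1][0] if i + 1 < len(versions) else end
        lo = max(saved_at, start)    # clip the version's interval to the period
        hi = min(valid_until, end)
        if hi > lo and matches(content):
            total += hi - lo
    return total


history = [
    (datetime(1999, 1, 1), "IBM announces a new server"),
    (datetime(1999, 1, 11), "weather report"),
    (datetime(1999, 1, 21), "IBM quarterly results"),
]
coverage = time_coverage(history, lambda text: "IBM" in text,
                         datetime(1999, 1, 1), datetime(1999, 1, 31))
print(coverage.days)  # 20 (10 days for the first version + 10 for the third)
```

Dividing the result by the length of the observation period would give a value between 0 and 1 that can be compared across URLs, although that normalization is likewise an assumption.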