Activity Analysis of Real World Entities by Combining Dynamic Information Sources and Real World Entities

IP.com Disclosure Number: IPCOM000014561D
Original Publication Date: 2000-May-01
Included in the Prior Art Database: 2003-Jun-19
Document File: 4 page(s) / 71K

Publishing Venue

IBM

Abstract

A program is disclosed that automatically analyzes the activities of real world entities by combining dynamic information sources and real world entities. Although a huge amount of information is available through networks (Internet/Intranet), it can take time to understand what is going on, because what users want differs from what users get. What users get from Internet services, such as search engines or directory services, is a set of documents (or fragments of documents); what users want is knowledge, that is, facts about real world entities. To fill the gap between the two, the following steps are applied. (1) Definitions of the Structure of Real World Entities: First, entities in the real world are defined and arranged in the form of a taxonomy. (2) Definitions of the Structure of Entities of Information Sources: Next, information sources are assigned to each real world entity. Information sources are sets of URLs that contain information about the assigned real world entities.


  Activity Analysis of Real World Entities by Combining Dynamic Information Sources and Real World Entities

A program is disclosed that automatically analyzes the activities of
real world entities by combining dynamic information sources and
real world entities.

Although a huge amount of information is available through networks
(Internet/Intranet), it can take time to understand what is going
on, because what users want differs from what users get. What users
get from Internet services, such as search engines or directory
services, is a set of documents (or fragments of documents); what
users want is knowledge, that is, facts about real world entities.
To fill the gap between the two, the following steps are applied.

(1) Definitions of the Structure of Real World Entities
First, entities in the real world are defined and arranged in the
form of a taxonomy.
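
The disclosure does not prescribe a concrete representation; as a
minimal sketch in Python, assuming a simple tree of named entities
(all names illustrative), the taxonomy layer could look like this:

# Minimal sketch of the taxonomy layer; class and entity names
# are illustrative assumptions, not taken from the disclosure.
class Entity:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

# Example: taxonomies for organizations and for products.
taxonomy = Entity("RealWorldEntities", [
    Entity("Organizations", [Entity("CompanyA"), Entity("CompanyB")]),
    Entity("Products", [Entity("ProductX"), Entity("ProductY")]),
])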

(2) Definitions of the Structure of Entities of Information Sources
Next, information sources are assigned to each real world entity.
Information sources are sets of URLs that contain information about
the assigned real world entities.
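
A minimal sketch of this assignment, assuming a plain mapping from
entity names to sets of URLs (all URLs hypothetical):

# Hypothetical assignment of information sources (sets of URLs)
# to real world entities from the taxonomy above.
information_sources = {
    "CompanyA": {"http://www.companya.example/",
                 "http://news.example/companya"},
    "ProductX": {"http://www.companya.example/productx/"},
}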

Fig. 1 shows an example of this layered hierarchical structure in
the data model for real world entities and their information
sources.

[Fig 1. Layered Hierarchical Structure in Data Model: a Taxonomy
Layer (a taxonomy for organizations, a taxonomy for products) above
an Information Source Layer (sites, URLs, phrases/sentences,
keywords)]

A site, which is assigned to a real world entity as a source of
information, is defined as a set of URLs. It may be collected
manually, as in Internet directory services, or by automatic
collection methods such as the following (a sketch of this
collection appears after the list):
- URLs accessible within a limited number of clicks from a
specified top URL, under conditions such as:


o appears on the same server as the top URL.
o appears only in subdirectories of the top URL.
- URLs obtained from Internet search engines:
o a query for a search engine is specified as part of the site
definition.
- Rejection of URLs that are too popular, such as portal sites, by
defining a dictionary.
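
A minimal sketch of this automatic collection, assuming a
caller-supplied get_links function for fetching and parsing pages
and a small rejection dictionary (both are assumptions, not part of
the disclosure):

from urllib.parse import urlparse

# Hypothetical dictionary of URLs that are too popular to keep.
POPULAR_URLS = {"http://portal.example/"}

def same_server(url, top_url):
    # Condition: the URL appears on the same server as the top URL.
    return urlparse(url).netloc == urlparse(top_url).netloc

def collect_site(top_url, get_links, max_clicks=2):
    # Breadth-first collection of URLs reachable within max_clicks
    # clicks from top_url; get_links(url) must return the outgoing
    # link URLs of a page (e.g. from an HTML fetcher/parser).
    site, frontier = {top_url}, [top_url]
    for _ in range(max_clicks):
        next_frontier = []
        for url in frontier:
            for link in get_links(url):
                if link in site or link in POPULAR_URLS:
                    continue  # already seen, or rejected as too popular
                if not same_server(link, top_url):
                    continue  # keep to the same server as the top URL
                site.add(link)
                next_frontier.append(link)
        frontier = next_frontier
    return site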

(3) Maintenance of Information

The URLs registered in (2) are periodically crawled by the system
shown in Fig. 2 [1].

[Fig 2. System: Metadata DB, Similarity Calculation Module,
Metadata Access Method, Feature Collector, DBMS + Version Control
Mechanism, Extraction, Taxonomy, Internet/Intranet]

If the content of a URL differs from the version fetched at the
previous crawl (collector), it is saved as a new version for that
URL (DBMS + Version Control Mechanism); it is then analyzed, and
the information elements used for activity analysis are extracted.
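
A minimal sketch of this change detection, with hypothetical fetch
and store interfaces standing in for the collector and the
version-controlled DBMS:

import hashlib

def crawl_and_version(url, fetch, store):
    # fetch(url) -> page content as bytes; store maps each URL to a
    # list of saved versions (newest last). Both interfaces are
    # assumptions standing in for the components of Fig. 2.
    content = fetch(url)
    digest = hashlib.sha256(content).hexdigest()
    versions = store.setdefault(url, [])
    if not versions or versions[-1]["digest"] != digest:
        # Content changed since the last crawl: save a new version,
        # which is then analyzed and its information elements extracted.
        versions.append({"digest": digest, "content": content})
        return True
    return False  # unchanged: nothing to analyze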

Basic information elements are anchors with their titles (anchors)
and runs of consecutive characters (text blocks).
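
A minimal sketch of gathering these two element types from a page,
using Python's standard html.parser (the disclosure does not
specify how extraction is implemented):

from html.parser import HTMLParser

class ElementExtractor(HTMLParser):
    # Collects anchors (href plus anchor text) and text blocks
    # (runs of consecutive character data outside anchors).
    def __init__(self):
        super().__init__()
        self.anchors, self.text_blocks = [], []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._href:
            self.anchors.append((self._href, text))
        else:
            self.text_blocks.append(text)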

The text parts of both anchors and text blocks are analyzed into
pairs of keywords and their categories (attribute-value pairs) by a
natural language processing sys...