Information Retrieval and Presentation Apparatus with Version Control

IP.com Disclosure Number: IPCOM000122965D
Original Publication Date: 1998-Jan-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 2 page(s) / 99K

Publishing Venue

IBM

Related People

Amano, T: AUTHOR [+4]

Abstract

Disclosed is a system for maintaining versions of information sources and for supporting temporal information retrieval and visualization of the information sources. Information sources could be a World Wide Web (WWW) page, a channel (in the sense of webcasting and push technology), or an output of an Internet search engine for a specified query.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

The disclosed system consists of three components:
  o  Version controller
  o  Version-based information extractor
  o  Client manager

      The version controller retrieves and stores snapshots
(versions) of information sources at a predefined update frequency
(e.g., daily, every N days, weekly).  The maximum number, K, of
stored versions and the depth, D, of links for traversing referenced
objects can also be specified for each information source.  The
version control mechanism can be one of the existing full-version
mechanisms, such as difference calculation and update sequences.
Alternatively, a partial information extraction method, in which only
specified segments (e.g., title and headers, or HTML anchors
<a> ... </a>) are extracted, can be used to maintain the essential
information of versions.  This method may not be able to recover
complete versions, but it can drastically reduce the memory space
required for storing versions.
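
The partial extraction method and the limit K can be sketched in a few
lines of Python.  This is only an illustration under stated
assumptions, not the disclosed implementation: the names
PartialExtractor, extract_segments, and VersionStore are hypothetical,
the kept segments are limited to the title, headers, and anchors given
as examples above, and the input is assumed to be reasonably
well-formed HTML.

```python
from collections import deque
from html.parser import HTMLParser


class PartialExtractor(HTMLParser):
    """Collects the text of <title>, <h1>-<h6>, and <a> elements only."""

    KEEP = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "a"}

    def __init__(self):
        super().__init__()
        self.segments = []  # (tag, text) pairs, in the order the elements close
        self._open = []     # stack of kept tags that are currently open
        self._text = []     # one text buffer per open kept tag

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            self._open.append(tag)
            self._text.append([])

    def handle_data(self, data):
        # Text outside kept elements (body text, style sheets) is dropped.
        for buffer in self._text:
            buffer.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()
            text = " ".join("".join(self._text.pop()).split())
            self.segments.append((tag, text))


def extract_segments(html):
    """Return the partial version (kept segments only) of one snapshot."""
    parser = PartialExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments


class VersionStore:
    """Keeps at most K partial versions per source, dropping the oldest."""

    def __init__(self, k):
        self.k = k
        self.versions = {}  # url -> deque of (timestamp, segments)

    def add(self, url, timestamp, html):
        history = self.versions.setdefault(url, deque(maxlen=self.k))
        history.append((timestamp, extract_segments(html)))
```

Because only the kept segments are stored, the original page cannot be
rebuilt from such a version, which matches the trade-off described
above.  The update frequency and the link depth D are left out of this
sketch.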

Irrelevant information, such as Java applets and style sheet
specifications, can also be omitted from the versions.

      Updates (and, therefore, versions) that occur between the
specified sampling intervals may be ignored entirely.  A trigger for
storing a version can also be a user's explicit operation of browsing
a WWW page.  By coupling a timestamp T (or a version number) with each
URL U, a WWW browser (or a server or a proxy) can store a new version
whenever a user accesses the URL (at timestamp T+) and its contents
have been updated since T.  Even when the contents have not been
updated, the version controller can store the information that "the
URL U is unchanged at timestamp T+" for finer version control.
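
The access-triggered variant might look roughly like the following
Python sketch.  The disclosure does not say how an update since T is
detected; comparing a SHA-256 digest of the page contents is an
assumption made here for illustration, and the class and field names
are hypothetical.

```python
import hashlib


class AccessTriggeredVersions:
    """Records one entry per access; content is stored only when it changed."""

    def __init__(self):
        self.history = {}  # url -> list of access records, oldest first

    def on_access(self, url, timestamp, content):
        # Assumption: a content digest stands in for "updated since T".
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        records = self.history.setdefault(url, [])
        changed = not records or records[-1]["digest"] != digest
        records.append({
            "timestamp": timestamp,
            "digest": digest,
            # An unchanged access keeps only the "unchanged at T+" marker.
            "content": content if changed else None,
            "changed": changed,
        })
        return changed
```

Each call returns True when a new version was stored and False when
only the "unchanged" record was added.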

      The version-based information extractor calculates the
following information from a series of versions of information
sources:
  o  Data items included in multiple versions of an information
      source
  o  Data items included in only one version of an information
      source
  o  Keywords (or phrases) that appear in one or more versions
      of an information source
  o  Keywords (or phrases) that appear in multiple information
      sources
  o  Keywords (or phrases) that appear in only one information
      source
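
The five kinds of information listed above reduce to counting how many
versions, or how many sources, contain each item.  The Python sketch
below assumes that each version has already been reduced to a set of
data items or keywords; the function names are hypothetical and the
tokenization step is not shown.

```python
from collections import Counter


def split_by_version_count(versions):
    """versions: list of sets, one set per version of a single source.

    Returns (items found in multiple versions, items found in only one).
    """
    counts = Counter(item for version in versions for item in set(version))
    in_multiple = {item for item, n in counts.items() if n > 1}
    in_one = {item for item, n in counts.items() if n == 1}
    return in_multiple, in_one


def keywords_of_source(versions):
    """Keywords that appear in one or more versions of a source."""
    return set().union(*versions)


def split_by_source_count(sources):
    """sources: dict mapping a source name to its list of keyword sets.

    Returns (keywords found in multiple sources, keywords found in only one).
    """
    per_source = [keywords_of_source(versions) for versions in sources.values()]
    return split_by_version_count(per_source)
```

For example, with the versions {"a", "b"}, {"b", "c"}, and {"b"} of one
source, split_by_version_count places "b" in the first set and "a" and
"c" in the second.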

      A data item could be <item>...</item> or <a>...</a> fillers,
text in image captions, contiguous paragraphs, etc.  Given a query or
a search profile, the information extractor...