Extraction of Regional Topics

IP.com Disclosure Number: IPCOM000118737D
Original Publication Date: 1997-Jun-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 4 page(s) / 101K

Publishing Venue

IBM

Related People

Nomiyama, H: AUTHOR

Abstract

Disclosed is a device to extract regionally topical keywords, together with their topical regions, from databases. 1. Configuration - An overview of the disclosed device is shown in the Figure. o Database: a set of information objects, each of which includes bibliographic information (such as dates and authors) and text information. o Keyword Index: the index used by the keyword search engine. Keywords are attached to each information object in the database, either manually or automatically by using text processing technologies. The disclosed device assumes that these keywords include ones among which topological relationships can be defined (such as place names).

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 47% of the total text.

Extraction of Regional Topics

      Disclosed is a device to extract regionally topical keywords
with their topical regions from databases.
  1.  Configuration - The overview of the disclosed device is
       shown in the Figure.
      o  Database: a set of information objects, each of which
          includes bibliographic information (such as dates and
          authors) and text information.
      o  Keyword Index: the index used by the keyword search engine.
          Keywords are attached to each information object in the
          database, either manually or automatically by using text
          processing technologies.
          The disclosed device assumes that these keywords include
          ones among which topological relationships can be defined
          (such as place names).
      o  Keyword Search Engine: process to find information
          objects which satisfy the specified queries.
      o  Regional Topics Extractor: a device to extract topical
          keywords.
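
The Keyword Index and Keyword Search Engine described above can be pictured with a minimal Python sketch. It assumes the index is a simple inverted index (keyword to object ids) and that a query is a conjunction of keywords; the disclosure does not specify either. The field names, the sample objects, and the functions `build_keyword_index` and `search` are hypothetical.

```python
from collections import defaultdict

# Hypothetical information objects: bibliographic fields plus attached keywords.
objects = [
    {"id": 1, "date": "1997-01-10", "keywords": {"Hokkaido", "snow festival"}},
    {"id": 2, "date": "1997-02-03", "keywords": {"Tokyo", "snow festival"}},
    {"id": 3, "date": "1997-02-20", "keywords": {"Tokyo", "election"}},
]

def build_keyword_index(objects):
    """Map each keyword to the set of ids of the objects it is attached to."""
    index = defaultdict(set)
    for obj in objects:
        for keyword in obj["keywords"]:
            index[keyword].add(obj["id"])
    return index

def search(index, *keywords):
    """Return the ids of objects that carry all of the given keywords."""
    if not keywords:
        return set()
    # .get avoids creating empty entries for unknown keywords.
    return set.intersection(*(index.get(k, set()) for k in keywords))

index = build_keyword_index(objects)
print(search(index, "Tokyo", "snow festival"))  # {2}
```

Counting the size of a search result gives the frequencies that a topic extractor would need, so the extractor can sit on top of the search engine without touching the database directly.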
  2.  Algorithm - This method consists of the following two steps:
      a.  Extraction of regional topics
      b.  Visualization of extracted topics
      1) Extraction of Regional Topics
      We make the following assumptions about "regional topics":
      o  "Regional topics" are represented as keywords with their
          topical regions.
      o  Some keywords, such as place names, represent regions for
          information objects (regional keywords).
      o  "Regional topics" are closely related to restricted
          regions (topical regions).
      o  The frequency of a keyword in a topical region decreases
          from the center of the topical region toward the edges
          of the topical region.
      The extraction of regional topics consists of the following
       two steps:
      a.  Calculation of the degree of topicality
      b.  Determination of topical regions
      1) Calculation of the degree of topicality
          First, the degree of topicality is defined as
           follows.  It is used to decide whether keywords
           are topical or not.
          Degree of Topicality = Freq(W/R) * (Freq(W/R)/Freq(W))
          where:
          Freq(W): number of information objects that include
                    the keyword W
          Freq(W/R): number of information objects that include
                      both the keyword W and the regional keyword R
       To calculate this value, we define a set X of regional
        keywords.  For example, to extract regional topics in Japan,
        the set is defined as follows:
         X={"Hokkaido",..,"Tokyo",..,"Osaka",..,"Okinawa"}
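
The degree of topicality defined above can be illustrated with a short Python sketch. It assumes each information object is reduced to its set of keywords, and it evaluates the formula for one keyword W against each regional keyword R in X. The sample data and the function names `freq` and `degree_of_topicality` are hypothetical, returning 0.0 when Freq(W) is zero is a choice made here rather than something the disclosure states, and the list `X` holds only the four regions named in the text (the elided members are left out).

```python
def freq(docs, *keywords):
    """Number of information objects whose keyword set contains all keywords."""
    return sum(1 for kws in docs if all(k in kws for k in keywords))

def degree_of_topicality(docs, w, r):
    """Freq(W/R) * (Freq(W/R) / Freq(W)); 0.0 if W never occurs (assumption)."""
    f_w = freq(docs, w)
    if f_w == 0:
        return 0.0
    f_wr = freq(docs, w, r)
    return f_wr * (f_wr / f_w)

# Hypothetical keyword sets, one per information object.
docs = [
    {"Hokkaido", "snow festival"},
    {"Hokkaido", "snow festival"},
    {"Hokkaido", "snow festival"},
    {"Tokyo", "snow festival"},
    {"Tokyo", "election"},
    {"Osaka", "election"},
]

# Only the regions named in the text; the full set is elided in the source.
X = ["Hokkaido", "Tokyo", "Osaka", "Okinawa"]

scores = {r: degree_of_topicality(docs, "snow festival", r) for r in X}
for region, score in scores.items():
    print(region, score)
```

With this data Freq("snow festival") is 4, so Hokkaido scores 3 * (3/4) = 2.25 and Tokyo scores 1 * (1/4) = 0.25. The first factor rewards keywords that occur often in a region, while the ratio penalizes keywords that are equally common everywhere.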
       Th...