Extraction of Dependency Structures from a Huge Amount of Japanese Texts

IP.com Disclosure Number: IPCOM000114083D
Original Publication Date: 1994-Nov-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 74K

Publishing Venue

IBM

Related People

Nomiyama, H: AUTHOR

Abstract

Disclosed is a device to extract partial dependency structures from results of Japanese morphological analysis.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 53% of the total text.

OVERVIEW OF THE SYSTEM

An overview of the algorithm is shown below.

Japanese morphological analysis is applied to the input sentences in
a text database;
for i = 2 to the maximum bunsetsu number in the text database;
  for j = 1 to the total number of sentences in the text database;
    Estimate partial dependency structures for the i bunsetsus from
    the end of sentence j;
  end;
  Add the obtained partial dependency structures whose frequencies
  are greater than F to the partial dependency structure database;
end;
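
The loop above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the disclosed implementation: a sentence is assumed to be a list of bunsetsu strings, a partial dependency structure is assumed to be any hashable value, and the estimation step itself is passed in as a hypothetical callable named estimate, because the overview does not define it. The names build_database, toy_estimate and min_freq (the threshold F) are illustrative.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Optional, Sequence, Set

Sentence = Sequence[str]   # assumed: one sentence as a list of bunsetsus
Structure = Hashable       # assumed: any hashable encoding of a structure


def build_database(
    sentences: Iterable[Sentence],
    estimate: Callable[[Sentence, Set[Structure]], Optional[Structure]],
    min_freq: int,
) -> Set[Structure]:
    """Grow a partial dependency structure database, one value of i at a time."""
    sentences = list(sentences)
    database: Set[Structure] = set()
    max_len = max((len(s) for s in sentences), default=0)
    for i in range(2, max_len + 1):
        counts: Counter = Counter()
        for sentence in sentences:
            if len(sentence) < i:
                continue  # this sentence has fewer than i bunsetsus
            # Only the last i bunsetsus of the sentence are considered.
            structure = estimate(sentence[-i:], database)
            if structure is not None:
                counts[structure] += 1
        # Keep structures whose frequency is greater than the threshold F.
        database.update(s for s, n in counts.items() if n > min_freq)
    return database


def toy_estimate(tail: Sentence, database: Set[Structure]) -> Optional[Structure]:
    """Toy stand-in: handles only i = 2 and returns the bunsetsu pair itself."""
    return tuple(tail) if len(tail) == 2 else None


corpus = [["a", "b", "c"], ["x", "b", "c"], ["b", "d"]]
print(build_database(corpus, toy_estimate, min_freq=1))
```

With this toy corpus the pair ("b", "c") ends two sentences and is kept, while ("b", "d") ends only one and is dropped at min_freq=1. A real estimator would consult the database built in earlier passes, which is why it is passed to estimate.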

An example of a result of Japanese morphological analysis is shown
below.
  "-" marks a boundary between bunsetsus.  Words in parentheses are
parts of speech.
  An example of a partial dependency structure is shown below.

ALGORITHM TO ESTIMATE PARTIAL DEPENDENCY STRUCTURES
  1.  Basis of Estimation
      "i bunsetsus" means the i bunsetsus counted from the end of the
      sentence (see below).
                    3 bunsetsus
                +---------+
                |       2 bunsetsus
                |    +----+
                |    |    1 bunsetsu
     * - .....  * -  * -  *
     A dependency structure for i bunsetsus is estimated from the
    partial dependency structure database.
           A partial structure for i bunsetsus is estimated only when
    it can be estimated uniquely and without conflict.
           "Without conflict" means that the partial dependency
    structures for i bunsetsus do not conflict with each other.
    "Uniquely" means that only one candidate is obtained as a result
    of estimation.  To estimate candidates uniquely, the anti-crossing
    constraint is applied.
  2.  Algorithm for Estimation
           When i is 2, all dependency structures are estimated
    uniquely and without conflict.  For i > 2, all i-1 partial
    dependency structures for i bunsetsus which are estimated uniquely
    and without conflict are...
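
The anti-crossing constraint named under "Basis of Estimation" can be illustrated with a short Python sketch. It rests on assumptions that the text does not state: a structure is encoded as a tuple heads, where heads[k] is the 0-based index of the bunsetsu that bunsetsu k depends on, every bunsetsu is assumed to depend on a later one, and the last bunsetsu has no head. The function names crosses and candidates are illustrative, and the lookup in the partial dependency structure database that would narrow the candidates further is not shown.

```python
from itertools import product
from typing import List, Tuple


def crosses(heads: Tuple[int, ...]) -> bool:
    """Return True if any two dependency arcs cross.

    heads[k] is the index of the bunsetsu that bunsetsu k depends on;
    the last bunsetsu has no head and is not listed.
    """
    arcs = list(enumerate(heads))
    for a, (d1, h1) in enumerate(arcs):
        for d2, h2 in arcs[a + 1:]:
            # d1 < d2 always holds here; the arcs cross when d1 < d2 < h1 < h2.
            if d2 < h1 < h2:
                return True
    return False


def candidates(n: int) -> List[Tuple[int, ...]]:
    """List every non-crossing structure for n bunsetsus."""
    # Assumption: bunsetsu k may depend on any later bunsetsu k+1 .. n-1.
    choices = [range(k + 1, n) for k in range(n - 1)]
    return [h for h in product(*choices) if not crosses(h)]


for n in (2, 3, 4):
    print(n, candidates(n))
```

Under these assumptions two bunsetsus admit exactly one structure, which is consistent with the statement that the case i = 2 is always estimated uniquely. Four bunsetsus admit six head assignments, and the constraint rejects only (2, 3, 3), in which the arc from bunsetsu 0 to 2 crosses the arc from bunsetsu 1 to 3.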