Browse Prior Art Database

Similarity Calculator of Graphs

IP.com Disclosure Number: IPCOM000122711D
Original Publication Date: 1991-Dec-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 5 page(s) / 148K

Publishing Venue

IBM

Related People

Watanabe, H: AUTHOR

Abstract

This article describes a method for calculating the similarity between two graphs, one of the underlying technologies needed for case-based natural language processing (CBNLP). CBNLP is a new approach in the area of natural language processing, which makes decisions on the basis of a large amount of corpus (or number of examples) at appropriate points in the natural language processing. The crucial question in CBNLP is how to calculate the similarity between the given situation and the corpus. This report suggests one way of answering the question.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 37% of the total text.

Similarity Calculator of Graphs

      This article describes a method for calculating the
similarity between two graphs, one of the underlying technologies
needed for case-based natural language processing (CBNLP).  CBNLP is
a new approach in the area of natural language processing, which
makes decisions on the basis of a large amount of corpus (or number
of examples) at appropriate points in the natural language
processing. The crucial question in CBNLP is how to calculate the
similarity between the given situation and the corpus. This report
suggests one way of answering the question.

      The characteristics of the proposed method are: o It does not
depend on a particular semantic hierarchy. o It reflects not only
semantic similarity but also structural similarity. o It reflects all
features, not only semantic features.

      Data Structure: A graph subjected to this method is assumed to
have only one root and no cyclic path. The data structure of a graph
G is given as follows:
  G = <Ns,As>
  where Ns = (N*),
         As = (A*),
         N = (id (lex pos f*)) ... node,
         A = (label source target) ... arc,
         lex = lex: <lexical-form>,
         pos = pos: <part-of-speech>,
         f = <feature-name>: <feature-value>, and
         source, target = id.

      Similarity between Two Graphs: The similarity S between two
graphs is given by the following equation:
  S(Gs,Gt) = 1/D = Ns/SDg
  where  D is the distance between Gs and Gt,
          Ns is the number of nodes in Gs, and
          SDg is the simple distance between Gs and Gt.

      The algorithm for calculating the simple distance of two graphs
is shown in Figure 1.  Briefly, the simple distance is the sum of the
node distance between the two root nodes and the sum of descendant
node distances divided by the number of descendant nodes. In the
calculation of the distance between descendant nodes, there may be
more than one descendant node in the target graph that is related
with the same label arc by the focus source descendant node.
Therefore, in this case, the minimum distance is taken.
  Proc SDg(Gs,Ns,Gt,Nt) {
   Dg = Dn(Ns,Nt)
     while (Rs = unmarkedRelation(Ns) and Nsc=target(Ns,Rs)){
      RT = unmarkedRelations(Nt,label(Rs))
      num = numberof(RT)
      case (num) {
        when 0 then
         Dg = Dg+(numOfNodes(Nsc))*P  ;; P is a penalty
        when 1 then
         Rt = RT[0]
         Ntc = target(Nt,Rt)
         Dg = Dg + SDg(Gs,Nsc,Gt,Ntc)
         mark(R)
        Otherwise
         min = MAXOFDISTANCE ;; maximum distance
         R = NULL
         for each Rt in RT {
         Rtc = target(Gt,Rt)
         if ((d=SDg(Gs,Nsc,Gt,Ntc))<min) then
           min = d
           R = Rt
         endif
     ...