
Algorithms for Mapping Word Senses Between Pairs of Dictionaries

IP.com Disclosure Number: IPCOM000102437D
Original Publication Date: 1990-Nov-01
Included in the Prior Art Database: 2005-Mar-17
Document File: 5 page(s) / 186K

Publishing Venue

IBM

Related People

Chodorow, MS: AUTHOR [+4]

Abstract

We consider dictionaries which are organized by head word, where each head word entry (W) includes a list (W1, W2,...) of senses for that head word, and where each sense (Wi) includes a list of words associated with that sense. The words in the list may be synonyms, for example, or translations of the head word into some foreign language.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 32% of the total text.


      Given two such dictionaries, we consider the case where the
same head word (W=X) appears in both dictionaries.  The following
algorithms associate sense Wi in dictionary 1 with sense Xj in
dictionary 2, such that the associated (paired) senses (Wi,Xj) are
likely to be semantically related, that is, to refer to the same
intended meaning of the given head word.

      These algorithms may be useful for augmenting word lists and
related information (such as definitions, examples, and usage) from
one dictionary with the corresponding word lists and related
information from other dictionaries.  Potential application areas
include document search and retrieval, text understanding, word sense
disambiguation, machine-assisted translation, and computational
lexicography.

      The essence of these algorithms is to count the words in the
pairwise intersection of the word lists for all sense pairs (Wi,Xj)
and to apply an appropriate criterion for identifying closely related
senses.
Those senses with a sufficiently large number of words in common are
likely to be semantically related to each other.  This method is
neutral with respect to the exact nature of the semantic relation
among senses.  It simply indicates possible associations (or lack of
association), and leaves the nature of that association open to
further determination by the investigator.
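
      As a rough illustration of the counting step described above,
the sketch below represents a head word entry as a list of senses,
each sense being a set of words, and tabulates the size of the
intersection for every pair (Wi,Xj).  The function name
intersection_counts, the set-based representation, and the sample
word lists for "bank" are assumptions made for illustration; none of
them comes from the disclosure itself.

```python
from typing import Dict, List, Set, Tuple

Sense = Set[str]  # assumed representation: one sense = one set of words


def intersection_counts(
    entry1: List[Sense], entry2: List[Sense]
) -> Dict[Tuple[int, int], int]:
    """Count the shared words for every sense pair (Wi, Xj) of one head word.

    Senses are numbered from 1 to match W1, W2, ... in the text.
    """
    return {
        (i, j): len(wi & xj)
        for i, wi in enumerate(entry1, start=1)
        for j, xj in enumerate(entry2, start=1)
    }


# Hypothetical synonym lists for the head word "bank" in two dictionaries.
bank_1 = [{"shore", "embankment", "edge"}, {"depository", "treasury", "fund"}]
bank_2 = [{"treasury", "depository", "lender"}, {"shore", "edge", "margin"}]

counts = intersection_counts(bank_1, bank_2)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

      In this made-up data, pairs (W1,X2) and (W2,X1) share two
words each and the other pairs share none.  How large a count must be
to suggest a related sense is left open here, as it is in the text
above.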

      Similarity Measure
1.   Calculate C = the set of all words which are in both
     dictionaries.  C is called the "common vocabulary" of the two
     dictionaries.
2.   For every pair (Wi,Xj):
     a.   Calculate U(i,j) = the set of all words w in C such that
          either w is in the word list of Wi, or w is in the word
          list of Xj, or both.  U(i,j) is the union of the word lists
          of Wi and Xj, intersected with the common vocabulary C.
     b.   Calculate Sd(i,j) = the set of all words w in C such that
          either w is in the word list of Wi, or w is in the word
          list of Xj, but NOT both.  Sd(i,j) is the symmetric
          difference (union minus intersection) of the word lists of
          Wi and Xj, intersected with the common vocabulary C.
     c.   Calculate M(i,j) = the number of words in Sd(i,j) divided
          by the number of words in U(i,j).  M(i,j) is the measure of
          dissimilarity between Wi and Xj.  (NOTE:  If the number of
          words in U(i,j) = 0, then we define M(i,j) = 1 in order to
          avoid division by zero.)
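
      Steps 1 and 2 can be sketched in a few lines of Python.  This
is a minimal sketch under stated assumptions, not the authors'
implementation: the vocabulary of a dictionary is taken to be every
word that occurs in its word lists (the text does not say whether
head words also count), and the names vocabulary and dissimilarity
and the sample data are hypothetical.

```python
from typing import Iterable, Set


def vocabulary(senses: Iterable[Set[str]]) -> Set[str]:
    """All words occurring in any of the given word lists.

    Assumption: this stands in for "all words in the dictionary".
    """
    vocab: Set[str] = set()
    for words in senses:
        vocab |= words
    return vocab


def dissimilarity(wi: Set[str], xj: Set[str], common: Set[str]) -> float:
    """M(i,j) = |Sd(i,j)| / |U(i,j)|, defined as 1 when U(i,j) is empty."""
    u = (wi | xj) & common   # step 2a: union, restricted to C
    sd = (wi ^ xj) & common  # step 2b: symmetric difference, restricted to C
    if not u:                # NOTE in step 2c: avoid division by zero
        return 1.0
    return len(sd) / len(u)  # step 2c


# Hypothetical word lists for one head word in two toy dictionaries.
bank_1 = [{"shore", "embankment", "edge"}, {"depository", "treasury", "fund"}]
bank_2 = [{"treasury", "depository", "lender"}, {"shore", "edge", "margin"}]

# Step 1: the common vocabulary C of the two toy dictionaries.
common = vocabulary(bank_1) & vocabulary(bank_2)

m_related = dissimilarity(bank_1[0], bank_2[1], common)    # W1 vs X2
m_unrelated = dissimilarity(bank_1[0], bank_2[0], common)  # W1 vs X1
```

      When U(i,j) is not empty, M(i,j) equals one minus the fraction
of U(i,j) that lies in both word lists, so 0 means the two lists
agree on every word of C they contain and 1 means they share no word
of C.  In the toy data, "embankment" and "margin" fall outside C and
are ignored, which is why W1 and X2 come out with M = 0.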

      Discussion:  The measure M as defined above is one of several
standard measures which can be used to compare lists of words.  Any
such standard measure may be used in place of M.  M has the property
(among others) tha...