Method for Decomposition of Compound Words

IP.com Disclosure Number: IPCOM000117286D
Original Publication Date: 1996-Jan-01
Included in the Prior Art Database: 2005-Mar-31
Document File: 2 page(s) / 68K

Publishing Venue

IBM

Related People

Laube, A: AUTHOR

Abstract

Disclosed is a method of pre-splitting compound words into potential components delimited by the boundaries of abstract syllables.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Disclosed is a method of pre-splitting compound words into
potential components delimited by the boundaries of abstract
syllables.

      Existing solutions for pre-splitting compounds into potential
components use the boundaries between letters to delimit and identify
the potential components, e.g., (1).  Further, (2) discloses a method
for verifying the spelling of compound words in which the components
of the compound words are isolated by tree-scanning techniques.  A
component can be an independent word or a prefix, a middle element or
a suffix of a word.

      The proposed method is based on decomposing words into abstract
syllables (3), from which potential components of compounds are
formed.  In this way, the potential boundaries of components are
determined.  The method is based in particular on the following
assumptions:
  1.  The length of a compound word is greater than or equal to 5;
  2.  each compound consists of at least two abstract syllables, i.e.,
       each component consists of at least one abstract syllable;
  3.  the abstract syllables consist of at least two letters, with the
       exception of the first and the last syllable of a compound
       (this substantially reduces the number of potential
       decompositions);
  4.  the sequence of consonants at the beginning and the end of a
       compound is not examined.
      o  These assumptions are valid especially for German compounds.
      o  To apply the method to other languages, the assumptions may
          have to be modified.
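
      Assumptions 1 to 3 can be illustrated with a minimal sketch.  It
is not the disclosed implementation: it assumes that a word has already
been decomposed into abstract syllables by some other routine (the
decomposition of (3) is not reproduced here), and the names
satisfies_assumptions and potential_components, as well as the sample
syllable lists, are hypothetical.  Assumption 4 is left to the
syllable decomposition and is not modelled.

```python
from typing import List, Tuple

MIN_COMPOUND_LENGTH = 5        # assumption 1
MIN_INNER_SYLLABLE_LENGTH = 2  # assumption 3


def satisfies_assumptions(syllables: List[str]) -> bool:
    """Check assumptions 1-3 for a word given as a list of syllables."""
    word = "".join(syllables)
    if len(word) < MIN_COMPOUND_LENGTH:   # assumption 1
        return False
    if len(syllables) < 2:                # assumption 2
        return False
    # Assumption 3: only the first and last syllable may be shorter.
    inner = syllables[1:-1]
    return all(len(s) >= MIN_INNER_SYLLABLE_LENGTH for s in inner)


def potential_components(syllables: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, text) for every run of adjacent syllables.

    The run covering the whole word is skipped, since it is not a
    proper component.  Boundaries fall only between syllables.
    """
    if not satisfies_assumptions(syllables):
        return []
    # Character offsets of the syllable boundaries, including 0 and len(word).
    offsets = [0]
    for syllable in syllables:
        offsets.append(offsets[-1] + len(syllable))
    word = "".join(syllables)
    count = len(syllables)
    components = []
    for first in range(count):
        for last in range(first + 1, count + 1):
            if first == 0 and last == count:
                continue  # the whole word itself
            start, end = offsets[first], offsets[last]
            components.append((start, end, word[start:end]))
    return components


if __name__ == "__main__":
    # Hand-made segmentation for illustration only, not the output of (3).
    for _, _, text in potential_components(["haus", "tuer", "schloss"]):
        print(text)
    # prints: haus, haustuer, tuer, tuerschloss, schloss (one per line)
```

      With three syllables only five potential components arise,
whereas splitting at every letter boundary of the same 15-letter word
would yield far more candidates to look up.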

      For an efficient search, the potential components are placed
into a scanning-tree wherein...