Correct Declination Forms of German Words

IP.com Disclosure Number: IPCOM000107558D
Original Publication Date: 1992-Mar-01
Included in the Prior Art Database: 2005-Mar-22
Document File: 4 page(s) / 124K

Publishing Venue

IBM

Related People

Bandara, U: AUTHOR [+3]

Abstract

A method is described for finding correct declination forms of German words in a text without explicit consideration of grammatical rules.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 51% of the total text.

      The problem of finding the correct declination form of a word
arises in many current natural language processing technologies,
particularly for German.  In a speech recognizer, the decoder may
output a grammatically incorrect declination form of a word because
the correct and the incorrect forms are acoustically similar and the
speaker's utterance may, in addition, have been unclear.  Example:
      verbindlichen/verbindlichem
The speech recognition system then has to find the contextually
correct declination form of the word.

      The problem presents itself analogously in a proofreader that
checks spelling according to context.  The user expects the
proofreader to check the declination form of words, because an
incorrect form may have been used carelessly, the grammatically
correct form may have been unknown, or there may simply have been a
typing error.

      Declination leads to an additional problem, namely the size of
the vocabulary, which grows because every declined form has to be
stored.  This is illustrated by the following example.

      For the root word verbindlich, the words verbindlich,
verbindliche, verbindlichen, verbindlichem, verbindlicher and
verbindliches all have to be entered into the vocabulary.
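      The growth of the vocabulary can be illustrated with a minimal
sketch.  The helper name expand_adjective and the ending list are
assumptions made for illustration only; the endings are simply those
visible in the verbindlich example, and plain concatenation does not
cover adjectives whose stem changes when declined.

```python
# Hypothetical helper, not part of the described method: it expands one
# adjective root into the six surface forms listed in the example.
ADJECTIVE_ENDINGS = ["", "e", "en", "em", "er", "es"]


def expand_adjective(root: str) -> list[str]:
    """Return the uninflected root followed by its five declined forms."""
    return [root + ending for ending in ADJECTIVE_ENDINGS]


forms = expand_adjective("verbindlich")
print(len(forms), forms)
```

Every adjective root therefore contributes six entries, which is why
the vocabulary grows so quickly.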

      The described method determines the grammatically correct
declination forms without parsers or any other explicit use of sets
of grammatical rules.  Instead, it uses the hidden Markov model (HMM)
technique, which may be regarded as a learning system.  This technique
is state of the art in signal processing.
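      As background, the following is a minimal sketch of how a
discrete HMM assigns a likelihood to a sequence of symbols (the
forward algorithm), and how several models could be compared on the
same sequence.  The class DiscreteHMM, the function names and all
parameter values are toy assumptions for illustration; they are not
taken from the disclosure, and choosing the highest-scoring model is
an assumed decision rule rather than one stated in this excerpt.

```python
from dataclasses import dataclass


@dataclass
class DiscreteHMM:
    initial: list[float]              # initial[i] = P(first state is i)
    transition: list[list[float]]     # transition[i][j] = P(next state j | state i)
    emission: list[dict[str, float]]  # emission[i][s] = P(symbol s | state i)


def sequence_likelihood(model: DiscreteHMM, symbols: str) -> float:
    """Forward algorithm: P(symbols | model), summed over all state paths."""
    if not symbols:
        return 1.0
    n_states = len(model.initial)
    # alpha[i] = probability of the symbols seen so far, ending in state i
    alpha = [model.initial[i] * model.emission[i].get(symbols[0], 0.0)
             for i in range(n_states)]
    for symbol in symbols[1:]:
        alpha = [
            sum(alpha[i] * model.transition[i][j] for i in range(n_states))
            * model.emission[j].get(symbol, 0.0)
            for j in range(n_states)
        ]
    return sum(alpha)


def best_model(models: dict[str, DiscreteHMM], symbols: str) -> str:
    """Return the name of the model that gives the sequence the highest likelihood."""
    return max(models, key=lambda name: sequence_likelihood(models[name], symbols))


# Toy parameters only (hand-picked, not trained values).
models = {
    "model_1": DiscreteHMM([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]],
                           [{"a": 1.0}, {"b": 1.0}]),
    "model_2": DiscreteHMM([1.0, 0.0], [[0.1, 0.9], [0.9, 0.1]],
                           [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}]),
}
best = best_model(models, "ab")
print(best)
```

In a trained system the parameters would be estimated from data
instead of being written by hand, which is what makes the technique a
learning system.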

      For simplicity, the method is described here for finding the
correct declination forms of adjectives.  An analogous approach
applies to verbs and nouns.

      Initial phase

      1.   Classification of the German vocabulary according to the
scheme of the attached table.

      Training phase

      2.   Extraction of contexts for all adjectives, i.e., the m
preceding and n following words with respect to the adjective.
           Example sentence:
           Der Bedarf an qualifizierten Mitarbeitern ist stark
gestiegen.  (The demand for qualified employees has risen sharply.)
           Context for the adjective 'qualifizierten' (m = 3; n = 1):
           Der Bedarf an qualifizierten Mitarbeitern
      3.   Conversion of the series of words in the context into a
series of symbols, with each class being designated by a unique
symbol, i.e., character.
           Der Bedarf an qualifizierten Mitarbeitern -> mcq3d
      4.   Definition of six HMMs, with one model allocated to each
declination form.
      5.   Training of the models using context symbols obtained from a
text corpus (e.g., newspaper texts).  Each series of context symbols
is used to train only the model allocated to that declination f...
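
      Steps 2 and 3 above can be sketched as follows.  The function
names and the WORD_CLASS_SYMBOL dictionary are hypothetical: the
classification table itself is not reproduced here, so the
word-to-symbol entries are inferred only from the single example
'mcq3d', assuming one symbol per word in order.

```python
def extract_context(words: list[str], index: int, m: int, n: int) -> list[str]:
    """Return the m words before words[index], the word itself, and the n words after it."""
    start = max(0, index - m)
    return words[start:index + n + 1]


# Hypothetical stand-in for the class table, inferred from the example
# "Der Bedarf an qualifizierten Mitarbeitern -> mcq3d".
WORD_CLASS_SYMBOL = {
    "Der": "m",
    "Bedarf": "c",
    "an": "q",
    "qualifizierten": "3",
    "Mitarbeitern": "d",
}


def to_symbols(context: list[str]) -> str:
    """Replace every word of the context by the symbol of its class."""
    return "".join(WORD_CLASS_SYMBOL[word] for word in context)


sentence = "Der Bedarf an qualifizierten Mitarbeitern ist stark gestiegen."
words = sentence.rstrip(".").split()
context = extract_context(words, words.index("qualifizierten"), m=3, n=1)
print(" ".join(context), "->", to_symbols(context))
```

A real implementation would look up the class of each word in the
classified vocabulary from step 1 instead of using a fixed per-word
dictionary.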