Method for Placing Accents on Letters in Poorly Edited Text

IP.com Disclosure Number: IPCOM000111390D
Original Publication Date: 1994-Feb-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 2 page(s) / 31K

Publishing Venue

IBM

Related People

Brown, PF: AUTHOR [+4]

Abstract

Disclosed is a method for correcting the accents that appear on letters in some words in some foreign languages. Let f be a sequence of letters, some of which are accented, making up a collection of text. The placement of accents in f is corrected according to the following procedure.

1.  Assemble a vocabulary of letter sequences known to be properly
    accented.  This can be done, for example, from a dictionary.
    Some letter sequences may have more than one acceptable accent
    sequence.

2.  Analyze f into a sequence of letter sequences together with
    accents so as to produce a sequence of words.

3.  For any letter sequence in f known to have only a single accent
    sequence, set the accent sequence for that letter sequence to the
    known sequence.

4.  Correct the remainder of the accent sequences by treating them as
    potential misspellings, applying the method of Brown et al. [*].
    In this process, it is assumed that letters may only have accents
    removed, added, or changed.
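
Steps 1 through 3 can be sketched in Python as shown below. This is a minimal illustration, not the disclosed implementation. It assumes that accents can be modelled as Unicode combining marks and that words are runs of letters. The function names, the toy vocabulary, and the choose_ambiguous hook are all hypothetical. The hook only marks where the step 4 method of Brown et al. [*] would plug in. That method is not reproduced here.

```python
import re
import unicodedata
from collections import defaultdict


def strip_accents(word):
    """Return the bare letter sequence: the word with combining marks removed."""
    decomposed = unicodedata.normalize("NFD", word)
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", bare)


def build_vocabulary(accented_words):
    """Step 1: map each bare letter sequence to its known accented forms."""
    vocab = defaultdict(set)
    for w in accented_words:
        w = unicodedata.normalize("NFC", w.lower())
        vocab[strip_accents(w)].add(w)
    return vocab


def match_case(replacement, original):
    """Carry the capitalisation of the original word over to the replacement."""
    if len(original) > 1 and original.isupper():
        return replacement.upper()
    if original[0].isupper():
        return replacement[0].upper() + replacement[1:]
    return replacement


def correct_accents(text, vocab, choose_ambiguous=None):
    """Steps 2 and 3: split text into words and fix unambiguous accents.

    choose_ambiguous is a hypothetical hook (an assumption of this sketch)
    that receives the word and its sorted candidate forms; it stands in for
    the step 4 spelling-correction model. Without it, ambiguous words are
    left unchanged.
    """
    text = unicodedata.normalize("NFC", text)

    def fix(match):
        word = match.group(0)
        forms = vocab.get(strip_accents(word.lower()), set())
        if len(forms) == 1:
            # Only one known accent sequence: impose it.
            return match_case(next(iter(forms)), word)
        if len(forms) > 1 and choose_ambiguous is not None:
            return choose_ambiguous(word, sorted(forms))
        return word  # unknown or ambiguous: leave as written

    # [^\W\d_]+ matches runs of Unicode letters.
    return re.sub(r"[^\W\d_]+", fix, text)


# Toy vocabulary; "a" and "à" share one letter sequence, so "a" is ambiguous.
vocab = build_vocabulary(["système", "été", "a", "à", "lent"])
print(correct_accents("Le systéme a ete lent.", vocab))
```

In this sketch a wrong accent ("systéme") and a missing accent ("ete") are both repaired by lookup, because only the accents are discarded before matching. This mirrors the assumption that letters may only have accents removed, added, or changed. The ambiguous "a" is left untouched for step 4.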

      This procedure has been used successfully to correct the
accents on French text input to the Candide machine translation
system.

Reference

[*]  "Spelling Correction with Keyboard, User, and Language Models,"
   IBM Technical Disclosure Bulletin, 36, 4, 385-390 (April 1993).