Text Enhancer for Italian Script

IP.com Disclosure Number: IPCOM000035886D
Original Publication Date: 1989-Aug-01
Included in the Prior Art Database: 2005-Jan-28
Document File: 3 page(s) / 23K

Publishing Venue

IBM

Related People

Maier, M: AUTHOR [+2]

Abstract

This article describes a text enhancer algorithm for the Italian typographical script, which uses a tree (retrieval) structure to represent the dictionary.

This is the abbreviated version, containing approximately 53% of the total text.

This article describes a text enhancer algorithm for the Italian typographical script, which uses a tree (retrieval) structure to represent the dictionary.

The tree structure is assumed to be known, and portions of the dictionary representation are assumed to be available when needed.
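
As an illustration of a tree (retrieval) dictionary, commonly called a trie, the following minimal sketch stores words character by character so that words with a common prefix share nodes. The names `TrieNode`, `insert` and `contains` are hypothetical and are not taken from the disclosure, which does not describe its own data layout in the available text.

```python
class TrieNode:
    """One node of the trie; each outgoing edge is labelled by a character."""

    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if the path from the root spells a word


def insert(root, word):
    """Add a word to the trie rooted at `root`."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def contains(root, word):
    """Return True only if `word` was inserted as a complete word."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word


root = TrieNode()
for w in ("casa", "cane", "caso"):
    insert(root, w)
print(contains(root, "casa"), contains(root, "cas"))
```

In this sketch "casa" and "caso" share the three nodes for "cas", which is what makes a prefix-by-prefix traversal of the dictionary possible.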

Let A be a finite string of characters, A<i> the i-th character of A, A<i:j> the i-th through j-th characters (inclusive) of A, and |A| the length of A, that is, its number of characters.

Given strings A and B, a set of edit operations transforming strings into strings, and a set of weights for these operations, it is possible to find a sequence of edit operations that transforms A into B and whose total weight is minimal.

The edit operations applied by the algorithm are: a) deletion of a character of A, b) insertion of a character of B, c) change of a character of A into a character of B, and d) interchange of two adjacent characters of A.

The following weights are associated with the edit operations: a) b(A<i> -> Q), the weight of deleting the character A<i> from the string A; b) b(Q -> B<j>), the weight of inserting the character B<j>; c) b(A<i> -> B<j>), the weight of changing the character A<i> into the character B<j>; and d) b(A<i-1:i> -> A<i:i-1>), the weight of interchanging the two adjacent characters A<i-1> and A<i> of the string A. Here Q denotes the null character.
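
The minimal total weight over the four operations above can be computed by dynamic programming. The sketch below uses the restricted "optimal string alignment" recurrence, which is an assumption: the disclosure's own recurrence is not shown in the available text and may differ. The function name `edit_distance` and the four weight callables `w_del`, `w_ins`, `w_chg` and `w_swap` are hypothetical stand-ins for the weights of deletion, insertion, change and interchange.

```python
def edit_distance(a, b, w_del, w_ins, w_chg, w_swap):
    """Minimal total weight of edits turning string a into string b.

    Assumed recurrence: optimal string alignment, i.e. an interchanged
    pair of adjacent characters is not edited again afterwards.
    """
    n, m = len(a), len(b)
    # d[i][j] = minimal weight of turning a[:i] into b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + w_del(a[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + w_ins(b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            best = min(
                d[i - 1][j] + w_del(x),                          # deletion
                d[i][j - 1] + w_ins(y),                          # insertion
                d[i - 1][j - 1] + (0.0 if x == y else w_chg(x, y)),  # change
            )
            # interchange of the two adjacent characters a[i-2], a[i-1]
            if i > 1 and j > 1 and x != y and x == b[j - 2] and a[i - 2] == y:
                best = min(best, d[i - 2][j - 2] + w_swap(a[i - 2], x))
            d[i][j] = best
    return d[n][m]


# Illustrative weights only: unit cost everywhere, cheaper interchange.
unit = dict(
    w_del=lambda c: 1.0,
    w_ins=lambda c: 1.0,
    w_chg=lambda c1, c2: 1.0,
    w_swap=lambda c1, c2: 0.5,
)
print(edit_distance("csaa", "casa", **unit))
```

With these illustrative weights, the mistyped "csaa" reaches "casa" by a single interchange, so the interchange weight is the whole cost.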

These weights have been set so as to simulate typographical errors: the interchange, when possible, is treated as very frequent (minimum weight), and all the other weights depend on the position of the characters on the keyboard. For example, on a QWERTY keyboard, changing the character "s" into "a" or "d" has a lower weight than changing it into "w", "e", "z" or "x", and these changes in turn have a lower weight than any other change. The weights of deletion and insert...
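
A keyboard-dependent change weight of the kind described for "s" can be sketched as follows. The numeric values (`W_SAME_ROW`, `W_ADJ_ROW`, `W_OTHER`) and the row offsets are assumptions chosen only to reproduce the stated ordering; the disclosure's actual weights are not given in the available text, and the function name `change_weight` is hypothetical.

```python
# Letter rows of a QWERTY keyboard and an approximate horizontal
# stagger for each row, in key widths (assumed values).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
OFFSETS = [0.0, 0.25, 0.75]

# Hypothetical weights; only their ordering reflects the text.
W_SAME_ROW = 1.0   # neighbour in the same row, e.g. s -> a, s -> d
W_ADJ_ROW = 1.5    # neighbour in the row above or below, e.g. s -> w
W_OTHER = 2.0      # any other change

# Map each letter to (row index, column index).
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def change_weight(a, b):
    """Weight of changing character a into character b."""
    if a == b:
        return 0.0
    if a not in POS or b not in POS:
        return W_OTHER
    (ra, ca), (rb, cb) = POS[a], POS[b]
    if ra == rb:
        return W_SAME_ROW if abs(ca - cb) == 1 else W_OTHER
    # Keys in neighbouring rows touch when their staggered horizontal
    # positions differ by less than one key width.
    xa, xb = ca + OFFSETS[ra], cb + OFFSETS[rb]
    if abs(ra - rb) == 1 and abs(xa - xb) < 1.0:
        return W_ADJ_ROW
    return W_OTHER


for target in "adwezxq":
    print("s ->", target, change_weight("s", target))
```

Under these assumptions "a" and "d" get the lowest change weight for "s", followed by "w", "e", "z" and "x", with every other letter receiving the highest weight.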