Automatic Baseform Production From Spellings

IP.com Disclosure Number: IPCOM000043466D
Original Publication Date: 1984-Sep-01
Included in the Prior Art Database: 2005-Feb-04
Document File: 2 page(s) / 28K

Publishing Venue

IBM

Related People

Cohen, PS: AUTHOR [+2]

Abstract

A novel method for solving the problem of automatically generating correct baseforms from English spellings is described. The problem is difficult because of the ambiguity of the English spelling system. (Image Omitted) The invention is constructed as shown above. In this implementation, English spellings can be input singly or in lists, and a set of heuristically derived rules is applied to them by the program SPTPHON, which is a modification of the program TPHON [1]. The output baseform graphs may be used as is, or may undergo further transformations to handle lower-level phonological alternatives. In either event, the output graphs are in a form that can be trained by appropriate dynamic programming methods [2] for use in, e.g., automatic speech recognition.

This is the abbreviated version, containing approximately 69% of the total text.


A novel method for solving the problem of automatically generating correct baseforms from English spellings is described. The problem is difficult because of the ambiguity of the English spelling system.

(Image Omitted)

The invention is constructed as shown above. In this implementation, English spellings can be input singly or in lists, and a set of heuristically derived rules is applied to them by the program SPTPHON, which is a modification of the program TPHON [1]. The output baseform graphs may be used as is, or may undergo further transformations to handle lower-level phonological alternatives. In either event, the output graphs are in a form that can be trained by appropriate dynamic programming methods [2] for use in, e.g., automatic speech recognition.

Advantages over prior known methods are as follows:

1) The output graphs are trainable by statistical methods.

2) The system automatically produces appropriate multiple-baseform possibilities for homographs (e.g., bow 'to bend' vs. bow 'weapon used by archers'), for part-of-speech differences (e.g., separate (verb) vs. separate (adjective)), and for dialectal or idiolectal differences (e.g., economics beginning with /E/ vs. economics beginning with /i/).

3) It will pronounce any word without the need to resort to long lists of special cases and exceptions.

4) Parsing of the sentences from which a particular word comes is unnecessary...
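
As a rough illustration of the idea of producing a baseform graph with parallel alternatives from a spelling, the following Python sketch may help. It is not SPTPHON or TPHON: the rule table, the phone symbols, and the function names (segment, baseform_graph, enumerate_baseforms) are all hypothetical, and greedy longest-match segmentation is an assumption made only to keep the example small.

```python
from itertools import product

# Hypothetical spelling-to-sound rules (NOT the SPTPHON rule set).
# Each spelling chunk maps to one or more candidate phones; more than
# one candidate means the graph has parallel arcs at that position.
RULES = {
    "b": ["B"],
    "ow": ["OW", "AW"],  # ambiguous: bow 'weapon' vs. bow 'to bend'
    "c": ["K"],
    "t": ["T"],
    "a": ["AE"],
}


def segment(spelling, rules):
    """Split a spelling into rule keys using greedy longest match."""
    chunks = []
    max_len = max(len(key) for key in rules)
    i = 0
    while i < len(spelling):
        # Try the longest possible chunk first, then shorter ones.
        for n in range(min(max_len, len(spelling) - i), 0, -1):
            chunk = spelling[i:i + n]
            if chunk in rules:
                chunks.append(chunk)
                i += n
                break
        else:
            raise ValueError(f"no rule covers {spelling[i:]!r}")
    return chunks


def baseform_graph(spelling, rules=RULES):
    """Return the graph as a list of slots, each a list of alternative phones."""
    return [list(rules[chunk]) for chunk in segment(spelling.lower(), rules)]


def enumerate_baseforms(graph):
    """List every path through the graph as a space-separated phone string."""
    return [" ".join(path) for path in product(*graph)]
```

Under these assumed rules, enumerate_baseforms(baseform_graph("bow")) yields two candidate baseforms, one for each reading of the homograph, and no parse of the surrounding sentence is needed to produce them. Choosing between or weighting the alternatives would be left to a later training step, which this sketch does not attempt.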