
Dynamic Modification of the Vocabulary of a Speech Recognition Machine

IP.com Disclosure Number: IPCOM000044409D
Original Publication Date: 1984-Dec-01
Included in the Prior Art Database: 2005-Feb-05
Document File: 2 page(s) / 14K

Publishing Venue

IBM

Related People

Bakis, R: AUTHOR (and 2 others)

Abstract

Adapting the vocabulary of a continuous speech recognition system dynamically to varying topics provides, in real time, an effective vocabulary an order of magnitude larger than the system's actual capacity. Current technology limits the vocabulary of a speech recognition machine to a few thousand words, which is not sufficient for many office dictation tasks. Although the recognition machine can let the user dictate words that are not in its vocabulary by spelling them, users often find this inconvenient. Starting with a large vocabulary, perhaps on the order of 50,000 or 100,000 words, a number of small "clusters" of perhaps 500 or 1,000 words each are created. These clusters may well overlap. The clusters could be constructed manually, by assigning words dealing with closely related topics to a cluster.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.



Adapting the vocabulary of a continuous speech recognition system dynamically to varying topics provides, in real time, an effective vocabulary an order of magnitude larger than the system's actual capacity. Current technology limits the vocabulary of a speech recognition machine to a few thousand words, which is not sufficient for many office dictation tasks. Although the recognition machine can let the user dictate words that are not in its vocabulary by spelling them, users often find this inconvenient.

Starting with a large vocabulary, perhaps on the order of 50,000 or 100,000 words, a number of small "clusters" of perhaps 500 or 1,000 words each are created. These clusters may well overlap. The clusters could be constructed manually, by assigning words dealing with closely related topics to a cluster. For example, Book, Pen and Paper could be in a cluster dealing with publishing, while Pen and Paper could at the same time also be in a cluster dealing with graphic arts. Alternatively, words could be clustered with the aid of statistical methods. For example, the frequency with which two different words occur in the same sentence or paragraph can serve as a measure of the "distance" between them. (Strictly, the distance would have to be a monotonically decreasing function of this frequency, so that words that often occur together are close.)

Given a collection of such clusters, the vocabulary of the recognizer at any instant is the union of some of these clusters. The recognition machine dynamically alters its vocabulary by dropping some clusters and bringing in new ones. For example, the machine records the frequency with which each cluster is used. When it must make room for a new cluster, the machine drops the least frequently or least recently used cluster. To decide when a particular new cluster should be brought in, the machine records the frequency of use even for clusters that are not part of its active vocabulary at the moment.
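The statistical clustering idea can be illustrated with a short sketch. It is not the disclosure's method and rests on several assumptions: co-occurrence is counted per sentence (a paragraph would work the same way), 1 / (1 + count) is taken as the monotonically decreasing distance function, and a cluster is formed by gathering a seed word's nearest neighbours. The function names `cooccurrence_counts`, `distance`, and `cluster_around` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, for each unordered word pair, how many sentences contain both words."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):  # pairs come out already sorted (a < b)
            counts[(a, b)] += 1
    return counts


def distance(counts, w1, w2):
    """Assumed distance: 1 / (1 + co-occurrence count).

    Any monotonically decreasing function of the frequency would do;
    words that often share a sentence end up close together.
    """
    key = tuple(sorted((w1.lower(), w2.lower())))
    return 1.0 / (1.0 + counts[key])  # Counter returns 0 for unseen pairs


def cluster_around(counts, vocabulary, seed, size):
    """Hypothetical cluster: the seed word plus its (size - 1) nearest words."""
    others = sorted(
        (w for w in vocabulary if w != seed),
        key=lambda w: (distance(counts, seed, w), w),  # ties broken alphabetically
    )
    return {seed, *others[: size - 1]}


if __name__ == "__main__":
    corpus = [
        "the pen and paper lay on the desk",
        "she bought a book printed on heavy paper",
        "ink from the pen soaked the paper",
    ]
    counts = cooccurrence_counts(corpus)
    vocabulary = {"pen", "paper", "ink", "book", "desk"}
    print(cluster_around(counts, vocabulary, "pen", 3))
```

Because each seed word yields its own cluster, a word such as "paper" can land in the clusters built around both "pen" and "book", which matches the overlapping clusters the disclosure allows.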
Use of an inactive cluster can be registered in two ways: some of its words may also belong to active clusters and thus be recognized in the text, or the user may spell such words. When the frequency count for one of the inactive clusters becomes sufficiently high, for example, higher than that of some active cluster, the inactive cluster is activated, that is, brought in as part of the active vocabulary. If the user spells a word because it was not in any of the currently active...
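The cluster-management policy described above can be sketched as a small class. This is an illustration under stated assumptions, not the disclosure's implementation: the class name `DynamicVocabulary`, the fixed `capacity` (the number of clusters that fit in the recognizer at once), and the rule of crediting every cluster that contains a recognized or spelled word are all hypothetical. The sketch uses least-frequently-used eviction; the disclosure equally allows least-recently-used eviction. An inactive cluster is swapped in as soon as its count exceeds that of the least-used active cluster, which is the same as exceeding the count of some active cluster.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicVocabulary:
    """Hypothetical sketch: the active vocabulary is the union of the active clusters."""

    clusters: dict[str, frozenset[str]]  # cluster name -> words; clusters may overlap
    capacity: int  # assumed maximum number of simultaneously active clusters
    active: set[str] = field(default_factory=set)
    usage: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # A frequency count is kept for every cluster, active or inactive.
        for name in self.clusters:
            self.usage.setdefault(name, 0)

    def vocabulary(self) -> set[str]:
        """Words the recognizer can currently recognize."""
        return set().union(*(self.clusters[n] for n in self.active))

    def record(self, word: str) -> None:
        """Credit each cluster containing `word`, whether it was recognized or spelled."""
        for name, words in self.clusters.items():
            if word in words:
                self.usage[name] += 1
        self._rebalance()

    def _rebalance(self) -> None:
        while True:
            inactive = [n for n in self.clusters if n not in self.active]
            if not inactive:
                return
            best = max(inactive, key=lambda n: (self.usage[n], n))
            if len(self.active) < self.capacity:
                if self.usage[best] == 0:
                    return  # never used: not worth a free slot yet
                self.active.add(best)
                continue
            if not self.active:
                return  # capacity of zero: nothing can be activated
            worst = min(self.active, key=lambda n: (self.usage[n], n))
            if self.usage[best] <= self.usage[worst]:
                return
            # Swap: drop the least frequently used active cluster, bring in `best`.
            self.active.remove(worst)
            self.active.add(best)


if __name__ == "__main__":
    vocab = DynamicVocabulary(
        clusters={
            "publishing": frozenset({"book", "pen", "paper"}),
            "graphic arts": frozenset({"pen", "paper", "ink"}),
            "legal": frozenset({"contract", "clause"}),
        },
        capacity=1,
        active={"legal"},
    )
    for w in ["contract", "ink", "pen"]:
        vocab.record(w)
    print(sorted(vocab.active), sorted(vocab.vocabulary()))
    # prints ['graphic arts'] ['ink', 'paper', 'pen']
```

In the example run, "ink" is outside the active vocabulary and would have to be spelled, yet it still credits the graphic-arts cluster; one more shared word ("pen") then lifts that cluster above the legal cluster, and the two are swapped.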