Mixed Video/Audio Pronouncing Dictionary

IP.com Disclosure Number: IPCOM000046748D
Original Publication Date: 1983-Aug-01
Included in the Prior Art Database: 2005-Feb-07
Document File: 3 page(s) / 43K

Publishing Venue

IBM

Related People

Cohen, PS: AUTHOR [+4]

Abstract

Application of video-disc technology yields an impressive improvement to the function and ease of use of an unabridged dictionary. The present article describes an audio capability for a video-disc-based dictionary. The diacritical/phonetic markings now used by dictionaries to show pronunciations are difficult, at best, for the non-specialist to use. The major problems in providing an audio pronouncing capability are memory size and economical creation of the data. A typical dictionary might have 10^5 main entries with 3 x 10^5 pronunciations. Estimating 0.5 second per utterance, the dictionary must contain 1.5 x 10^5 seconds = 41 hours of speech. At good fidelity (56 kb/sec), this would require 1.4 x 10^9 bytes. Since current discs provide approximately 10^9 bytes, this is marginally feasible.

At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

Application of video-disc technology yields an impressive improvement to the function and ease of use of an unabridged dictionary. The present article describes an audio capability for a video-disc-based dictionary. The diacritical/phonetic markings now used by dictionaries to show pronunciations are difficult, at best, for the non-specialist to use.

The major problems in providing an audio pronouncing capability are memory size and economical creation of the data. A typical dictionary might have 10^5 main entries with 3 x 10^5 pronunciations. Estimating 0.5 second per utterance, the dictionary must contain 1.5 x 10^5 seconds = 41 hours of speech. At good fidelity (56 kb/sec), this would require 1.4 x 10^9 bytes. Since current discs provide approximately 10^9 bytes, this is marginally feasible. State-of-the-art compression technology (e.g., adaptive linear predictive coding (LPC)) can achieve satisfactory results at 9.6 kb/sec. This translates to 4.8 kb/utterance, or 600 bytes/utterance, which means a total space requirement of 1.8 x 10^8 bytes, or about 20% of the disc for the audio data base.

Creating, encoding and optimizing the 3 x 10^5 utterances is a non-trivial chore. Therefore, we propose an on-line computer system as shown in Fig. 1. This will allow prompting the enunciators automatically and real-time editing, so that the result of the reconstruction process can be reviewed. The enunciator can vary his tempo or stress pattern (subtly) to achieve satisfactory results. Using such a system to record, a throughput of one thousand words/day seems quite reasonable. The entire dictionary could be r...
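
The storage and recording estimates above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are not from the article, "kb" is taken to mean 1000 bits (consistent with 4.8 kb = 600 bytes), and the recording estimate assumes a single enunciator producing one utterance per recorded word. Straight multiplication at 56 kb/sec gives about 1.05 x 10^9 bytes, somewhat below the 1.4 x 10^9 bytes quoted; the text does not say what accounts for the difference, so the sketch reports the unadjusted product.

```python
# Back-of-the-envelope check of the figures quoted in the text.
# All names here are illustrative assumptions, not part of the article.

PRONUNCIATIONS = 3 * 10**5      # utterances to be stored
SECONDS_PER_UTTERANCE = 0.5     # estimated duration of one utterance
DISC_BYTES = 10**9              # approximate capacity of one disc
WORDS_PER_DAY = 1000            # suggested recording throughput


def speech_seconds(utterances=PRONUNCIATIONS, seconds_each=SECONDS_PER_UTTERANCE):
    """Total duration of recorded speech, in seconds."""
    return utterances * seconds_each


def bytes_per_utterance(bits_per_second, seconds_each=SECONDS_PER_UTTERANCE):
    """Storage for one utterance at the given bit rate (8 bits per byte)."""
    return bits_per_second * seconds_each / 8


def storage_bytes(bits_per_second, utterances=PRONUNCIATIONS,
                  seconds_each=SECONDS_PER_UTTERANCE):
    """Storage for the whole audio data base at the given bit rate."""
    return speech_seconds(utterances, seconds_each) * bits_per_second / 8


def disc_fraction(total_bytes, disc_bytes=DISC_BYTES):
    """Share of one disc occupied by the audio data."""
    return total_bytes / disc_bytes


def recording_days(utterances=PRONUNCIATIONS, per_day=WORDS_PER_DAY):
    """Days of recording, assuming one enunciator and one utterance per word."""
    return utterances / per_day


if __name__ == "__main__":
    print("hours of speech:", speech_seconds() / 3600)
    for label, rate in (("56 kb/sec", 56_000), ("9.6 kb/sec LPC", 9_600)):
        total = storage_bytes(rate)
        print(label, "bytes:", total, "disc fraction:", disc_fraction(total))
    print("bytes per utterance at 9.6 kb/sec:", bytes_per_utterance(9_600))
    print("recording days:", recording_days())
```

Keeping the bit rate as a parameter makes it easy to see how strongly the disc budget depends on the coding scheme: at the LPC rate the audio occupies under a fifth of the disc, while the uncompressed rate fills or exceeds it.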