Simulated Spectrogram for Speech Synthesis

IP.com Disclosure Number: IPCOM000038839D
Original Publication Date: 1987-Mar-01
Included in the Prior Art Database: 2005-Feb-01
Document File: 3 page(s) / 92K

Publishing Venue

IBM

Related People

Dixon, NR: AUTHOR [+3]

Abstract

This invention involves a speech synthesizer which represents synthesizer control functions as a control vector spectrogram. Control data which guides the synthesizer is plotted in a format that is easily interpreted by speech scientists. In accordance with the invention, each utterance of speech is segmented into diphones. Each diphone is composed of several control vectors changing through time. The control vectors serve as inputs from a computer to a speech synthesizer. In Fig. 1, the utterance "seventeen" is shown segmented along the abscissa into nine diphones, i.e., XXSX, 04SXEH,..., XXNX. Diphones are discussed in the prior art [*]. Along the ordinate of the graph of Fig. 1 are ten control vectors, AN through C0, the values of which are shown to vary over time (in the horizontal direction).

This is the abbreviated version, containing approximately 53% of the total text.

This invention involves a speech synthesizer which represents synthesizer control functions as a control vector spectrogram.

Control data which guides the synthesizer is plotted in a format that is easily interpreted by speech scientists. In accordance with the invention, each utterance of speech is segmented into diphones. Each diphone is composed of several control vectors changing through time.

The control vectors serve as inputs from a computer to a speech synthesizer.
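
As a rough illustration of this organization, the following Python sketch models a diphone as a labeled set of control tracks, with one list of per-frame values for each control vector, and joins diphones end to end to form the control data for an utterance. The class `Diphone`, the function `concatenate`, the frame counts, and every numeric value are assumptions made for illustration; the disclosure does not specify a storage format. Only the diphone labels XXSX and 04SXEH and the control vector names A0 and F1 are taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Diphone:
    """Hypothetical container: a diphone label plus its control tracks."""
    label: str
    # Control vector name -> value at each time frame (illustrative layout).
    tracks: Dict[str, List[float]]

    def n_frames(self) -> int:
        """Return the frame count, requiring all tracks to be equally long."""
        lengths = {len(values) for values in self.tracks.values()}
        if len(lengths) != 1:
            raise ValueError(f"{self.label}: tracks are empty or differ in length")
        return lengths.pop()


def concatenate(diphones: List[Diphone]) -> Dict[str, List[float]]:
    """Join the diphones in order into one set of utterance-long tracks."""
    if not diphones:
        return {}
    names = sorted(diphones[0].tracks)
    utterance: Dict[str, List[float]] = {name: [] for name in names}
    for diphone in diphones:
        if sorted(diphone.tracks) != names:
            raise ValueError(f"{diphone.label}: unexpected control vectors")
        diphone.n_frames()  # validates that the tracks are equally long
        for name in names:
            utterance[name].extend(diphone.tracks[name])
    return utterance


# Invented values; only the labels and vector names come from the disclosure.
first = Diphone("XXSX", {"A0": [0.0, 0.0], "F1": [300.0, 320.0]})
second = Diphone("04SXEH", {"A0": [0.2, 0.6, 0.9], "F1": [400.0, 480.0, 530.0]})
controls = concatenate([first, second])
```

Storing each control vector as its own time series mirrors the way the vectors are drawn as separate traces over time; a frame-by-frame layout would serve equally well.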

(Image Omitted)

In Fig. 1, the utterance "seventeen" is shown segmented along the abscissa into nine diphones, i.e., XXSX, 04SXEH,..., XXNX. Diphones are discussed in the prior art [*]. Along the ordinate of the graph of Fig. 1 are ten control vectors, AN through C0, the values of which are shown to vary over time (in the horizontal direction).

The control vectors relate to recognized spectral features: AN corresponds to nasal amplitude; F1, F2, F3, and F4 to the frequencies of formants 1, 2, 3, and 4; A0 to voice amplitude; AH to hiss amplitude; FH to fricative frequency; F0 to fundamental frequency; and C0 to binary data controlling aspiration/frication (fric), formant bandwidths (f1, f2, f3, f4), and hiss modulation (hm).

To make the speech synthesis information more readily understandable to a human analyst, a control vector spectrogram is provided according to the invention. Fig. 2 shows a control vector spectrogram for the utterance "seventeen". A relative amplitude trace, which is a weighted sum of AH, A0, and AN (the amplitudes of hiss, voicing, and nasalization excitation, respectively), is plotted at the bottom of the spectrogram. The ordinate is frequency in kHz. Each voicing (and/or nasal) control vector point is represented by a 'triangle'. The amplitude parameters AN and A0 affect the height and width of the individual triangles which make up the traces for...
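
The relative amplitude trace mentioned above, a weighted sum of AH, A0, and AN, might be computed along the following lines. This is a minimal Python sketch: the function name `relative_amplitude`, the default weights, and the sample values are assumptions, since the extracted text does not give the weights actually used.

```python
from typing import List, Sequence, Tuple


def relative_amplitude(
    ah: Sequence[float],
    a0: Sequence[float],
    an: Sequence[float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # assumed, not from the source
) -> List[float]:
    """Return the per-frame weighted sum of hiss, voice, and nasal amplitude."""
    if not (len(ah) == len(a0) == len(an)):
        raise ValueError("AH, A0, and AN must cover the same number of frames")
    w_ah, w_a0, w_an = weights
    return [w_ah * h + w_a0 * v + w_an * n for h, v, n in zip(ah, a0, an)]


# Invented two-frame example: a hiss-only frame followed by a voiced nasal frame.
trace = relative_amplitude(
    ah=[1.0, 0.0],
    a0=[0.0, 2.0],
    an=[0.0, 1.0],
    weights=(0.5, 1.0, 2.0),
)
```

With these invented inputs, the first frame contributes only weighted hiss and the second only weighted voicing and nasalization, which shows how one trace can summarize all three excitation sources.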