Improving the Fundamental Frequency Contour in Speech Synthesis

IP.com Disclosure Number: IPCOM000091062D
Original Publication Date: 1969-Oct-01
Included in the Prior Art Database: 2005-Mar-05
Document File: 3 page(s) / 28K

Publishing Venue

IBM

Related People

Bakis, R: AUTHOR

Abstract

This arrangement improves the fundamental frequency contour in reproductive speech synthesis, in which messages are assembled from separately stored segments of natural speech. Reproducing a segment at a speed different from its original recording speed sounds somewhat unnatural, because a change in reproduction speed also changes the rate at which the fundamental frequency varies. A library of speech signals representing speech segments is first recorded in a storage device. The speech signals needed to synthesize a given message are read from the storage device, compressed or expanded in time as required, and then passed through a modifier.

This is the abbreviated version, containing approximately 52% of the total text.


This arrangement improves the fundamental frequency contour in reproductive speech synthesis, in which messages are assembled from separately stored segments of natural speech. Reproducing a segment at a speed different from its original recording speed sounds somewhat unnatural, because a change in reproduction speed also changes the rate at which the fundamental frequency varies.

A library of speech signals representing speech segments is first recorded in a storage device. The speech signals needed to synthesize a given message are read from the storage device, compressed or expanded in time as required, and then passed through a modifier. The transfer function of the modifier is controlled by signals that depend on the difference between the recording and reproduction speeds of the speech segments being assembled into the desired message. It is controlled so that rapid fluctuations of the fundamental frequency of each reproduced segment are affected more strongly than slow fluctuations. If a speech segment is reproduced faster than it was recorded, the amplitudes of the higher-frequency signal components are attenuated; if it is reproduced more slowly, they are accentuated.
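The behavior described above can be illustrated with a minimal discrete-time sketch. It is not the circuit of the disclosure. It assumes a sampled fundamental frequency contour, splits it into a slow part (a one-pole running average) and a fast residual, and scales only the residual. The function name `modify_f0_contour`, the `smoothing` constant, and the mapping of the fast-fluctuation gain to `1 / speed_ratio` are hypothetical choices made for illustration; the source states only that rapid fluctuations are attenuated for faster reproduction and accentuated for slower reproduction.

```python
from typing import List, Sequence


def modify_f0_contour(f0: Sequence[float], speed_ratio: float,
                      smoothing: float = 0.1) -> List[float]:
    """Scale the rapid fluctuations of an F0 contour, leaving slow trends alone.

    speed_ratio is reproduction speed divided by recording speed.
    ASSUMPTION: the gain applied to fast fluctuations is 1 / speed_ratio.
    The source gives only the direction of the change, not this formula.
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    if not 0.0 < smoothing <= 1.0:
        raise ValueError("smoothing must be in (0, 1]")
    if not f0:
        return []

    fast_gain = 1.0 / speed_ratio  # below 1 when faster, above 1 when slower
    slow = float(f0[0])            # running estimate of the slow trend
    out: List[float] = []
    for x in f0:
        # One-pole low-pass filter tracks the slow part of the contour.
        slow += smoothing * (x - slow)
        # The residual (x - slow) is the fast part; only it is rescaled.
        out.append(slow + fast_gain * (x - slow))
    return out


if __name__ == "__main__":
    contour = [100.0, 110.0, 100.0, 110.0]  # Hz, a rapidly alternating contour
    print(modify_f0_contour(contour, 2.0))  # reproduced twice as fast
    print(modify_f0_contour(contour, 0.5))  # reproduced at half speed
```

With `speed_ratio` equal to 1 the contour passes through essentially unchanged, and a constant contour is unaffected at any speed, which matches the requirement that slow fluctuations are affected least.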

In drawing A, storage device 1 contains the speech segments, which can consist of complete words. When a message is to be assembled, storage 1 supplies the appropriate speech signals over communication channels 2 to synthesizer 3. Storage 1 can be any of various random access storage devices, such as magnetic tapes or disks, or a photographic film placed on the face of a cathode ray tube. Synthesizer 3 can be of the channel or formant type. In addition to the speech signals, storage 1 supplies on output line 4 a voltage proportional to the fundamental frequency of the stored speech segment. This signal is not transmitted directly to synthesizer 3. Instead, it passes through modifier 5, which is controlled by a further signal from storage 1 on control line 6 that represents the speed at which the particular speech segment is to be reproduced. Modifier 5 thereby modifies the fundamental frequency contour in accordance with the method described above.
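The signal flow of drawing A can be sketched in software as follows. This is an illustrative model only, under several assumptions: the stored signals are represented as sampled lists, time compression or expansion is done by linear interpolation, and the modifier is passed in as a function of the contour and the speed. The names `StoredSegment`, `time_scale`, and `assemble_message` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class StoredSegment:
    """One entry in storage 1 (hypothetical software representation)."""
    controls: List[float]  # stand-in for the speech signals sent on channels 2
    f0: List[float]        # fundamental frequency signal supplied on line 4


def time_scale(samples: Sequence[float], speed: float) -> List[float]:
    """Compress (speed > 1) or expand (speed < 1) by linear interpolation."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not samples:
        return []
    n_out = max(1, round(len(samples) / speed))
    if n_out == 1:
        return [float(samples[0])]
    step = (len(samples) - 1) / (n_out - 1)
    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# The modifier takes the time-scaled contour and the speed (control line 6).
Modifier = Callable[[List[float], float], List[float]]


def assemble_message(library: Dict[str, StoredSegment],
                     plan: Sequence[Tuple[str, float]],
                     modifier: Modifier) -> List[Tuple[List[float], List[float]]]:
    """Return, per segment, the (controls, modified F0) pair fed to the synthesizer."""
    frames = []
    for name, speed in plan:
        segment = library[name]
        controls = time_scale(segment.controls, speed)
        # The F0 signal is not sent to the synthesizer directly; it is
        # first routed through the speed-controlled modifier.
        f0 = modifier(time_scale(segment.f0, speed), speed)
        frames.append((controls, f0))
    return frames
```

A caller would supply a library keyed by word, a plan such as `[("hello", 2.0)]`, and any modifier function with the signature shown; the synthesizer itself is outside the scope of this sketch.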

A modifier that meets these requirements is shown in drawing B. Operational amplifier 16 is connected to an input network consisting of resistor 7 and capacitor 8 connected in parallel. To a first approximation, amplifier 16 is assumed to have infinite gain. Thus, for a finite output...