Intensity Dependent Pitch Estimation in Speech Signals
Disclosure Number: IPCOM000240599D
Publication Date: 2015-Feb-11
Document File: 2 page(s) / 36K

Publishing Venue

The Prior Art Database


An enhancement is proposed for methods of pitch contour tracking in speech signals. The ambiguity between instantaneous pitch candidates, which leads to double pitch errors, is reduced based on the pitch-energy relationship. Instantaneous pitch candidates are re-scored depending on their frequency values and the local signal energy value. A pitch tracker produces a more accurate pitch contour when using the modified scores.

This is the abbreviated version, containing approximately 52% of the total text.

Intensity Dependent Pitch Estimation in Speech Signals

The temporal pitch contour is an important attribute of a speech signal, widely used in speech modeling and synthesis, voice emotion classification, speech recognition for tonal languages, and other applications. The pitch estimation problem has received much attention but remains a focus of the speech research community: numerous works in this area are reported every year at the major speech processing conferences. Advances in speech modeling and synthesis raise the demands on the accuracy and robustness of pitch estimators. State-of-the-art methods do not yet provide accurate results on transient, non-stationary speech segments such as voicing onsets and offsets.

Conceptually, any pitch estimation process comprises two steps:
1. Instantaneous pitch candidate generation. At this step the speech signal is divided into short overlapping frames, e.g. 20 ms frames with 10 ms overlap. Each frame is then analyzed in either the frequency or the time domain, and multiple candidates for the pitch frequency F0 are generated for each frame. A confidence score is associated with each candidate.
2. Pitch contour tracking. A single candidate is selected for each frame based on a pitch contour continuity measure and an integral score of the selected candidates. The tracking procedure attempts to minimize or limit the differences between the pitch values of neighboring frames while maximizing an integral measure of the selected candidate scores.
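The two steps above can be illustrated with a minimal Python sketch. It is not the disclosed method and contains no energy-dependent re-scoring. All function and parameter names (split_frames, norm_corr, pitch_candidates, track_pitch, f_min, f_max, n_best, jump_penalty) are hypothetical. The sketch assumes that candidate scores are peaks of a normalized time-domain autocorrelation, that the continuity measure is the absolute pitch jump in octaves, and that every frame yields at least one candidate.

```python
import math

def split_frames(signal, frame_len, hop):
    """Divide the signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def norm_corr(frame, lag):
    """Normalized correlation between the frame and itself shifted by lag."""
    a, b = frame[:len(frame) - lag], frame[lag:]
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / den if den > 0.0 else 0.0

def pitch_candidates(frame, fs, f_min=60.0, f_max=400.0, n_best=3):
    """Step 1: return up to n_best (f0_hz, score) pairs for one frame."""
    lag_min = max(int(fs / f_max), 2)
    lag_max = min(int(fs / f_min), len(frame) - 2)
    r = {lag: norm_corr(frame, lag) for lag in range(lag_min - 1, lag_max + 2)}
    # Keep positive local maxima of the correlation as pitch period candidates.
    peaks = [(fs / lag, r[lag]) for lag in range(lag_min, lag_max + 1)
             if r[lag] > 0.0 and r[lag] > r[lag - 1] and r[lag] >= r[lag + 1]]
    return sorted(peaks, key=lambda c: c[1], reverse=True)[:n_best]

def track_pitch(candidates, jump_penalty=1.0):
    """Step 2: choose one F0 per frame by dynamic programming.

    candidates is a list (one entry per frame) of non-empty lists of
    (f0_hz, score) pairs with f0_hz > 0. The selected path maximizes the
    sum of scores minus jump_penalty times the pitch jumps in octaves
    (an assumed continuity measure).
    """
    best = [score for _, score in candidates[0]]
    back = []
    for prev, cur in zip(candidates, candidates[1:]):
        new_best, pointers = [], []
        for f_cur, s_cur in cur:
            totals = [best[j] - jump_penalty * abs(math.log2(f_cur / f_prev))
                      for j, (f_prev, _) in enumerate(prev)]
            j_best = max(range(len(prev)), key=totals.__getitem__)
            new_best.append(totals[j_best] + s_cur)
            pointers.append(j_best)
        best = new_best
        back.append(pointers)
    # Backtrack from the best final candidate to recover the whole path.
    k = max(range(len(best)), key=best.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [candidates[t][k][0] for t, k in enumerate(path)]
```

With an 8 kHz sampling rate, the 20 ms frames with 10 ms overlap mentioned above correspond to frame_len=160 and hop=80. Given the hand-made candidate lists [(100.0, 0.9), (200.0, 0.8)], [(100.0, 0.8), (200.0, 0.85)], [(100.0, 0.9), (200.0, 0.7)], picking the top score frame by frame would jump to 200 Hz in the middle frame, whereas track_pitch with the default penalty keeps 100 Hz throughout.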

The pitch candidates and their scores are generated by analyzing the periodicity of the speech frame, as represented by the harmonicity of its Fourier spectrum or by its time-domain autocorrelation function. Typically, integer multiples of the true pitch frequency are valid candidates having high scores. Sometimes a pitch period cycle consists of two half-cycles quite similar in their shapes. Hence the set of the frame pitch cand...