Improved Endpoint Detector for Japanese Speech Recognition

IP.com Disclosure Number: IPCOM000107355D
Original Publication Date: 1992-Feb-01
Included in the Prior Art Database: 2005-Mar-21
Document File: 2 page(s) / 54K

Publishing Venue

IBM

Related People

Nishimura, M: AUTHOR

Abstract

This article describes an endpoint detector that uses features of spoken Japanese.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 87% of the total text.

      Isolated word recognition is based on the premise that the
input signal consists of an utterance preceded and followed by long
silences.  The process of separating the utterance from the silences
is called endpoint detection. In isolated word recognition systems,
accurate and fast detection of endpoints is essential for reliable
recognition, but fast detection has been difficult to achieve for
Japanese because of Japanese long (geminate) consonants, known as
"Sokuon".  Some long consonants consist of a stop consonant preceded
by a long silence, so an utterance can contain an internal silence
that resembles the silence at its end.  Consequently, conventional
Japanese endpoint detectors must wait much longer than English ones
after detecting the apparent "ending" of an utterance, in order to
confirm that the following silence is not part of a long consonant.
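
      The latency trade-off described above can be illustrated with a
minimal sketch of a conventional energy-threshold endpointer that uses
one fixed silence allowance.  The function name, the threshold, the
frame counts, and the toy energy contour are hypothetical and are not
taken from the article.

```python
from typing import Optional, Sequence, Tuple


def fixed_hangover_endpoint(
    energy: Sequence[float], t1: float, max_silence: int
) -> Optional[Tuple[int, int, int]]:
    """Hypothetical conventional endpointer with one fixed silence allowance.

    Returns (start, end, decided_at): the first and last frames with
    energy above t1, and the frame at which the end was confirmed.
    Returns None if no frame exceeds t1.
    """
    start: Optional[int] = None
    last_voiced = -1
    silence = 0  # consecutive frames at or below t1 since the last voiced frame
    for t, e in enumerate(energy):
        if e > t1:
            if start is None:
                start = t
            last_voiced = t
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= max_silence:
                # Enough silence has passed: declare the utterance finished.
                return start, last_voiced, t
    if start is None:
        return None
    # Input ran out before the silence allowance was used up.
    return start, last_voiced, len(energy) - 1


# Toy contour: a word with an 8-frame internal gap (like a Sokuon closure).
energy = [0.0] * 5 + [10.0] * 6 + [0.0] * 8 + [10.0] * 4 + [0.0] * 20
print(fixed_hangover_endpoint(energy, 1.0, 5))   # (5, 10, 15): word cut at the gap
print(fixed_hangover_endpoint(energy, 1.0, 10))  # (5, 22, 32): whole word, 10-frame wait
```

With a short allowance the detector responds quickly but truncates the
word at its internal gap; an allowance long enough to bridge the gap
keeps the word intact, but every end is then confirmed only after that
many silent frames.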

      Using the knowledge that a long consonant is preceded and
followed by strong peaks in the energy contour of a speech utterance,
we designed a new endpoint detector.  The energy E(t) of the input
signal is computed frame by frame.  When E(t) stays above a threshold
T1 for F consecutive frames, the energy peak is classified according
to the value of F.  Table 1 shows the criteria.  The energy class is
used to control T2, the maximum length of silence allowed inside an
utterance, as shown in Table 2.  The figure illustrates...