A Pre-Processing Method of Speech Signals for Speaker Recognition and Indexing
Disclosure Number: IPCOM000126189D
Original Publication Date: 2005-Jul-06
Included in the Prior Art Database: 2005-Jul-06
Disclosed is a method for pre-processing speech signals for future speaker recognition processing, without knowing in advance the details (operation and identity claim) of the processing that will be required.

To perform speaker recognition, both the operation (enrollment or scoring) and the identity claim(s) of the speaker(s) must be known. However, many applications do not have this information at the time the speech utterances to be processed are collected. For example, a system that uses speaker recognition for indexing does not know in advance which queries will be run on its audio repository, yet it must be prepared to run some form of speaker recognition processing on its utterances in the future. Another example is a conversational system that must authenticate a user within a single turn. Such a system determines the user's identity claim from the speech in the first turn, and must subsequently score that same first turn against the model of the identity the user claimed.

A need therefore exists to preprocess speech utterances in a way that makes future speaker recognition requests more efficient to process, regardless of the identity claim and of the operation to be performed (i.e., enrollment or scoring).

Preprocessing is particularly useful in applications that require the same utterance to be processed multiple times by a speaker recognition engine (for different operations and different identity claims). Such a preprocessing stage allows most of the computation to be performed only once; the utterance can then be processed repeatedly at minimal additional computational cost.
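The saving can be made concrete with a simple cost model. The symbols are introduced here for illustration only and do not come from the disclosure: P is the cost of the shared, request-independent computation, q is the incremental cost of one request (one enrollment, or one scoring against one identity claim), and K is the number of requests issued against the same utterance.

```latex
C_{\text{naive}} = K\,(P + q), \qquad
C_{\text{pre}} = P + K\,q, \qquad
\frac{C_{\text{naive}}}{C_{\text{pre}}} = \frac{K\,(P + q)}{P + K\,q} \approx K
\quad \text{when } K\,q \ll P.
```

For example, with hypothetical values P = 100, q = 1 and K = 10, processing each request from scratch costs 1010 units, against 110 units with a shared preprocessing stage, a saving of about 9 times.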

The proposed invention suggests a preprocessing stage that performs the vast majority of the computation required for speaker recognition in advance, independent of both the operation type and the identity claims. The result of the preprocessing may be stored in memory for future online processing (e.g., an MRCP v2 verification buffer) or stored permanently on disk for future queries (e.g., speaker indexing).
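The following is a minimal sketch of what an operation- and claim-independent preprocessing result could look like, assuming a GMM back end with a diagonal-covariance Universal Background Model (UBM). The disclosure's own preprocessing details are not given in this section, so a standard technique is swapped in here: the stored record holds zeroth- and first-order statistics of the utterance against the UBM; enrollment is relevance-MAP adaptation of the UBM means; and scoring uses a linear (first-order) approximation of the log-likelihood ratio. The function names (`preprocess`, `enroll`, `linear_score`), the toy UBM, the relevance factor of 16 and the use of JSON as the storage format are all hypothetical.

```python
import json
import math

# Hypothetical sketch: GMM-UBM preprocessing whose output does not depend on
# the operation (enrollment/scoring) or on any identity claim.
# The UBM is a diagonal-covariance GMM given as (weights, means, variances).


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def preprocess(frames, ubm):
    """One pass over the frames: UBM posteriors -> zeroth/first-order stats."""
    weights, means, variances = ubm
    n_comp, dim = len(weights), len(means[0])
    N = [0.0] * n_comp                        # zeroth-order stats per Gaussian
    F = [[0.0] * dim for _ in range(n_comp)]  # first-order stats per Gaussian
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp gives a numerically stable normalizer
        log_norm = top + math.log(sum(math.exp(lv - top) for lv in logs))
        for c in range(n_comp):
            post = math.exp(logs[c] - log_norm)  # posterior of Gaussian c
            N[c] += post
            for d in range(dim):
                F[c][d] += post * x[d]
    return {"N": N, "F": F, "frames": len(frames)}


def enroll(stats, ubm, relevance=16.0):
    """Relevance-MAP adaptation of the UBM means from stored statistics."""
    _, means, _ = ubm
    return [[(f + relevance * m) / (n + relevance) for f, m in zip(F_c, m_c)]
            for n, F_c, m_c in zip(stats["N"], stats["F"], means)]


def linear_score(stats, ubm, speaker_means):
    """Linear approximation of the per-frame log-likelihood ratio vs. the UBM."""
    _, means, variances = ubm
    total = 0.0
    for n, F_c, m_c, v_c, s_c in zip(stats["N"], stats["F"], means,
                                     variances, speaker_means):
        for f, m, v, s in zip(F_c, m_c, v_c, s_c):
            total += (s - m) / v * (f - n * m)
    return total / stats["frames"]


# Toy example: 2 Gaussians in 2 dimensions (all values made up).
ubm = ([0.5, 0.5], [[0.0, 0.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])
frames = [[0.1, -0.2], [2.9, 3.2], [3.1, 2.8], [-0.3, 0.4]]

# Preprocess once; the JSON string stands in for a memory buffer or disk file.
stored = json.dumps(preprocess(frames, ubm))
record = json.loads(stored)

# Later requests reuse the same record for any operation or identity claim.
speaker_means = enroll(record, ubm)
self_score = linear_score(record, ubm, speaker_means)
wrong_score = linear_score(record, ubm, [[0.2, -0.2], [3.0, 3.0]])
```

Because `record` depends only on the utterance and the UBM, it can be computed once at collection time; any later enrollment or scoring request, for any claimed identity, then touches only per-Gaussian sums rather than every frame. The linear score is an approximation chosen for this sketch; exact GMM scoring against a speaker model would need frame-level information, not just the summed statistics.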

The invention works within a Gaussian Mixture Model (GMM) framework for speaker recognition, which is the prevalent algorithmic framework in state-of-the-art speaker recognition systems. In a classical GMM system, features are extracted from raw speech frames, and each resulting feature vector is subsequently processed for either scoring (calculating the GMM score over all Gaussians) or enrollment (collecting sufficient statistics over all Gaussians for the purpose of Bayesian adaptation from a Universal Backgr...