Hybrid speaker recognition passphrase

IP.com Disclosure Number: IPCOM000236887D
Publication Date: 2014-May-21
Document File: 3 page(s) / 53K

Publishing Venue

The IP.com Prior Art Database

Abstract

There are two types of spoken, text-dependent speaker recognition pass-phrases. A fixed pass-phrase has the accuracy advantage (in the enrollment phase the user says the exact phrase that he is going to use during the verification phase), but it is vulnerable to spoofing: if someone records you saying your pass-phrase, he can replay it and impersonate you. Prompted pass-phrases are much harder to spoof (e.g. the user is requested to say a random 4-digit number generated by the system), but they are less accurate, as the system needs to extrapolate how you are going to say a specific number (e.g. from training on the digits 0-9) without having the exact number phrase in the enrollment and development data. The suggested "hybrid" pass-phrase includes two parts in a single continuous phrase: the first, a 'global pass-phrase' (or a speaker-dependent pass-phrase), is fixed, while the second, a 'prompted pass-phrase', varies and is prompted by the system. The global pass-phrase has the advantage of better accuracy, while the prompted pass-phrase is harder to spoof (liveness detection).

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

There are two types of spoken, text-dependent speaker recognition
pass-phrases:

1. fixed pass-phrase
2. varying (prompted) pass-phrase

    A fixed pass-phrase has the accuracy advantage (in the
enrollment phase the user says the exact phrase that he is
going to use during the verification phase), but it is
vulnerable to spoofing: if someone records you saying your
pass-phrase, he can replay it and impersonate you.

    Prompted pass-phrases are much harder to spoof (e.g. the
user is requested to say a random 4-digit number generated by
the system), but they are less accurate, as the system needs to
extrapolate how you are going to say a specific number (e.g. from
training on the digits 0-9) without having the exact number phrase
in the enrollment and development data.

    In order to overcome these shortcomings, systems may use a
combination of the two approaches: the user is requested to say
his fixed pass-phrase, and then he is prompted to say the varying
pass-phrase.
This two-stage process is cumbersome and adds more time to the
authentication process. In addition, it attracts unwanted
attention: people are not comfortable saying an obvious
pass-phrase like "my voice is my password" in a public place.

Prior art:

WO 2012075641 A1: "Device and method for pass-phrase modeling for
speaker verification, and verification system"

US 8386263 B2: "Speaker verification methods and apparatus"

CA 2267954 C: "Speaker verification method" (mentions a confidential
PIN as part of authentication)

    The suggested "hybrid" pass-phrase includes two parts in a
single continuous phrase: the first, a 'global pass-phrase'
(or a speaker-dependent pass-phrase), is fixed, while the
second, a 'prompted pass-phrase', varies and is prompted by
the system.

The global pass-phrase has the advantage of better accuracy,
while the prompted pass-phrase is harder to spoof (liveness
detection).
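
As an illustration only, the construction of such a hybrid pass-phrase can be sketched in Python. The function name build_hybrid_passphrase, the 4-digit default and the example global phrase are assumptions made for this sketch and are not specified by the disclosure; the digits are drawn with the standard secrets module so that the next prompt is hard to predict.

```python
import secrets


def build_hybrid_passphrase(global_phrase: str, num_digits: int = 4) -> tuple[str, str]:
    """Return (full_phrase, prompted_part) for one authentication attempt.

    Hypothetical helper: the fixed global phrase is followed by a freshly
    drawn digit string that the user must read out in the same utterance.
    """
    # Draw each digit from a cryptographically strong source, so a replayed
    # recording of an earlier attempt is unlikely to match the new prompt.
    prompted = "".join(secrets.choice("0123456789") for _ in range(num_digits))
    return f"{global_phrase} {prompted}", prompted


if __name__ == "__main__":
    full_phrase, prompted_part = build_hybrid_passphrase("please order me part number")
    print(full_phrase)  # the digits differ on every run
```

The prompted part is returned separately so that the system can later check that the spoken digits match what was prompted.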

    Passing both parts in the same utterance improves the
usability of the system, because the total authentication time is
shorter than in the two-stage process.
In addition, it is likely to attract less attention (e.g. "please
order me part number 1234") if we choose the pass-phrases with care.

    At the server, the utterance may be split, using speech
recognition techniques (this is one embodiment; more embodiments
are described below), into the distinct global and prompted parts,
which are each fed to the respective speaker recognition engine.
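
One possible way to picture the split is shown in the following Python sketch. It assumes that a speech recognizer has already produced word-level timestamps for the utterance; the Word type, the split_utterance function and the example timings are hypothetical and are not part of the disclosure. The sketch only locates the time boundary between the two parts; cutting the audio and scoring it are left to the respective engines.

```python
from dataclasses import dataclass

Span = tuple[float, float]  # (start, end) in seconds


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the utterance
    end: float


def split_utterance(words: list[Word], global_phrase: str) -> tuple[Span, Span]:
    """Return the time spans of the global part and of the prompted part."""
    expected = global_phrase.lower().split()
    n = len(expected)
    heard = [w.text.lower() for w in words[:n]]
    # The utterance must begin with the global phrase and continue beyond it.
    if heard != expected or len(words) <= n:
        raise ValueError("utterance is not a global phrase followed by a prompted part")
    global_span = (words[0].start, words[n - 1].end)
    prompted_span = (words[n].start, words[-1].end)
    return global_span, prompted_span


if __name__ == "__main__":
    # Made-up recognizer output for "please order me part number 1234".
    recognized = [
        Word("please", 0.0, 0.3), Word("order", 0.3, 0.6), Word("me", 0.6, 0.7),
        Word("part", 0.7, 1.0), Word("number", 1.0, 1.4),
        Word("one", 1.5, 1.7), Word("two", 1.7, 1.9),
        Word("three", 1.9, 2.2), Word("four", 2.2, 2.5),
    ]
    print(split_utterance(recognized, "please order me part number"))
```

Each returned span could then be used to cut the audio for the engine that handles that part.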


1. The security engine generates the pass-phrase for the user to say
   (by combining a fixed phrase with a dynamically selected one).
2. The user is prompted to say the pass-phrase.
3. The user says the requested pass-phrase (the utterance).
4. The user's speech is fed to the splitter, which splits the data
   according to the policy from the security engine (which engine
   receives which part).
5. Each engine receives its respective part of the speech.
6. Each engine produces a score, according to its speaker
   recognition confidence.
7. The security engine receives the engine scores and the splitting
   score (e.g. the proba...