Method and apparatus for profile enhanced automatic speech recognition
Disclosure Number: IPCOM000242249D
Publication Date: 2015-Jun-29
Document File: 2 page(s) / 58K

Publishing Venue

The Prior Art Database


This disclosure proposes a method and system for exploiting the visual face information in images or videos, which can easily be captured by a camera installed on a device such as a cell phone or mounted at a fixed position in a space, e.g. on the ceiling of a room. The identity of the speaker can be inferred by detecting/recognizing his/her face. Leveraging this identity information, the speech recognition method and system can be further enhanced in both the training and testing stages, which ultimately improves the recognition accuracy.

This is the abbreviated version, containing approximately 69% of the total text.


Method and apparatus for profile enhanced automatic speech recognition

Business value
-Online/offline speech recognition has massive existing and emerging application scenarios, especially in the era of mobile and globalization

•Caption is needed and generated when:

-Global phone/onsite meeting with non-native colleagues from worldwide departments

-Movie/Television and other videos, especially for original English version

-Online education, Online video on internet for video recommendation, indexing, summarization

-Human machine interaction, especially when mobile device is becoming dominating

        •Car, home appliance, wearable devices etc.

Challenges and opportunities

-Enhance the speech recognition performance as more people now speak with ambient noise, making the automatic recording and recognition more difficult

-Big-data and multi-modal systems are becoming more prevalent, especially on a cloud-based platform

Visual appearance driven automatic speech recognition method and system, which is
-Automatically recognize the race of the speaker, by means of

Collect his/her profile information from the relevant human resource information system to infer the race type




The framework is as follows: a face is detected, or even further recognized, from the visual image or videos, and the speech recognition model can then be enhanced by leveraging the identity information obtained from the face detection/recognition output. Note the identity information can be embodied...
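
The framework described here (face detection/recognition output feeding the speech recognition model) might be sketched as below. This is only an illustrative sketch, not the disclosed implementation. The available text is cut off before it says how the identity information is represented, so the sketch assumes a plain string label. Every name in it (FaceResult, profile_label, select_recognizer, group_by_profile) is hypothetical, and the recognizers are stand-in callables rather than real models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: a recognizer maps raw audio samples to a transcript.
Recognizer = Callable[[List[float]], str]


@dataclass(frozen=True)
class FaceResult:
    """Hypothetical output of a face detection/recognition stage."""
    face_detected: bool
    # Assumed representation of the identity information: a plain label.
    # None means a face was seen but no identity could be inferred.
    profile_label: Optional[str] = None


def select_recognizer(
    face: FaceResult,
    profile_models: Dict[str, Recognizer],
    generic_model: Recognizer,
) -> Recognizer:
    """Testing stage: use a profile-specific model when identity is known."""
    if face.face_detected and face.profile_label in profile_models:
        return profile_models[face.profile_label]
    # No face, no identity, or no model for this profile: fall back.
    return generic_model


def group_by_profile(
    samples: List[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Training stage: bucket (profile_label, utterance_id) pairs so that
    each profile's utterances can be used to train or adapt its own model."""
    groups: Dict[str, List[str]] = {}
    for label, utterance_id in samples:
        groups.setdefault(label, []).append(utterance_id)
    return groups
```

The fallback to a generic model is a design assumption of the sketch: it keeps recognition available when no face is detected or no profile-specific model exists.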