Method for improving the voice recognition rate by using additional input devices
Original Publication Date: 2004-Aug-27
Included in the Prior Art Database: 2004-Aug-27
Disclosed is a system that uses additional input devices to improve the speech recognition rate and reduce the error rate compared to prior art systems.
Disclosed is a system that improves the speech recognition rate and reduces the error rate compared to prior art systems.
Whenever humans need to communicate with machines, input channels are used. However, each type of input channel has its own deficiencies: for example, a keyboard invites keyboard-specific errors, whereas purely audio-based voice recognition systems tend to introduce segmentation errors (incorrect separation of words and/or phonemes), context errors, or classification errors.
State-of-the-art solutions work with one input channel at a time. They allow the user to post-process the recognition result using an additional input channel in order to fix errors or resolve ambiguities. This post-processing requires additional time from the same or a different human user. The post-processing may take advantage of advanced plausibility checks (spell checking using context and keyboard layout information for written text; surrounding words and knowledge about acoustical proximity for spoken text) as well as advanced user interfaces that make the post-processing as comfortable as possible (e.g. using a keyboard or mouse to select from a ranked list of alternative recognition results).
With recent advances in computing power, it is now feasible to increase the recognition rate and decrease the error rate by adding several input channels. Because the recognition rate increases, post-processing work can be reduced significantly. The recognition error rate is also reduced significantly.
- Speech recognition in noisy environments (in cars, using cell phones)
- Speech recognition with high performance needs
- Speech recognition with the need for a type of speaker's mood or behavior detection
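The benefit of evaluating several input channels together, as described above, can be illustrated with a small sketch. The disclosure does not specify how the channels are combined; the log-linear weighting below, the function name `fuse_hypotheses`, the `weight` parameter, and the example probabilities are all assumptions made purely for illustration.

```python
import math


def fuse_hypotheses(primary, secondary, weight=0.7):
    """Rank candidates by a weighted sum of log-probabilities from two channels.

    primary, secondary: dict mapping a candidate word to a probability in (0, 1].
    weight: hypothetical trust placed in the primary channel (0..1).
    Candidates missing from one channel receive a small floor probability.
    This fusion rule is an assumption, not the method of the disclosure.
    """
    floor = 1e-6
    scores = {}
    for candidate in set(primary) | set(secondary):
        p1 = primary.get(candidate, floor)
        p2 = secondary.get(candidate, floor)
        scores[candidate] = weight * math.log(p1) + (1.0 - weight) * math.log(p2)
    # Best-scoring candidate first.
    return sorted(scores, key=scores.get, reverse=True)


# Made-up example: the audio channel alone cannot clearly separate "bat" from "pat".
audio_channel = {"bat": 0.48, "pat": 0.46, "mat": 0.06}
second_channel = {"pat": 0.70, "bat": 0.20, "mat": 0.10}
ranking = fuse_hypotheses(audio_channel, second_channel)
print(ranking)
```

In this made-up case the audio channel on its own slightly prefers "bat", while the combined ranking puts "pat" first, so the ambiguity is resolved without manual post-processing.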
In the typical embodiment, a primary input channel C1 and a secondary input channel C2 are used:
The primary input channel C1 (e.g. a microphone) is used to capture the audio part of the spoken word using prior art voice recognition systems at, e.g., a phoneme level. A basic segmentation and classification is performed. The results R1 (including alternative results) are logged...
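One possible way to log the primary-channel results R1 together with their alternatives is sketched below. The disclosure does not define a log format; the `Segment` class, its time-stamp fields, and the example scores are hypothetical and serve only to show how ranked alternatives per segment might be kept for later use.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One segmented unit from channel C1 (hypothetical structure)."""
    start_ms: int  # assumed time stamps, not specified in the disclosure
    end_ms: int
    alternatives: list = field(default_factory=list)  # (phoneme, score) pairs

    def add(self, phoneme, score):
        # Keep the alternatives ranked, best score first.
        self.alternatives.append((phoneme, score))
        self.alternatives.sort(key=lambda alt: alt[1], reverse=True)

    def best(self):
        # Top-ranked phoneme, or None if nothing was classified.
        return self.alternatives[0][0] if self.alternatives else None


# R1 log: one entry per segment, each holding all classification alternatives.
r1_log = []
segment = Segment(start_ms=0, end_ms=120)
segment.add("p", 0.46)
segment.add("b", 0.48)
r1_log.append(segment)
print(r1_log[0].best(), r1_log[0].alternatives)
```

Keeping every alternative, and not just the top result, leaves room for a later stage to revise the choice for a segment.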