Transcription of speech data with minimal manual effort

IP.com Disclosure Number: IPCOM000028925D
Original Publication Date: 2004-Jun-08
Included in the Prior Art Database: 2004-Jun-08
Document File: 7 page(s) / 41K

Publishing Venue

IBM

Abstract

Disclosed is a method to incrementally adapt a speech recognition system based on manual transcription of the speech input, reducing the amount of manual transcription required. The method comprises the steps of: (a) identifying sub-segments in a speech corpus; (b) adapting the speech recognition system based on the entire set of transcribed sub-segments; (c) for each sub-segment in the set of yet-to-be-transcribed sub-segments of the speech corpus, applying the speech recognition system, determining the confidence the speech recognition system has in that sub-segment, and moving the sub-segments whose confidence is above a threshold to the set of transcribed sub-segments; (d) repeating steps (b) and (c) until a termination condition is reached; (e) selecting the sub-segment(s) of the speech corpus with the least confidence for manual transcription; (f) manually transcribing the selected sub-segment(s); and (g) repeating steps (b) to (f) until a termination condition is reached. The method combines unsupervised learning and supervised learning to improve the productivity of manual transcription. In addition, active learning, i.e., dynamic selection of the speech sub-segments to be manually transcribed, further improves productivity.
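Steps (a) to (g) can be sketched as a nested loop. This is a minimal sketch under stated assumptions, not the disclosed implementation: `ToyRecognizer` is a hypothetical stand-in for a real recognizer (each sub-segment's "audio" is simulated by its reference text, and confidence is the fraction of its words already in the adapted vocabulary), `transcribe_manually` stands in for the human transcriber, and the two termination conditions are assumed to be "no pending sub-segment newly passes the threshold" for step (d) and "nothing left to transcribe, or an optional manual budget `max_manual` is used up" for step (g).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyRecognizer:
    """Hypothetical stand-in for a speech recognizer.

    A sub-segment's "audio" is simulated by its reference text; confidence is
    the fraction of its words already present in the adapted vocabulary.
    """
    vocab: set = field(default_factory=set)

    def adapt(self, transcripts: dict) -> None:
        # Step (b): adapt on the entire set of transcribed sub-segments.
        for text in transcripts.values():
            self.vocab.update(w for w in text.split() if w != "<unk>")

    def recognize(self, audio: str) -> tuple:
        words = audio.split()
        hyp = " ".join(w if w in self.vocab else "<unk>" for w in words)
        known = sum(w in self.vocab for w in words)
        return hyp, (known / len(words) if words else 0.0)


def transcribe_corpus(segments: list, recognizer: ToyRecognizer,
                      transcribe_manually: Callable[[str], str],
                      threshold: float = 0.8,
                      max_manual: Optional[int] = None):
    """Combine unsupervised self-labelling with least-confidence manual selection."""
    transcribed = {}                      # sub-segment index -> transcript
    manual = []                           # indices sent to the human, in order
    pending = set(range(len(segments)))   # step (a): sub-segments given up front
    while pending:
        # Steps (b)-(d): adapt, recognize, accept confident output, and repeat
        # until no pending sub-segment rises above the threshold.
        while True:
            recognizer.adapt(transcribed)
            scores = {i: recognizer.recognize(segments[i]) for i in pending}
            confident = {i for i, (_, conf) in scores.items() if conf > threshold}
            if not confident:
                break
            for i in confident:
                transcribed[i] = scores[i][0]
            pending -= confident
        # Step (g) termination (assumed): corpus done or manual budget used up.
        if not pending or (max_manual is not None and len(manual) >= max_manual):
            break
        # Step (e): select the least-confident sub-segment (ties -> lowest index).
        worst = min(pending, key=lambda i: (scores[i][1], i))
        # Step (f): manual transcription of the selected sub-segment.
        transcribed[worst] = transcribe_manually(segments[worst])
        manual.append(worst)
        pending.remove(worst)
    return transcribed, manual


segments = [
    "check my card balance",
    "check my balance",
    "card balance please",
    "reset my password",
    "reset password please",
]
transcripts, manual = transcribe_corpus(
    segments, ToyRecognizer(), transcribe_manually=lambda audio: audio, threshold=0.7)
print(manual)  # → [0, 4]
```

In this toy run only two of the five sub-segments go to the human; the other three are accepted automatically once adaptation on the manual transcripts raises their confidence. Adaptation here only grows a vocabulary; a real system would re-estimate its acoustic and language models at step (b) and take confidence from the recognizer's own scores.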

This is the abbreviated version, containing approximately 19% of the total text.

1. Introduction

In a call center, an agent takes calls from a number of customers and responds to their queries, complaints, or suggestions. As part of customer service, companies often require their customer service representatives (CSRs) to record notes of each call in the call logs. In addition, all calls are recorded to manage legal risk. The recorded calls can be processed with speech recognition, and the resulting transcriptions can be analyzed from a customer service viewpoint. Because the voice of a given CSR appears in a large number of calls, it pays to build a speaker adaptation module for each CSR.

Similarly, medical transcription is an important business problem for doctors. For legal reasons, doctors are required to keep records of their medical advice and prescriptions to patients, and hence they have to create documents to keep these records. Speech recognition is one way to transcribe such voice records of the medical advice a doctor gives.

Medical transcription is also outsourced these days: an external party listens to the doctor's recorded voice, takes notes, and sends the notes to the client for its records. Often, the records sent by a doctor can be identified as coming from the same doctor. It therefore pays to build a speaker adaptation module and a separate language model (because doctors differ in their medical specialization) for each doctor.

Since manual transcription is expensive, several techniques based on speech recognition technology have been tried.

2. Background and prior art

Two different methods are currently deployed. One method is to transcribe all the speech data manually. The other is to run speech recognition on the speech data and have manual transcribers correct the errors.

While speech recognition has been tried for both applications, there are inherent problems. For example:
1. There is a language model associated with each doctor's domain or each CSR's process domain (credit card recovery, IT helpdesk, and telecom service departments have different language models), so the language model has to be created online.
2. Speech recognition for arbitrary customers (speaker independent) may not be very accurate. Incremental speaker adaptation can improve the result.
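The requirement in item 1 to create a language model online, one per domain, can be illustrated with a minimal sketch. It assumes a simple add-one-smoothed bigram model as a stand-in, since the disclosure does not name a model type; `OnlineBigramLM`, its methods, and the domain names used as keys are hypothetical. Each newly transcribed sentence updates its domain's counts immediately, so no batch retraining is needed.

```python
import math
from collections import Counter, defaultdict


class OnlineBigramLM:
    """Hypothetical add-one-smoothed bigram LM whose counts grow as transcripts arrive."""

    def __init__(self) -> None:
        self.bigrams = Counter()   # (previous word, word) -> count
        self.history = Counter()   # previous word -> number of bigrams it starts
        self.vocab = {"</s>"}      # predictable words seen so far

    def update(self, sentence: str) -> None:
        # Online step: fold one newly transcribed sentence into the counts.
        words = ["<s>"] + sentence.split() + ["</s>"]
        self.vocab.update(words[1:])
        for prev, word in zip(words, words[1:]):
            self.bigrams[prev, word] += 1
            self.history[prev] += 1

    def prob(self, prev: str, word: str) -> float:
        # Laplace smoothing over the vocabulary plus one slot for unseen words.
        return (self.bigrams[prev, word] + 1) / (self.history[prev] + len(self.vocab) + 1)

    def logprob(self, sentence: str) -> float:
        words = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(words, words[1:]))


# One model per process domain (or per doctor), created on first use.
models = defaultdict(OnlineBigramLM)
models["credit card recovery"].update("please pay the overdue amount")
models["it helpdesk"].update("please reset the password")

query = "please reset the password"
best = max(models, key=lambda domain: models[domain].logprob(query))
print(best)  # → it helpdesk
```

Choosing the domain whose model gives the highest log-probability is only a toy routing rule; a practical system would interpolate each domain model with a background model and train on far more text.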

Several approaches to adapting the acoustic and language models of a speech recognition system have been proposed in the literature. In [1], the initially recognized text is sent to a search engine to retrieve relevant text data (from a similar domain) from the Internet, and the language model is then rebuilt using the new data. The new language model is used either to recognize new speech sentences or to re-score the previous speech sentences. In [2], the accuracy of spontaneous speech recognition is found to increase with the unsupervised adaptation the language...