
Method and System for Automated Phone Conversation Transcription Service using Handset Agents
Disclosure Number: IPCOM000198387D
Publication Date: 2010-Aug-06
Document File: 2 page(s) / 28K

Publishing Venue

The Prior Art Database


Disclosed is the description of a system that transcribes phone conversations among several parties. The system uses handset agents present on mobile smart phones.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.


Method and System for Automated Phone Conversation Transcription Service using Handset Agents

Current speech dictation software can transcribe a single user's audio input with an accuracy above 95%. This is possible after the user provides vocabulary metadata (for instance, [...] on the market scan the user's emails) and completes a 20-minute training process, which creates a user profile with tuned acoustic and language models.

The accuracy of speaker-independent transcription varies depending on the language style of the user and the subject matter. For the vocabulary of a teenager with a common language style, accuracy of 90% is common. This is why voicemail transcription is becoming popular. However, for transcription for adults in a business setting, accuracy of 60% is more common. This is because most work settings have an abundance of acronyms and adults in a business setting have more complex vocabularies. Speech patterns of adults in a business setting are often unique as well.

This publication proposes a system that goes much further. It describes a system for phone conversation transcription. In the current art, transcripts produced with a generic speech engine are bounded at about 40% accuracy for calls with more than two speakers. The benefits of automated phone transcription become apparent when the potential for search is considered. If all phone conversations for an organization are transcribed and then published in a searchable fashion, a manager can determine what is being discussed and decided on a daily basis just by using a conventional website search bar. If they want to listen to any subset of a conversation, they can easily find the most interesting portion by looking at the transcript (even if it is not completely accurate) and playing back that portion.
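The search-then-playback idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that each transcript is stored as time-stamped segments. The Segment structure, its field names, and the search_segments helper are hypothetical and are not taken from the disclosure. The sketch only shows how a keyword match on imperfect transcript text could point a listener to the part of the recording worth playing back.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed stretch of a call (hypothetical structure)."""
    call_id: str
    speaker: str
    start_s: float  # offset into the recording, in seconds
    end_s: float
    text: str       # transcript text, possibly imperfect


def search_segments(segments, query):
    """Return segments whose text contains every query word (case-insensitive)."""
    words = query.lower().split()
    return [s for s in segments if all(w in s.text.lower() for w in words)]


# Made-up sample data, for illustration only.
segments = [
    Segment("call-1", "alice", 0.0, 6.5, "lets review the budget for next quarter"),
    Segment("call-1", "bob", 6.5, 12.0, "the budget was approved on monday"),
    Segment("call-2", "carol", 0.0, 4.0, "shipping is delayed until friday"),
]

hits = search_segments(segments, "budget approved")
for hit in hits:
    # The time range tells the listener which part of the recording to play back.
    print(f"{hit.call_id} {hit.speaker} {hit.start_s:.1f}-{hit.end_s:.1f}s: {hit.text}")
```

A plain substring match is used here because it tolerates a partially wrong transcript: a segment is still found as long as the query words themselves were recognized. A production system would more likely rely on a full-text index.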

Each user has a software agent that captures the microphone input of the device they are using on the conference call. The...