Method to Cluster Rich ASR output of the Speech Data

IP.com Disclosure Number: IPCOM000226864D
Publication Date: 2013-Apr-23
Document File: 4 page(s) / 179K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a system and method to cluster speech data using rich ASR output. ASR transcription is erroneous; hence cluster purity drops when moving from manual transcripts to 1-best ASR transcripts. However, the potential presence of correct transcripts in alternative hypotheses, as represented by richer ASR outputs such as word confusion networks (WCN) and lattices, gives scope for improving the overall clustering performance. Additional information from the ASR output, such as word confidence scores, can also be used to improve the overall clustering performance. The method generates the document vectors (to be used during clustering) from richer ASR output such as word confusion networks and lattices to improve the clustering performance.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

One of the basic building blocks for extracting insights from speech data is speech data clustering. In speech data clustering, speech is first transcribed, and clustering methods are then applied to the transcribed text. However, automatic transcription of speech is typically erroneous, with a typical word error rate (WER) for spontaneous telephone speech in the range of 20-30%. Noisy transcriptions in general lead to poor clustering accuracy. The 1-best output of an ASR system, however, represents only the most probable hypothesis obtained during the search over the set of all possible hypotheses. Looking into the alternative hypotheses, as represented in rich ASR output such as n-best output, word confusion networks and word lattices, could potentially lead to improved clustering accuracy, because the correct transcription may be present among the alternative hypotheses.
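To make the contrast between 1-best and rich output concrete, the following is a minimal sketch, not the disclosed implementation. It assumes a word confusion network is available as a list of slots, each mapping candidate words to posterior probabilities, with a hypothetical "<eps>" label for an empty slot. The function names, the toy utterance and the use of expected term counts as the document vector are all assumptions made for illustration.

```python
import math
from collections import defaultdict

EPS = "<eps>"  # assumed label for an empty (skip) entry in a WCN slot


def one_best_vector(wcn):
    """Term counts using only the highest-posterior word of each slot."""
    vec = defaultdict(float)
    for slot in wcn:
        word = max(slot, key=slot.get)
        if word != EPS:
            vec[word] += 1.0
    return dict(vec)


def expected_count_vector(wcn):
    """Term counts where every candidate word contributes its posterior."""
    vec = defaultdict(float)
    for slot in wcn:
        for word, posterior in slot.items():
            if word != EPS:
                vec[word] += posterior
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy WCN for the spoken words "cancel my account"; the recognizer's
# top choice in the first slot is wrong, but the correct word survives
# as an alternative with posterior 0.4.
wcn = [
    {"council": 0.6, "cancel": 0.4},
    {"my": 1.0},
    {"account": 1.0},
]
reference = {"cancel": 1.0, "my": 1.0, "account": 1.0}

one_best = one_best_vector(wcn)
rich = expected_count_vector(wcn)
sim_one_best = cosine(one_best, reference)
sim_rich = cosine(rich, reference)
```

In this toy case the vector built from the whole network is closer to the manual transcript than the 1-best vector, because "cancel" still receives some weight. Whether this holds on real data depends on the quality of the posteriors.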

Problem:
The disclosure herein concerns utilizing rich ASR outputs such as n-best output, word confusion networks (WCN) and word lattices to improve speech data clustering accuracy.

Background:

Method:

The method proposed is described in the following block diagram.

Experimental validation:

Features consist of the following steps:

· Speech documents are transcribed by the ASR system to generate, apart from the 1-best hypotheses, rich ASR outputs such as confidence scores and alternative hypotheses (in the form of n-best hypotheses, word...
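
The step above names confidence scores and n-best hypotheses as rich outputs. The sketch below shows one possible way such outputs could be turned into weighted term counts. It is an illustration under stated assumptions, not the procedure of the disclosure: the input formats, the function names, and the normalization of hypothesis log scores into weights via a softmax are all hypothetical choices.

```python
import math
from collections import defaultdict


def nbest_vector(nbest):
    """Posterior-weighted term counts from an n-best list.

    nbest is assumed to be a list of (tokens, log_score) pairs. The log
    scores are normalized with a softmax so the weights sum to 1.
    """
    best = max(score for _, score in nbest)
    weights = [math.exp(score - best) for _, score in nbest]
    total = sum(weights)
    vec = defaultdict(float)
    for (tokens, _), weight in zip(nbest, weights):
        for token in tokens:
            vec[token] += weight / total
    return dict(vec)


def confidence_vector(one_best):
    """Term counts where each 1-best word is weighted by its confidence.

    one_best is assumed to be a list of (word, confidence) pairs with
    confidence in [0, 1].
    """
    vec = defaultdict(float)
    for word, confidence in one_best:
        vec[word] += confidence
    return dict(vec)


# Toy inputs (invented for illustration only).
nbest = [
    (["reset", "my", "password"], -1.0),
    (["recent", "my", "password"], -2.0),
]
nbest_vec = nbest_vector(nbest)

scored_one_best = [("reset", 0.9), ("my", 0.95), ("pass", 0.3)]
conf_vec = confidence_vector(scored_one_best)
```

Words shared by all hypotheses keep a full count, while words that differ across hypotheses split the count according to the hypothesis weights. In the confidence-weighted variant, a low-confidence word such as "pass" contributes less to the document vector than a high-confidence one.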