
Enhancing information retrieval with speech rate analysis.

IP.com Disclosure Number: IPCOM000249989D
Publication Date: 2017-May-13
Document File: 4 page(s) / 121K

Publishing Venue

The IP.com Prior Art Database

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 38% of the total text.


ENHANCING INFORMATION RETRIEVAL WITH SPEECH RATE ANALYSIS

ABSTRACT

Disclosed are a method and system that utilize speech-to-text technology to enhance an information retrieval process involving documents or speech transcripts that are produced by speech-to-text technologies.

The accuracy of an information retrieval application depends on the content and informativeness of the training documents included in a corpus. Linguistically informative documents provide a better foundation for training language models and support more robust queries.

The novel contribution is a method and system that combine speech-to-text technology with speech rate analysis to enhance an information retrieval process involving documents or speech transcripts produced by speech-to-text technologies.

The proposed method is based on multiple studies of speech rate. A fast speech rate correlates with lower average information content: fast speakers use more words and fewer passive constructions than slower speakers do. These findings indicate a global relationship between speech rate and linguistic information, as well as evidence that the capacity of the information channel restricts the choice of words and structures.
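The disclosure does not say how "information content" is measured. One common proxy, assumed here purely for illustration, is average surprisal: the mean of -log2 p(w) over the words of a transcript, with p estimated from a reference corpus. The sketch below uses an add-one-smoothed unigram model; the function names are hypothetical.

```python
import math
from collections import Counter


def unigram_model(reference_tokens):
    """Return p(w) from an add-one-smoothed unigram model of a reference corpus."""
    counts = Counter(reference_tokens)
    total = sum(counts.values())
    # Each known word gets +1; the extra +1 reserves mass for one unseen-word class,
    # so probabilities of known words plus the unseen class sum to 1.
    denom = total + len(counts) + 1

    def prob(word):
        return (counts.get(word, 0) + 1) / denom

    return prob


def average_surprisal(tokens, prob):
    """Mean information content of a transcript, in bits per word (assumed proxy)."""
    if not tokens:
        return 0.0
    return sum(-math.log2(prob(w)) for w in tokens) / len(tokens)
```

Under this proxy, a transcript dominated by frequent words scores lower (less informative) than one using rarer words.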

The proposed method uses speech-to-text technology to produce speech transcripts.
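The disclosure does not specify how speech rate is computed from a transcript. A minimal sketch, assuming the speech-to-text engine returns word-level start and end times in seconds (the `Word` type and `speech_rate_wpm` function are hypothetical), measures words per minute and can optionally discount long pauses:

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


def speech_rate_wpm(words, exclude_pauses_over=None):
    """Words per minute from the first word's start to the last word's end.

    If exclude_pauses_over (seconds) is set, silent gaps longer than that are
    subtracted from the duration before the rate is computed.
    """
    if not words:
        return 0.0
    duration = words[-1].end - words[0].start
    if exclude_pauses_over is not None:
        for prev, cur in zip(words, words[1:]):
            gap = cur.start - prev.end
            if gap > exclude_pauses_over:
                duration -= gap
    if duration <= 0:
        return 0.0
    return len(words) / duration * 60.0
```

Whether long pauses count toward the duration is a design choice: excluding them approximates articulation rate rather than overall speaking rate.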

The method learns:

- Correlations between the speech rate and the linguistic informativeness of the speech content
- Speech rate information scores based on different levels of linguistic informativeness of the speech transcripts
- Speech rate information scores for each speech rate category (e.g., slow, normal, fast)
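The learning step can be sketched as follows, assuming each training transcript has already been reduced to a (speech rate, informativeness) pair. The crisp slow/normal/fast boundaries at the mean plus or minus k standard deviations, and the use of mean informativeness as each category's score, are illustrative assumptions rather than the disclosed formulas; all names are hypothetical.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def categorize(rate, slow_max, fast_min):
    """Map a speech rate to a crisp slow/normal/fast category."""
    if rate < slow_max:
        return "slow"
    if rate > fast_min:
        return "fast"
    return "normal"


def learn_speech_rate_scores(samples, k=1.0):
    """samples: (speech_rate, informativeness) pairs from training transcripts."""
    rates = [r for r, _ in samples]
    infos = [i for _, i in samples]
    mu, sd = statistics.fmean(rates), statistics.pstdev(rates)
    slow_max, fast_min = mu - k * sd, mu + k * sd  # assumed category boundaries
    by_category = {"slow": [], "normal": [], "fast": []}
    for rate, info in samples:
        by_category[categorize(rate, slow_max, fast_min)].append(info)
    # Category score = mean informativeness of its training transcripts (assumption).
    scores = {c: statistics.fmean(v) for c, v in by_category.items() if v}
    return {
        "correlation": pearson(rates, infos),
        "slow_max": slow_max,
        "fast_min": fast_min,
        "scores": scores,
    }
```

If the studies cited above hold for a given corpus, the learned correlation should be negative and the slow category should receive the highest score.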

The proposed method evaluates the speech rate of each transcript, assigns the corresponding speech rate information score, and uses that score to identify the more linguistically informative transcripts. Transcripts with a higher speech rate information score are more linguistically informative. Prioritizing them supports robust queries, enhances information retrieval over the speech transcripts, and enriches language models.
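A minimal runtime sketch, assuming the category boundaries and per-category scores come from a prior learning step (function names are hypothetical), assigns each new transcript its category score and ranks the highest-scoring, and therefore presumably most informative, transcripts first:

```python
def assign_score(rate, slow_max, fast_min, category_scores):
    """Return (category, speech rate information score) for one transcript."""
    if rate < slow_max:
        category = "slow"
    elif rate > fast_min:
        category = "fast"
    else:
        category = "normal"
    return category, category_scores[category]


def rank_transcripts(transcripts, slow_max, fast_min, category_scores):
    """transcripts: (transcript_id, speech_rate) pairs.

    Returns (id, category, score) tuples, highest score first, on the premise
    that a higher score marks a more linguistically informative transcript.
    """
    scored = [
        (tid, *assign_score(rate, slow_max, fast_min, category_scores))
        for tid, rate in transcripts
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)
```

The ranked list could then be used to prioritize transcripts when indexing or when selecting training data for a language model.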

The systematic implementation of the method comprises two stages: the System Learning stage and the System Runtime stage.


System Learning

1. System learns the speech rate baselines
   A. System establishes the baselines as normal, slow, and fast speech rates
   B. System learns the threshold values and ratios between acceptable-for-analysis speech rates
   C. If the average speech rate is Sr, then the threshold T(Sr) is a function of the average speech rate
   D. System learns the threshold function T(Sr) so that a reasonably fast or slow speech rate falls within the interval [Sr - T(Sr), Sr + T(Sr)]
   E. For further transcript evaluations, system selects the transcripts whose speech rate falls within the thresholds
2. System learns the correlation between the speech rate and the linguistic information of the speech content
   A. System learns the boundary values for fuzzy speech categories (i.e., fast, normal, slow) so that each category is distinguis...
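Step 1 leaves the form of T(Sr) open. As an assumption for illustration only, the sketch below takes T(Sr) to be proportional to Sr, learns the ratio as the smallest relative deviation that covers a chosen fraction of the training rates, and keeps only transcripts whose rate falls within [Sr - T(Sr), Sr + T(Sr)]. The function names and the `coverage` parameter are hypothetical.

```python
import math
import statistics


def learn_threshold_ratio(rates, coverage=0.9):
    """Return (Sr, ratio) with T(Sr) = ratio * Sr (assumed form).

    The ratio is the smallest relative deviation from Sr that covers at least
    `coverage` of the training rates. Rates are assumed to be positive.
    """
    sr = statistics.fmean(rates)
    rel_dev = sorted(abs(r - sr) / sr for r in rates)
    idx = max(0, math.ceil(coverage * len(rel_dev)) - 1)
    return sr, rel_dev[idx]


def select_within_threshold(transcripts, sr, ratio):
    """Keep ids of (id, rate) pairs whose rate lies in [Sr - T(Sr), Sr + T(Sr)]."""
    t = ratio * sr
    low, high = sr - t, sr + t
    return [tid for tid, rate in transcripts if low <= rate <= high]
```

A proportional threshold keeps the acceptable band the same relative width whether rates are measured in words per minute or syllables per second; a fixed absolute band would be an equally plausible reading of the disclosure.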