# Method and System for Navigation of Audio Signal over Telephony Channel

IP.com Disclosure Number: IPCOM000201546D
Publication Date: 2010-Nov-15
Document File: 3 page(s) / 132K

## Publishing Venue

The IP.com Prior Art Database

## Abstract

A method and system for navigating through an audio signal over a telephony channel is disclosed. The method and system enable quick and efficient browsing of the audio signal for a user.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

Disclosed is a method and system for providing navigation features over a telephony channel for browsing audio signals. The method involves processing navigation inputs provided by a user over the telephony channel to provide efficient browsing of the audio signal.

A flow diagram illustrating the disclosed method is shown in Fig. 1.

Figure 1

The method involves detecting spectral or energy variations in an audio signal provided over a telephony channel. The spectral variation may be detected mathematically. For example, in equation 1, Mi is the mel-frequency cepstral coefficient (MFCC) vector of frame i, µi is the mean of the MFCC vectors over the N frames adjacent to the ith frame, and ||.|| is the Euclidean norm. Equation 1 is used to compute the value Ci. If Ci is above a threshold, a significant spectral variation is observed. The computation of µi is given in equation 2.

(1)
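One plausible reading of the definitions above is Ci = ||Mi - µi||, with µi taken as the arithmetic mean of the neighbouring MFCC vectors. The Python sketch below follows that reading; it is an illustration under stated assumptions, not the disclosed formula. The function names, the symmetric neighbour window (N/2 frames on each side, excluding frame i), and the handling of signal edges are all assumptions.

```python
import math


def spectral_variation(mfcc, n_adjacent):
    """Return an assumed C_i = ||M_i - mu_i|| for every frame i.

    mfcc: list of equal-length lists, one MFCC vector per frame.
    n_adjacent: N, the number of neighbouring frames averaged into mu_i.
    Assumption: the neighbours are the N // 2 frames on each side of
    frame i (frame i itself excluded), clipped at the signal edges.
    """
    half = n_adjacent // 2
    scores = []
    for i, m_i in enumerate(mfcc):
        lo = max(0, i - half)
        hi = min(len(mfcc), i + half + 1)
        neighbours = [mfcc[j] for j in range(lo, hi) if j != i]
        if not neighbours:
            # A single-frame signal has no neighbours to compare against.
            scores.append(0.0)
            continue
        dim = len(m_i)
        # mu_i: component-wise mean of the neighbouring MFCC vectors.
        mu_i = [sum(v[d] for v in neighbours) / len(neighbours) for d in range(dim)]
        # Euclidean norm of the difference between M_i and mu_i.
        scores.append(math.sqrt(sum((m_i[d] - mu_i[d]) ** 2 for d in range(dim))))
    return scores


def significant_frames(scores, threshold):
    """Indices of frames whose score exceeds the threshold."""
    return [i for i, c in enumerate(scores) if c > threshold]
```

With a suitably chosen threshold, `significant_frames` returns the frames at which a significant spectral variation is observed; the threshold value itself is not specified in the description and would have to be tuned.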

Along with spectral variation, variation in the energy of the audio signal is also detected. This involves computing the differences in amplitude between adjacent frames of the audio signal. Differences above a threshold imply significant energy variation. Based on the spectral and/or energy variations, transient regions and steady regions in the audio signal are identified.
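The energy check and the resulting region labelling can be sketched as follows. This is a minimal illustration, assuming that mean absolute amplitude is the per-frame measure and that a frame counts as transient when either variation exceeds its threshold; the function names and both thresholds are hypothetical.

```python
def frame_energy(frame):
    """Mean absolute amplitude of one frame (an assumed energy proxy)."""
    return sum(abs(sample) for sample in frame) / len(frame)


def energy_variation(frames):
    """Absolute amplitude difference between each frame and the previous one.

    The first frame has no predecessor, so its variation is set to 0.0.
    """
    levels = [frame_energy(f) for f in frames]
    return [0.0] + [abs(levels[i] - levels[i - 1]) for i in range(1, len(levels))]


def label_regions(spectral_scores, energy_scores, spectral_threshold, energy_threshold):
    """Label each frame 'transient' or 'steady'.

    Assumption: a frame is transient if its spectral OR its energy
    variation is above the corresponding threshold.
    """
    labels = []
    for c, e in zip(spectral_scores, energy_scores):
        is_transient = c > spectral_threshold or e > energy_threshold
        labels.append("transient" if is_transient else "steady")
    return labels
```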

(2)

Once the transient regions have been identified from the spectral/energy variation of the audio signal, frames are dropped from the audio signal non-uniformly: frames in the steady regions are dropped at a higher rate than frames in the transient regions. In addition, silence regions in the audio signal are either dropped completely or shortened, depending on the target speed factor.
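The non-uniform dropping step might look like the sketch below. It assumes each frame already carries one of three labels (`"steady"`, `"transient"`, `"silence"`); the parameter names `steady_keep_every`, `transient_keep_every`, and `max_silence_frames` are hypothetical, and how a target speed factor would map onto them is not specified in the description.

```python
def drop_frames(labels, steady_keep_every=3, transient_keep_every=1, max_silence_frames=0):
    """Return the indices of the frames kept for playback.

    labels: per-frame labels, each 'steady', 'transient' or 'silence'.
    steady_keep_every: keep one frame out of this many in a steady run.
    transient_keep_every: same for transient runs; it should be smaller,
        so that steady regions are thinned at a higher rate.
    max_silence_frames: frames kept at the start of each silence run
        (0 drops silence completely, a small value shortens it).
    """
    kept = []
    run_label = None
    run_pos = 0  # position inside the current run of same-label frames
    for i, label in enumerate(labels):
        if label != run_label:
            run_label, run_pos = label, 0
        if label == "silence":
            keep = run_pos < max_silence_frames
        elif label == "transient":
            keep = run_pos % transient_keep_every == 0
        else:
            keep = run_pos % steady_keep_every == 0
        if keep:
            kept.append(i)
        run_pos += 1
    return kept


labels = ["steady"] * 6 + ["silence"] * 4 + ["transient"] * 3 + ["steady"] * 3
kept = drop_frames(labels)
speed_factor = len(labels) / len(kept)  # achieved speed-up for this input
```

In this sketch the achieved speed factor is simply the ratio of input frames to kept frames, so the keep intervals could be adjusted iteratively until the ratio approaches the requested target.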

Further, in voiced regions the frame rate is pitch-synchronous and the frame length is an integer multiple of the local pitch value. In unvoiced regions, however, the frame rate is constant and the frame length is an integer multiple of the frame rate. Thus, information about the frame rate is also identified in order to perform non-uniform dropping of the frames.
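One way to picture this framing scheme is the sketch below. It is only an interpretation: it assumes the "local pitch value" is a pitch period in samples, that "frame rate" in unvoiced regions refers to a fixed frame shift (hop) in samples, and that the input has already been segmented into voiced and unvoiced ranges. The function name, the segment format, and the default values are hypothetical.

```python
def frame_bounds(segments, periods_per_frame=2, unvoiced_hop=80, unvoiced_hops_per_frame=2):
    """Return (start, end) sample ranges of the analysis frames.

    segments: list of (start, end, pitch_period) sample ranges, where
        pitch_period is the local pitch period in samples for a voiced
        segment and None for an unvoiced segment.
    Voiced: the hop equals one pitch period (pitch-synchronous) and the
        frame length is periods_per_frame pitch periods.
    Unvoiced: the hop is constant and the frame length is
        unvoiced_hops_per_frame hops.
    """
    frames = []
    for start, end, pitch_period in segments:
        if pitch_period:
            hop = pitch_period
            length = periods_per_frame * pitch_period
        else:
            hop = unvoiced_hop
            length = unvoiced_hops_per_frame * unvoiced_hop
        pos = start
        # Emit only frames that fit entirely inside the segment.
        while pos + length <= end:
            frames.append((pos, pos + length))
            pos += hop
    return frames
```

Because every frame boundary in a voiced segment falls on a pitch-period boundary under these assumptions, removing a whole frame shift there removes a whole number of pitch periods.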

Thereafter, the method involves control...