
IMPROVING THE PERCEPTION OF NON-SPEECH INFORMATION IN VIDEO CONTENT USING VISUAL REPRESENTATION

IP.com Disclosure Number: IPCOM000238670D
Publication Date: 2014-Sep-10
Document File: 7 page(s) / 65K

Publishing Venue

The IP.com Prior Art Database



AUTHORS:

Venkadesan Marimuthu

CISCO SYSTEMS, INC.

ABSTRACT

    Techniques are presented herein to render an audio visualization spectrum either as a picture-in-picture (PIP) region (PIP mode) or beneath the captioning region (Inline mode). The spectrum is generated dynamically during real-time playback with different rendering approaches, such as mono, stereo, and multi-channel modes, to assist hearing-impaired people in perceiving sound spatialization.

DETAILED DESCRIPTION

    People with hearing impairment can follow the dialogue of a video by means of closed captioning and subtitling. However, non-speech information in the video, such as background music, songs, and other sound effects, creates atmosphere and mood and advances the narrative.

    Technologies such as surround sound enable a listener to identify the direction and distance of a sound. However, the existing approach of captioning non-speech information with labels such as "audience laughing", "birds are chirping", or "lion is roaring" does not convey that feeling or effect to people with hearing impairment, or to people watching video in a quiet environment such as a hospital.

    There is a need for a simple and innovative solution to this problem. People with hearing impairment perceive music and sound through vibrations and visual cues. A spectrogram, or sonogram, is a visual representation of the spectrum of frequencies in a sound as they vary with time. Several algorithms (e.g., the Fast Fourier Transform (FFT)) and open-source sound visualization libraries are available to convert audio data into such representational data.
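    As a minimal sketch of this conversion step (the function and parameter names below are illustrative assumptions, not part of this disclosure), the following Python code uses an FFT to turn one frame of decoded PCM audio into normalized bar heights suitable for a spectrum display:

import numpy as np

def frame_to_spectrum(samples, num_bars=32):
    # Convert one frame of decoded PCM audio into normalized
    # bar heights for an on-screen spectrum visualization.
    windowed = samples * np.hanning(len(samples))   # window to reduce spectral leakage
    magnitudes = np.abs(np.fft.rfft(windowed))      # magnitude spectrum of the frame
    # Group FFT bins into a small number of display bars.
    bars = np.array([b.mean() for b in np.array_split(magnitudes, num_bars)])
    peak = bars.max()
    return bars / peak if peak > 0 else bars        # normalize to the 0..1 range

# Example: a 20 ms frame of a 440 Hz tone sampled at 48 kHz.
t = np.arange(960) / 48000.0
print(frame_to_spectrum(np.sin(2 * np.pi * 440 * t)).round(2))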


    Presented herein are techniques to show an audio visualization alongside video playback on a display device by rendering it in a layer of the graphics plane. In equipment such as digital set-top boxes and high-definition television sets, it is possible to dynamically create a visualization of the decoded audio.

    More specifically, audio visualization spectrums are rendered as a picture-in-picture region (PIP mode) or beneath the captioning region (Inline mode) by dynamically generating them during real-time playback with different rendering approaches, such as mono, stereo, and multi-channel modes of the visualization spectrum regions, to assist hearing-impaired people in perceiving sound spatialization.
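    A minimal sketch of how these modes might map audio channels to on-screen spectrum regions follows (the region geometry and channel layout here are assumptions for illustration, not the disclosure's exact design):

def layout_spectrum_regions(mode, screen_w, screen_h, bar_h=80):
    # Return (channel, rect) pairs, where rect is (x, y, width, height).
    # Mono: one region; stereo: left/right halves so lateral position
    # is visible; 5.1: one region per speaker, placed to suggest direction.
    bottom = screen_h - bar_h
    if mode == "mono":
        return [("mono", (0, bottom, screen_w, bar_h))]
    if mode == "stereo":
        half = screen_w // 2
        return [("left", (0, bottom, half, bar_h)),
                ("right", (half, bottom, half, bar_h))]
    if mode == "5.1":
        third = screen_w // 3
        return [("front-left", (0, 0, third, bar_h)),
                ("center", (third, 0, third, bar_h)),
                ("front-right", (2 * third, 0, third, bar_h)),
                ("surround-left", (0, bottom, third, bar_h)),
                ("lfe", (third, bottom, third, bar_h)),
                ("surround-right", (2 * third, bottom, third, bar_h))]
    raise ValueError("unknown mode: " + mode)

    Placing the left-channel spectrum on the left of the screen and the right-channel spectrum on the right is what lets a viewer read off the lateral position of a sound, which is the spatialization cue that text captions alone cannot convey.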

    In an existing set-top box (STB) architecture, a new component called the Audio Visualization Module (AVM) is introduced. The output of the audio decoder is fed to the AVM, which interacts with the graphics engine to render the visual representation on the graphics layer.
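    An illustrative sketch of this data flow, reusing the helper sketches above, is shown below (the class and method names, including the graphics engine's draw_bars call and its width/height attributes, are hypothetical interfaces used only for exposition):

class AudioVisualizationModule:
    # Consumes decoded PCM frames from the audio decoder and
    # drives the graphics engine to draw per-channel spectrums.
    def __init__(self, graphics_engine, mode="stereo"):
        self.graphics = graphics_engine
        self.mode = mode

    def on_decoded_frame(self, channels):
        # channels: dict mapping channel name -> array of PCM samples,
        # delivered by the audio decoder after each decoded frame.
        regions = dict(layout_spectrum_regions(
            self.mode, self.graphics.width, self.graphics.height))
        for name, samples in channels.items():
            if name in regions:
                bars = frame_to_spectrum(samples)
                # Draw on the graphics layer, composited above the video plane.
                self.graphics.draw_bars(regions[name], bars)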

    FIG. 1 below shows an example of an STB architecture with an Audio Visualization Module.

FIG. 2 below illustrates a block diagram of the Audio Visualization Module.


FIG. 2...