System and Method for Converting a Video into Informative Images

IP.com Disclosure Number: IPCOM000236832D
Publication Date: 2014-May-19
Document File: 5 page(s) / 401K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed are a system and a method for converting a video file into a sequence of single images. The system produces an image whenever the embedded voice recognition software detects a spoken sentence. A dialog is then rendered on the image; it contains the text that the voice recognition software transcribed from the audio signal and also points to the person who spoke the words.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

    The widespread use of consumer electronic devices for digital recording produces a huge number of video clips. A video clip comprehensively captures all visual and auditory information; however, people may be interested in only certain parts of it, for example, the conversations among people. This innovation seeks an efficient and effective way to represent the content of a video using several "representative" images. Existing video summarization/synopsis techniques can shorten a video, but the invention focuses specifically on the "dialogues" among people within a video. A system and method are proposed to generate informative images from a video, each of which relates to a dialogue. Using facial recognition, voice recognition, and speech-to-text technologies, the system produces images with the text script rendered near the speaker. This can be a useful way to visualize the dialogues in a video; it can also be used for entertainment, enriching a photo by showing the content of the conversations among the persons in it.

    The invention relates to converting a video file into a sequence of single images. The system produces an image whenever the embedded voice recognition software detects a spoken sentence. A dialog is then rendered on the image; it contains the text that the voice recognition software transcribed from the audio signal and also points to the person who spoke the words.
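
The flow described above (one image per detected sentence, with the transcribed text attached to the speaker) can be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: the names `Utterance`, `FaceBox`, `Panel`, `plan_panels`, and `faces_at` are hypothetical, the recognition steps are assumed to have already produced their outputs, and picking the frame at the midpoint of each sentence is an assumption that the source does not state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Utterance:
    """One spoken sentence, as a speech-to-text step might report it (hypothetical)."""
    start: float   # seconds from the start of the video
    end: float     # seconds from the start of the video
    speaker: str   # identity assigned by voice recognition
    text: str      # transcribed sentence


@dataclass
class FaceBox:
    """One recognized face in a frame (hypothetical), in pixel coordinates."""
    speaker: str
    x: int
    y: int
    w: int
    h: int


@dataclass
class Panel:
    """Plan for one output image: which frame to grab and where the dialog points."""
    timestamp: float
    speaker: str
    text: str
    anchor: Optional[Tuple[int, int]]  # top-center of the speaker's face, if found


def plan_panels(
    utterances: List[Utterance],
    faces_at: Callable[[float], List[FaceBox]],
) -> List[Panel]:
    """Produce one panel per detected sentence.

    faces_at(t) stands in for a face recognition step that returns the
    faces visible in the frame at time t.
    """
    panels = []
    for u in utterances:
        # Assumption: use the frame halfway through the sentence.
        t = (u.start + u.end) / 2
        # Match the voice identity to a face with the same identity.
        face = next((f for f in faces_at(t) if f.speaker == u.speaker), None)
        # Point the dialog at the top-center of the face; None if off-screen.
        anchor = (face.x + face.w // 2, face.y) if face else None
        panels.append(Panel(t, u.speaker, u.text, anchor))
    return panels
```

A renderer would then grab the frame at each `timestamp` and draw the `text` in a dialog whose tail ends at `anchor`. When the speaker is not visible in the chosen frame, `anchor` is `None` and the caller has to decide where to place the dialog.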

The advantages of the present invention include, but are not limited to:

    1. By leveraging face recognition and voice recognition techniques, the present invention can automatically render the text script close to the speaker. For videos that focus mainly on conversations, this invention saves a great deal of post-editing effort. Possible applications include:
- Providing a video summary: the system can be used to generate meeting minutes from meeting recordings. In addition, displaying a subset of the images beside the video can give viewers a quick idea of the video content.
- Combined with automatic summarization technologies that extract the most valuable parts of a video, the system can present the core idea of the video with fewer images. These images can be used for promotional activities (e.g., movie or TV program promotion).
- For entertainment, the system can be implemented as a fun application that transforms people's short, interesting videos into comics.

    2. Images can typically be stored, transmitted, and manipulated more quickly and efficiently than video files. In areas with limited network bandwidth, this invention can reduce the time needed to transfer the information by converting the most essential parts of the video (those containing human voice) into images.

     Description of Invention: The present invention can be implemented in this way. The system has a predefined b...