Live video captioning using relative hand/body distance detection

IP.com Disclosure Number: IPCOM000239518D
Publication Date: 2014-Nov-13
Document File: 2 page(s) / 33K

Publishing Venue

The IP.com Prior Art Database

Abstract

Adding text captions to video is time-consuming and can be hard work, especially if you want effects such as a caption appearing over a speaker's outstretched hand at a particular point and then moving with their hand, as if being dragged, to another location on the display. This article describes a method by which captions can be captured using voice recognition, with the trigger for recording being the relative distance of a speaker's hand from their body.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 55% of the total text.

Problem: Adding text captions to video is time-consuming and can be hard work, especially if you want effects such as a caption appearing over a speaker's outstretched hand at a particular point and then moving with their hand, as if being dragged, to another location on the display.

    Current solution: film the video as normal, with the speaker moving their hands into the approximate places the captions will go, following a pre-planned physical action script. Captions are then added in post-production and must track the hand movement accurately, mimicking its speed and direction.

    Drawbacks of the existing solution: it requires significant pre-planning and post-production to create smooth-looking captions that match the participant's movement.

    Solution proposed: Use a 3D camera, or another distance-detection mechanism with face/hand recognition, to capture the participant(s) while filming them. If a participant moves their hand beyond a certain distance from their body, start converting their recorded voice to text using speech-to-text and overlay the resulting caption on the video. The caption should appear dynamically as they speak, so they can see the output as they create it. If the hand moves while extended from the body, the caption should track that movement. When the hand is withdrawn below the threshold distance from the body, the caption remains where it is and editing of that caption is complete.
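The trigger logic above can be sketched as a small per-frame state machine. The threshold value, the skeleton-tracking inputs, and the speech-to-text feed are all hypothetical stand-ins for whatever 3D camera SDK and speech recogniser are actually used:

```python
# Sketch of the proposed capture state machine, assuming a body-tracking
# camera supplies the hand-to-torso distance and hand screen position each
# frame, and a streaming speech-to-text engine supplies the transcript so far.
from dataclasses import dataclass

THRESHOLD_M = 0.45  # illustrative hand-to-torso distance (metres) that starts capture

@dataclass
class Caption:
    text: str
    x: float
    y: float
    editing: bool = True  # True while the hand is extended and the caption is live

def update(caption, hand_body_dist, hand_xy, speech_text):
    """Advance the caption state for one video frame.

    caption        -- the Caption currently being edited, or None
    hand_body_dist -- distance of the speaker's hand from their torso (metres)
    hand_xy        -- hand position in screen coordinates (x, y)
    speech_text    -- transcript accumulated while capturing (hypothetical STT feed)
    """
    extended = hand_body_dist >= THRESHOLD_M
    if caption is None:
        if extended:
            # Hand crossed the threshold: start a new caption at the hand.
            caption = Caption(text=speech_text, x=hand_xy[0], y=hand_xy[1])
    elif caption.editing:
        if extended:
            # Hand still out: the caption tracks the hand and keeps updating.
            caption.text = speech_text
            caption.x, caption.y = hand_xy
        else:
            # Hand withdrawn below the threshold: freeze the caption in place.
            caption.editing = False
    return caption
```

Calling `update` once per frame gives exactly the behaviour described: a caption appears when the hand extends, follows the hand while it stays out, and stays put once the hand is withdrawn.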

    The advantages over existing techniques are:

    - captions can be added dynamically while speaking and will follow the movement of the speaker, rather than the speaker having to pre-plan their movements.

    - as captions are added dynamically, the speaker can view them on a screen as they are being recorded, so they can see how and where the captions will appear in the final cut.

    - zero or reduced post-processing is required, as captions have already been added and moved as required. To aid post-processing, if required, it may be possible to store the captions as separate objects overlaid on the video stream so they can still be edited later.
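The last point, storing captions as separate objects rather than burning them into the frames, could look something like the following sketch. The field names and JSON layout are illustrative assumptions, not part of the disclosure:

```python
# Minimal sketch of captions kept as separate, editable overlay objects:
# each caption records its text plus the positions it was dragged through,
# and the set of captions is serialised alongside the video for later editing.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class CaptionTrack:
    text: str
    # (frame_number, x, y) keyframes recorded while the hand dragged the caption
    keyframes: list = field(default_factory=list)

    def add_keyframe(self, frame, x, y):
        self.keyframes.append((frame, x, y))

def save_tracks(tracks, path):
    """Serialise caption tracks to a sidecar file next to the video."""
    with open(path, "w") as f:
        json.dump([asdict(t) for t in tracks], f, indent=2)
```

Because the captions live in a sidecar file rather than in the rendered frames, a post-production tool can still retime, reposition, or reword them without re-shooting.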

    The novelty of this disclosure is using relative hand/body position to trigger speech-to-text conversion that adds text...