Improve automatically generated subtitles using character recognition

IP.com Disclosure Number: IPCOM000243520D
Publication Date: 2015-Sep-29
Document File: 1 page(s) / 32K

Publishing Venue

The IP.com Prior Art Database

Abstract

This invention considers the picture as well as the audio in forming the subtitles. In particular, it uses optical character recognition (OCR) to parse out words that appear in the video picture and treats them as more likely to be in the audio, and so more likely to be matched in any speech for which subtitles are being generated.

Video-sharing sites such as YouTube can use speech recognition to generate subtitles automatically for uploaded videos, improving their accessibility.

    The results can be inconsistent, and when the content contains specialised language, the speech recognition software can find it difficult to transcribe the speech accurately.

    This invention considers the picture as well as the audio in forming the subtitles. In particular, it uses optical character recognition (OCR) to parse out words that appear in the video picture and treats them as more likely to be in the audio, and so more likely to be matched in any speech for which subtitles are being generated.
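
As a minimal sketch of the idea, assuming the speech recogniser can return several candidate transcriptions (an n-best list) with confidence scores, the words read from the picture by OCR can be used to re-rank those candidates. The `Hypothesis` class, the `tokenize` and `rescore` functions and the flat per-word `bonus` are illustrative assumptions, not part of the disclosure or of any particular recogniser's API.

```python
import re
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate transcription returned by a speech recogniser (hypothetical structure)."""
    text: str
    score: float  # recogniser confidence; higher is better


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens, keeping hyphenated terms such as 'k-means' whole."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def rescore(hypotheses: list[Hypothesis], ocr_text: str, bonus: float = 1.0) -> Hypothesis:
    """Return the candidate whose score, plus `bonus` for every token that also
    appears in the on-screen (OCR) text, is highest. `bonus` is an assumed tuning knob."""
    on_screen = set(tokenize(ocr_text))

    def boosted(h: Hypothesis) -> float:
        # Count tokens of this candidate that were also seen on screen.
        matches = sum(1 for tok in tokenize(h.text) if tok in on_screen)
        return h.score + bonus * matches

    return max(hypotheses, key=boosted)


if __name__ == "__main__":
    candidates = [
        Hypothesis("we use cooper netties to deploy", -3.1),
        Hypothesis("we use kubernetes to deploy", -3.4),
    ]
    # Text read from a slide visible while this audio was spoken.
    slide = "Deploying with Kubernetes"
    print(rescore(candidates, slide).text)  # prints "we use kubernetes to deploy"
```

Without the slide text the recogniser's first choice would win; with it, the candidate containing the on-screen term "kubernetes" overtakes it. A flat bonus is the simplest weighting; common words such as "the" could be excluded from the OCR vocabulary so that only distinctive terms are boosted.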

This would be particularly useful for technical videos or presentations, where there may be explanatory visual material or 'slides' as well as speech. The OCR can parse this visual content for particular known words, and since those words appear in the video at that point, they can be considered likely to appear in the audio at that point too.
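
The time dependence described here could be sketched as follows, assuming the video is sampled into timestamped frames, OCR text is available for each frame, and a list of known technical terms exists. `OcrFrame`, `AudioSegment`, `words_on_screen` and the 5-second `window` are hypothetical choices for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass
class OcrFrame:
    time: float  # seconds into the video at which the frame was sampled
    text: str    # raw OCR output for that frame


@dataclass
class AudioSegment:
    start: float  # seconds
    end: float    # seconds


def words_on_screen(frames: list[OcrFrame], segment: AudioSegment,
                    known_terms: set[str], window: float = 5.0) -> set[str]:
    """Collect the known terms that OCR found in frames shown during the audio
    segment, widened by `window` seconds either side (an assumed tolerance for a
    slide appearing slightly before or after the speaker mentions a term)."""
    lo, hi = segment.start - window, segment.end + window
    found: set[str] = set()
    for frame in frames:
        if lo <= frame.time <= hi:
            for tok in re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", frame.text.lower()):
                if tok in known_terms:  # known_terms assumed to be lower-case
                    found.add(tok)
    return found


if __name__ == "__main__":
    frames = [
        OcrFrame(12.0, "Introduction to Kubernetes"),
        OcrFrame(40.0, "Helm charts and Pods"),
        OcrFrame(95.0, "Questions?"),
    ]
    known = {"kubernetes", "helm", "pods"}
    print(sorted(words_on_screen(frames, AudioSegment(38.0, 50.0), known)))
    # prints ['helm', 'pods']
```

The returned set is the per-segment vocabulary that the speech recogniser would treat as more likely for that stretch of audio; restricting it to known terms keeps ordinary slide words such as "and" from being boosted.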

    This means that if these technical terms are being used, and the speech recognition software matches th...