Automatic Text Extraction from Images and Video for Content-Based Annotation, Search, and Retrieval
Original Publication Date: 2002-Nov-13
Included in the Prior Art Database: 2002-Nov-13
Text embedded in or superimposed on images and video frames is very useful for describing the semantic content of those frames: it enables keyword and free-text search, automatic video logging, and video cataloging. Extracting text directly from video data is especially important when closed captioning or speech recognition is not available to generate textual transcripts of the audio, or when video footage that lacks audio entirely must be annotated and searched automatically based on frame content. As part of building a video query system, we have developed a scheme for automatically extracting text from digital images and video for content annotation and retrieval. In this paper, we present a robust text extraction approach that handles complex backgrounds in video frames as well as varying font sizes, font styles, and font appearances such as normal and inverse video. Our algorithm produces segmented characters from video frames that an OCR (optical character recognition) system can process directly to produce ASCII text. Experiments on over 5,000 video frames show that the system performs well in both text identification accuracy and computational efficiency.
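The abstract does not describe the segmentation algorithm itself. For orientation only, the sketch below shows one generic, hypothetical way to treat normal and inverse video text uniformly: binarize a grayscale frame at its mean intensity in both polarities, group foreground pixels into 4-connected components, and keep components whose pixel count falls inside an assumed character-size range. The function names, the global mean threshold, and the `min_area`/`max_area` bounds are illustrative assumptions, not the authors' method. Frames with genuinely complex backgrounds would need much more than a single global threshold.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical box format: (top, left, bottom, right), inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def binarize(frame: List[List[int]], invert: bool) -> List[List[bool]]:
    """Mark pixels brighter than the frame mean, or darker if invert=True (assumed global threshold)."""
    pixels = [p for row in frame for p in row]
    threshold = sum(pixels) / len(pixels)
    return [[(p < threshold) if invert else (p > threshold) for p in row] for row in frame]


def connected_components(mask: List[List[bool]]) -> List[List[Tuple[int, int]]]:
    """Group foreground pixels into 4-connected components using breadth-first search."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                comp = []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(comp)
    return components


def character_candidates(frame: List[List[int]], min_area: int = 2, max_area: int = 50) -> List[Box]:
    """Return bounding boxes of character-sized blobs found in either polarity."""
    boxes = []
    for invert in (False, True):  # False: light text on dark; True: inverse video
        for comp in connected_components(binarize(frame, invert)):
            # Large blobs (typically the background) and tiny specks are discarded.
            if min_area <= len(comp) <= max_area:
                ys = [y for y, _ in comp]
                xs = [x for _, x in comp]
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return sorted(boxes)
```

Because both polarities are searched, a frame with bright strokes on a dark background and its photographic negative yield the same candidate boxes; the background component in the "wrong" polarity is rejected by the size filter. The resulting boxes are the kind of per-character crops that could then be handed to an OCR engine.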