Method for recognizing and classifying video text and video characters using kernel-space methods

IP.com Disclosure Number: IPCOM000010450D
Original Publication Date: 2002-Dec-03
Included in the Prior Art Database: 2002-Dec-03
Document File: 4 page(s) / 52K

Publishing Venue

IBM

Abstract

Disclosed is a system and method for recognizing text embedded in video frames. Videotext refers to text superimposed on still images and video frames, and a videotext-based Multimedia Description Scheme has recently been adopted into the MPEG-7 standard as one of the normative media content description interfaces. While much of the previous work, including ours, concentrates on the task of automatically locating and extracting text from video frames, very little research has focused on reliably recognizing the segmented text. The low resolution of videotext, unconstrained font styles and sizes, and the poor separation of characters that often results from video compression and decoding all pose severe problems, even for commercial OCRs, in recognizing videotext accurately. Disclosed is an end-to-end video character recognition system featuring new character attributes emphasizing macro shapes, a Support Vector Machine-based character classifier, videotext object synthesis, font context analysis, and temporal contiguity analysis, to successfully address the issues confounding accurate videotext recognition. We present results from experiments with real video data that demonstrate the strengths of this system.


Disclosed is a system and method for recognizing text embedded in video frames. It comprises the steps of locating text in a video frame, extracting it from the other contents of the frame, processing it for feature extraction where necessary, recognizing which characters it contains, and correctly classifying and labeling those characters to generate ASCII strings. The invention constructs kernel-space methods, in particular support vector machines, which are trained on measurements obtained from text automatically located in video frames. The trained support vector machines are then tested on unknown input measurements to determine which character was originally present. This system generalizes well on untrained test data: when the machine, or a combination of such machines, is presented with measurements of an unknown character, it recognizes the character and determines its label with far greater precision than other techniques such as neural networks, and it copes with the low resolution of the data. To our knowledge, we are the first to apply kernel-space methods, in a machine learning approach, to the problem of video character recognition.
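As an illustration of the classification step, the following Python sketch trains a multi-class support vector machine on per-character feature vectors using scikit-learn as a stand-in kernel-space learner. The feature layout, kernel choice, and parameter values here are assumptions for exposition, not the disclosed system's actual character attributes.

    # A minimal sketch of SVM-based character classification; all feature
    # shapes and hyperparameters are illustrative assumptions.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    def train_character_classifier(features, labels):
        """Train one multi-class SVM on per-character feature vectors.

        features : (n_samples, n_features) array of measurements extracted
                   from characters located in video frames.
        labels   : (n_samples,) array of character labels, e.g. 'A'..'Z'.
        """
        clf = make_pipeline(
            StandardScaler(),  # normalize the low-resolution measurements
            SVC(kernel="rbf", C=10.0, gamma="scale"),  # kernel-space classifier
        )
        clf.fit(features, labels)
        return clf

    def recognize(clf, unknown_features):
        """Label measurements of unknown characters and join them into a string."""
        return "".join(clf.predict(np.atleast_2d(unknown_features)))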

Videotext Recognition

The videotext extraction and recognition task comprises obtaining images/frames (if the input is compressed video, by decoding the MPEG-encoded stream), segmenting each image to extract the regions containing only characters (sometimes referred to as text location), and finally recognizing the characters with a character recognition (CR or OCR) system to output the text strings present in the images.
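These three stages could be organized roughly as in the sketch below. It is a hypothetical outline, not the disclosed implementation: OpenCV is used only to decode frames, and locate_text and recognize_characters are placeholder callables standing in for the location and recognition components.

    # A high-level outline of the extraction-and-recognition pipeline.
    import cv2

    def videotext_pipeline(video_path, locate_text, recognize_characters):
        """Decode frames, segment text regions, and emit recognized strings."""
        capture = cv2.VideoCapture(video_path)  # stage 1: decode the video stream
        results = []
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            # stage 2: text location -- regions containing only characters
            for region in locate_text(frame):
                # stage 3: character recognition -- output the text string
                results.append(recognize_characters(region))
        capture.release()
        return results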

The text extraction process is motivated by the fact that traditional CR systems require text to appear against a clean background for accurate recognition; the goal is therefore to remove the background in video frames, preserving only the text characters, which are finally output in binarized form. We have developed a system that extracts and analyzes regions in a video frame to yield OCR-ready character images [Shim, Dorai, Bolle, 1998]. Other videotext extraction systems are based on methods that perform edge analysis or texture processing. These differ from one another typically in their sensitivity to font sizes and styles, the restrictions they place on the appearance characteristics of the text, the types of text they can extract (e.g., captions only), and their ability to handle both normal and inverse video modes of text.
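As a rough illustration of such background removal, the sketch below binarizes a segmented text region with Otsu thresholding and a simple polarity flip for inverse video. Both the thresholding method and the majority-pixel heuristic are assumptions for exposition, not the method of the cited system.

    # A minimal sketch of producing an OCR-ready binarized character image
    # from a segmented text region; the approach shown is an assumption.
    import cv2
    import numpy as np

    def binarize_text_region(region_bgr):
        """Remove the background, leaving binarized text for recognition."""
        gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
        # Otsu's threshold separates characters from the local background.
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        # Handle inverse video: if most pixels are foreground, flip the
        # polarity so characters are always the minority (white) class.
        if np.count_nonzero(binary) > binary.size // 2:
            binary = cv2.bitwise_not(binary)
        return binary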

Most text extractors do not attempt automatic recognition of the segmented text, and those that do typically use traditional OCRs whose performance is tuned to printed or scanned document characters at very high resolution. The results obtained therefore typically show poorer recognition accuracy on videotext. Unlike sc...