A Multimedia-Based Document Summarization Method and System

IP.com Disclosure Number: IPCOM000198965D
Publication Date: 2010-Aug-19
Document File: 4 page(s) / 100K

Publishing Venue

The IP.com Prior Art Database


With the continuing growth of the Web and electronic information services, people are overwhelmed by the amount of text available to them, and they are often required to make quick decisions based on their understanding of large document collections. Document summarization techniques, which have been investigated for years [1], were proposed to tackle this problem. From the perspective of data sources, they can be categorized into single-document and multi-document approaches [2]. Single-document summarization generates a summary from one source document, whereas multi-document summarization generates a summary from a cluster of documents, which is more complex and difficult. From the perspective of method, they can be categorized into extractive and abstractive summarization [3]; the difference is whether the sentences of the document are modified by fusion or reformulation. Multi-document summarization generates a generic or topic-focused summary that reduces the documents while retaining their characteristics [4, 5]. Representative approaches include snippet-based [6], question/answer-based, and sentence-based [5] summarization. Recently, topic-sensitive summarization has been used to summarize document collections [4, 5, 7, 8, 9].

All of the above methods aim to find better text to represent the documents. In practice, however, it is still very time-consuming to understand the contents of a large document collection from text summaries alone. Meanwhile, many documents contain text, images, and/or videos that describe facts, explain methods, or tell stories, and many more images and videos are available on the Web. Although an average user can easily understand these images and videos, little effort has been made to leverage such multimedia data to summarize a document or a document collection. Here, we introduce a multimedia-based summarization method that combines images/videos with a text summary.

This is the abbreviated version, containing approximately 41% of the total text.

This section describes in detail the techniques used for multimedia-based summarization.

Method Overview

Step 1: To generate a multimedia document summary, we first collect a database for aligning text with images/videos. The data can come from tagged image/video sites such as Flickr and YouTube, or from news stories with multimedia resources, where images and videos are accompanied by surrounding text. An indexer is then built to provide easy access to these resources, and features such as global color and texture are extracted from the images and videos.
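As a rough illustration of the indexer in Step 1, the following minimal sketch builds an inverted index from the tags or surrounding text of each image/video. The record layout (a "text" field and a "features" list), the identifiers, and the function names are hypothetical and are not part of the disclosure.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(resources):
    """Map each term to the ids of the resources whose text contains it."""
    index = defaultdict(set)
    for resource_id, record in resources.items():
        for term in tokenize(record["text"]):
            index[term].add(resource_id)
    return index


# Hypothetical records: tags or surrounding text plus a placeholder
# feature vector (e.g. global color and texture) for each image/video.
resources = {
    "img1": {"text": "sunset beach ocean", "features": [0.8, 0.4, 0.1]},
    "vid1": {"text": "city traffic at sunset", "features": [0.3, 0.3, 0.5]},
}
index = build_index(resources)
sunset_ids = sorted(index["sunset"])
```

A lookup such as index["sunset"] then returns the ids of all images/videos whose text mentions the term, which is the kind of access the later retrieval steps rely on.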

Step 2: We use the following steps to generate a multimedia-based summary for the document(s). The input to this step is a document or a set of documents.

2.1 For a single document, extract the document title as a brief summary. For multiple documents, extract keywords from the union of the document titles.

2.2 Extract keywords from the documents. This can be done by taking the words with the highest TF-IDF values or by using topic-model-based methods [10].
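The TF-IDF option in 2.2 can be sketched as follows. This is only one plausible reading: the smoothed IDF formula, the summing of scores across documents, and the small stop-word list are assumptions made for the example, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed stop-word list, used only to keep the example readable.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "at"}


def tokenize(text):
    """Lowercase, split into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(docs, k=3):
    """Return the k terms with the highest TF-IDF score summed over docs."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            # Smoothed IDF (an assumption) so a single document still works.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(k)]


docs = [
    "solar panels convert sunlight",
    "solar panels power the grid",
    "wind turbines power the grid",
]
keywords = top_keywords(docs, k=4)
```

Terms that recur across the collection accumulate score from each document in which they appear, so the result favors words that characterize the collection as a whole.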

2.3 Extract key sentences using sentence-based document summarization [4, 5, 7, 8, 9].

2.4 If the document(s) do not contain multimedia resources, we use a document similarity measure, e.g. cosine distance, to compare the given document(s) with the tags or surrounding text of each image/video, and then retrieve the most relevant images/videos for the summary.
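A minimal sketch of the retrieval in 2.4 is shown below, assuming plain bag-of-words term counts. The source names cosine distance; the sketch computes cosine similarity, which is one minus that distance, and ranks by it. The media identifiers and function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count the alphanumeric terms in the lowercased text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(document, media_text, top_n=2):
    """Rank images/videos by similarity of their text to the document."""
    doc_vec = bag_of_words(document)
    scored = [
        (cosine_similarity(doc_vec, bag_of_words(text)), media_id)
        for media_id, text in media_text.items()
    ]
    scored.sort(reverse=True)
    # Items with no term in common with the document are dropped.
    return [media_id for score, media_id in scored[:top_n] if score > 0]


# Hypothetical tags or surrounding text for three media items.
media_text = {
    "img_beach": "beach sunset ocean waves",
    "img_city": "city skyline night",
    "vid_surf": "surfing ocean waves",
}
matches = retrieve("Ocean waves at the beach during sunset", media_text)
```

In practice the term counts could be replaced by TF-IDF weights so that common words contribute less to the score.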

2.5 If the document(s) contain multimedia resources, we first use them as a natural representation of the documents. For videos, key frames are detected and used as representative images. If the given document(s) do not yield enough multimedia resources, we can use the method in 2.4 to find more, or retrieve more by image/video content similarity.
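The disclosure does not say how key frames are detected. One common approach, shown here purely as an assumed stand-in, keeps a frame whenever its color histogram differs from the last key frame by more than a threshold. The histogram representation, the L1 distance, and the threshold value are all assumptions.

```python
def histogram_distance(h1, h2):
    """L1 distance between two equal-length normalized histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_key_frames(histograms, threshold=0.5):
    """Return indices of frames that differ strongly from the last key frame."""
    if not histograms:
        return []
    key_frames = [0]  # the first frame is always kept
    for i in range(1, len(histograms)):
        last_key = histograms[key_frames[-1]]
        if histogram_distance(last_key, histograms[i]) > threshold:
            key_frames.append(i)
    return key_frames


# Hypothetical three-bin color histograms for five consecutive frames.
frames = [
    [0.90, 0.10, 0.00],
    [0.85, 0.15, 0.00],
    [0.10, 0.80, 0.10],
    [0.10, 0.75, 0.15],
    [0.00, 0.10, 0.90],
]
key_frame_indices = detect_key_frames(frames)
```

With these sample histograms, frames 0, 2, and 4 are kept, because each one starts a visually different segment.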

2.6 Organize the summary resources, e.g. keywords, sentences, images, and video snapshots. A keyword extracted from the text can be placed near an image/video if it also appears in the tag set or surrounding text of that image/video. Images/videos can be arranged by content information, e.g. color and texture, so that similar ones are close together. Moreover, more complex methods such as [11] can be used to annotate images and words.
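The two organizing rules in 2.6 can be sketched as follows, under stated assumptions: keywords are matched to tags case-insensitively, and images/videos are ordered by a greedy nearest-neighbor pass over their feature vectors. The greedy ordering is an illustrative choice rather than the method of the disclosure, and all names are hypothetical.

```python
import math


def assign_keywords(keywords, media_tags):
    """Attach to each image/video the keywords that appear in its tags."""
    layout = {}
    for media_id, tags in media_tags.items():
        tag_set = {t.lower() for t in tags}
        layout[media_id] = [k for k in keywords if k.lower() in tag_set]
    return layout


def order_by_similarity(features):
    """Greedy ordering: each item is followed by its nearest unplaced item."""
    remaining = list(features)
    if not remaining:
        return []
    ordered = [remaining.pop(0)]
    while remaining:
        last = features[ordered[-1]]
        nearest = min(remaining, key=lambda m: math.dist(last, features[m]))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered


# Hypothetical keywords, tag sets, and color/texture feature vectors.
layout = assign_keywords(
    ["ocean", "sunset", "traffic"],
    {"img1": ["Sunset", "beach", "ocean"], "vid1": ["city", "traffic"]},
)
order = order_by_similarity({"a": [0.0, 0.0], "b": [10.0, 10.0], "c": [1.0, 1.0]})
```

The ordering places "c" next to "a" because their feature vectors are closest, which is the effect 2.6 describes for similar images/videos.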

Step 3: The multimedia-based summary is visualized so that the user can interactively understand and consume the result.

3.1 Use information visualization and computer graphics techniques to lay out the keywords, sentences, and multimedia properly. This step makes the summary easy for average users to understand.

3.2 Allow the user to interact freely with the generated multimedia summary. In this step, the user can interact with the system through different behaviors such as clicking and flagging.

3.2.1 In a search engine, if a user clicks a relevant summarized document, we enhance the relationship between the words and images/videos used in the summary.
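One way to picture the click feedback in 3.2.1 is a table of word-to-media association weights that grows each time a summary is clicked. The additive update, the boost value, and the class name below are assumptions; the abbreviated text does not specify the update rule.

```python
from collections import defaultdict


class AssociationStore:
    """Hypothetical store of word-to-image/video association weights."""

    def __init__(self, boost=0.5):
        self.boost = boost  # assumed fixed increment per click
        self.weights = defaultdict(float)

    def record_click(self, words, media_ids):
        """Strengthen every word/media pair used in the clicked summary."""
        for word in words:
            for media_id in media_ids:
                self.weights[(word, media_id)] += self.boost


store = AssociationStore(boost=0.5)
store.record_click(["ocean", "sunset"], ["img1"])
store.record_click(["ocean"], ["img1", "vid2"])
```

After these two clicks, the pair ("ocean", "img1") carries twice the weight of ("sunset", "img1"), so it would rank higher if the weights were used when selecting media for later summaries.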