
A Method and System of Topic Words Detection Based on Image Evidence

IP.com Disclosure Number: IPCOM000191333D
Original Publication Date: 2009-Dec-30
Included in the Prior Art Database: 2009-Dec-30
Document File: 4 page(s) / 151K

Publishing Venue

IBM

Abstract

Clustering is a common method for mining deeper subtopics in a large set of documents, and many algorithms have been developed to achieve good clustering results. In addition, one needs a small set of representative keywords that describe each cluster and help the reader grasp its meaning quickly.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 62% of the total text.



The following diagram shows the whole process of selecting top words by simply analyzing images:

Figure 1. The whole process of our invention

The following are the key components of the system and their functions:

Image evidence collection:


Figure 2. Image collection (part) on "Explosion"

Step 1. Search the image search engine, presenting the keywords as query words.


Step 2. Extract salient features from each of the images. The objective of this step is to form the feature space for the calculations that follow.
- A range of image processing and feature detection algorithms can be applied, for example:
  - local feature descriptors (such as a texture feature detector, SIFT descriptor, DoG features, wavelet filtering, etc.)
  - color feature descriptors (such as hue-saturation, color histogram, etc.)
- Form a comprehensive image descriptor for each of the images.
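As an illustration of Step 2, the sketch below builds one of the color feature descriptors mentioned above, a color histogram, and concatenates it with any other feature vectors into a single descriptor. It is a minimal sketch under stated assumptions, not the disclosure's implementation: the names `color_histogram` and `image_descriptor`, the representation of an image as a flat sequence of RGB tuples, and the choice of 4 bins per channel are all hypothetical. Local features such as SIFT would normally come from an image processing library and are represented here only as an optional precomputed vector.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized joint RGB histogram with bins_per_channel**3 bins."""
    if not pixels:
        raise ValueError("image has no pixels")
    step = 256 // bins_per_channel  # width of one bin along a channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value to its bin, clamping the top edge.
        ri = min(r // step, bins_per_channel - 1)
        gi = min(g // step, bins_per_channel - 1)
        bi = min(b // step, bins_per_channel - 1)
        # Flatten the 3-D bin index into a position in the 1-D histogram.
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def image_descriptor(pixels: Sequence[Pixel],
                     local_features: Sequence[float] = ()) -> List[float]:
    """Concatenate the color histogram with optional precomputed local features.

    Plain concatenation is an assumption; the disclosure does not say how
    the comprehensive descriptor is formed.
    """
    return color_histogram(pixels) + list(local_features)
```

For example, `image_descriptor(pixels)` on an image with no extra local features yields a 64-dimensional vector whose entries sum to 1.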

Step 3. Construct the pair-wise distance matrix $[\mathrm{dist}(I_i, I_j)]_{i,j=1,\dots,n}$ of the images, where $\mathrm{dist}$ is a distance measure (e.g. KL divergence, cosine distance), $D$ is the image cluster, and $n$ is the total number of images in $D$.
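The pair-wise distance matrix of Step 3 can be sketched as follows, using cosine distance, one of the two measures the text names. This is a minimal sketch: the function names `cosine_distance` and `pairwise_distances` are hypothetical, and each image is assumed to be already represented by a numeric descriptor vector from Step 2. Every ordered pair is evaluated separately, so an asymmetric measure such as KL divergence could be passed in as `dist` without changing the loop.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cos(u, v); raises if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(
    descriptors: Sequence[Vector],
    dist: Callable[[Vector, Vector], float] = cosine_distance,
) -> List[List[float]]:
    """Return the n x n matrix whose (i, j) entry is dist(I_i, I_j)."""
    n = len(descriptors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal stays 0: an image is at distance 0 from itself
                matrix[i][j] = dist(descriptors[i], descriptors[j])
    return matrix
```

The result is a list of `n` rows of `n` distances, one row per image in the cluster.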

Step 4. Calculate the distance of each image $I_i$ to the cluster $D$.

     dist D

(

I...