
Folksonomy-based Keyword Extraction for tag-cloud generation
Disclosure Number: IPCOM000200579D
Publication Date: 2010-Oct-19

Publishing Venue

The Prior Art Database



This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 7% of the total text.


A tag-cloud is a visual depiction, typically used to provide a visual summary or a semantic view of an item or a cluster of items that have something in common (e.g., the search results for a specific query). Recently, tag-clouds have been popularized by leading social media sites such as Delicious, Flickr, Technorati, and many others, and have become a standard visualization tool for content representation on social media sites.

Obviously, meaningful, high-quality tag-clouds can be generated in


successfully represented by the tag-cloud that is based on its own tags, or on tags associated with similar items. In contrast, existing tag-cloud generation techniques have difficulty generating representative tag-clouds for items in sparsely tagged domains.

    When manual (user-provided) tags are not available, feature selection techniques can be used to extract meaningful keywords from the item's content, or from other textual resources related to the item, such as anchor-text or the item's meta-data. These extracted keywords can be used as substitutes for manual tags. However, extracted keywords are usually inferior to manual tags, since keywords that are significant from a statistical perspective do not necessarily serve as good labels for the content from which they were extracted. In the following, we refer to tag-clouds based on extracted keywords as word-clouds.

In this work we suggest a novel approach that enhances keyword selection methods for word-cloud generation. Our approach, termed tag-boost, promotes keywords in the item's description that the public frequently uses to tag items. Keywords are selected from the item description according to statistical selection criteria, and additionally according to their relative frequency in the tag-based folksonomy. Thus, keywords that people frequently use to tag content are boosted compared to keywords that are not frequently used as tags.

In a nutshell, our approach for word-cloud generation works as follows.

First, keywords are extracted from the item's description using any keyword extraction technique (e.g. tf-idf, chi-square, MI, KL; see attached paper below for reference). Then, the weight of each extracted keyword is boosted by a tag-boost score that reflects the probability that the keyword is used as a tag. This probability is estimated from the folksonomy of tags (the taxonomy of all tags assigned to items in the collection). Then, keywords with higher boosted scores are selected for the
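
The steps above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: this abbreviated text does not give the exact boosting formula, so the sketch assumes a plain tf-idf base weight, a tag probability estimated as the add-alpha smoothed relative frequency of the keyword in the folksonomy, and a multiplicative boost. All function and parameter names (tf_idf_scores, tag_probability, tag_boost_keywords, alpha, top_k) are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(doc_tokens, doc_freq, n_docs):
    """Base keyword weights: plain tf-idf over one item's description."""
    tf = Counter(doc_tokens)
    return {
        term: count * math.log((1 + n_docs) / (1 + doc_freq.get(term, 0)))
        for term, count in tf.items()
    }


def tag_probability(term, tag_counts, total_tag_uses, vocab_size, alpha=1.0):
    """Assumed estimate: add-alpha smoothed relative frequency of the term as a tag."""
    return (tag_counts.get(term, 0) + alpha) / (total_tag_uses + alpha * vocab_size)


def tag_boost_keywords(doc_tokens, doc_freq, n_docs, tag_counts, top_k=5):
    """Rank keywords by base weight multiplied by the assumed tag probability."""
    base = tf_idf_scores(doc_tokens, doc_freq, n_docs)
    total = sum(tag_counts.values())
    vocab = len(set(tag_counts) | set(base))
    boosted = {
        term: weight * tag_probability(term, tag_counts, total, vocab)
        for term, weight in base.items()
    }
    # Highest boosted score first; ties broken alphabetically for determinism.
    return sorted(boosted.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy data (invented for illustration only).
doc_tokens = ["sunset", "beach", "aforementioned", "aforementioned"]
doc_freq = {"sunset": 10, "beach": 10, "aforementioned": 1}
tag_counts = {"sunset": 50, "beach": 30, "travel": 20}
keywords = tag_boost_keywords(doc_tokens, doc_freq, n_docs=100,
                              tag_counts=tag_counts, top_k=2)
```

In this toy example, tf-idf alone ranks "aforementioned" highest because it is rare in the collection, whereas the boosted ranking prefers "sunset" and "beach", which are used as tags. Other boosting schemes (for example, a weighted sum of the two scores) would fit the description equally well.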



    For a cluster of items, our method extracts a list of keywords for each item in the cluster using the tag-boost approach. Then, these extracted keywords are integrated into a word-cloud by any existing aggregation method.
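
As a rough illustration of the aggregation step, the Python sketch below merges per-item keyword lists into one word-cloud. The text allows any existing aggregation method; summing scores across items and mapping the totals linearly to font sizes are assumptions chosen for simplicity, and the names aggregate_word_cloud, max_words, min_size and max_size are hypothetical.

```python
from collections import defaultdict


def aggregate_word_cloud(per_item_keywords, max_words=10, min_size=10, max_size=40):
    """Sum keyword scores across items and map the totals to font sizes."""
    totals = defaultdict(float)
    for keywords in per_item_keywords:
        for term, score in keywords:
            totals[term] += score

    # Keep the highest-scoring terms; ties broken alphabetically.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:max_words]
    if not ranked:
        return []

    highest, lowest = ranked[0][1], ranked[-1][1]
    span = highest - lowest
    cloud = []
    for term, score in ranked:
        if span == 0:
            size = max_size  # all terms have equal weight
        else:
            size = round(min_size + (score - lowest) / span * (max_size - min_size))
        cloud.append((term, size))
    return cloud


# Toy per-item keyword lists (invented scores, one list per item in the cluster).
cluster = [
    [("sunset", 1.0), ("beach", 0.5)],
    [("sunset", 1.0), ("travel", 0.5)],
    [("beach", 0.5)],
]
cloud = aggregate_word_cloud(cluster)
```

Each entry in the result pairs a keyword with a font size, so keywords that score highly across many items in the cluster are rendered larger.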

    The folksonomy used for tag-boosting can be imported from any external domain and is not limited to the domain-based folksonomy that we deal with,


might be poor or noisy. Thus, this method can be applied to any content,...