Machine Assisted Folksonomic Cross Language Dictionary

IP.com Disclosure Number: IPCOM000202116D
Publication Date: 2010-Dec-04
Document File: 2 page(s) / 20K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method of using information technology to assist in the creation of a dictionary for translating folksonomic categories from one language to another. The dictionary finds use in tag translation, language translation of all types, and software and scholarship related to linguistic and cultural affairs.

Tagging is a good and increasingly popular method of indexing content of all types. The power of tagging comes from users tagging content the way they think it should be indexed. "Voting" for the best indices is shown visually: the size of each tag indicates how many times that tag has been chosen. Tagging this way is often called a "folksonomy," because it captures the way people (i.e., the folk) think about the content.
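Here is a minimal sketch of the tag-size "voting" just described. It counts how often each tag was chosen and maps those counts linearly onto a font-size range. The function name `tag_cloud_sizes` and the 10 to 32 point range are illustrative assumptions. Many real tag clouds use logarithmic rather than linear scaling.

```python
from collections import Counter


def tag_cloud_sizes(tag_entries, min_size=10, max_size=32):
    """Map each distinct tag to a display size proportional to how often it was chosen.

    Linear scaling between min_size and max_size is an illustrative assumption.
    """
    counts = Counter(tag_entries)  # the "votes" for each tag
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for tag, n in counts.items():
        if hi == lo:
            # Every tag has the same number of votes, so all get the largest size.
            sizes[tag] = max_size
        else:
            sizes[tag] = round(min_size + (n - lo) * (max_size - min_size) / (hi - lo))
    return sizes
```

For example, three "Rock & Roll" entries and one "Oldies" entry give "Rock & Roll" the largest size and "Oldies" the smallest.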

Folksonomies do not readily translate across languages. For example, a US English-speaking audience might tag a document discussing a song, its artist, and its genre as "Rock & Roll". A German-speaking audience might tag the same document as "American Music" ("Amerikanische Musik"), and certainly not with a machine translation of "Rock & Roll" into German.

The disclosed invention is a method of using information technology to assist in creating a dictionary for translating folksonomic categories from one language to another. The dictionary finds use in tag translation, language translation of all types, and software and scholarship related to linguistic and cultural affairs.
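The disclosure does not specify how the resulting dictionary is represented. Purely as a hypothetical illustration, the sketch below stores weighted candidate target-language tags for each source-language tag and returns them best first. The class `FolksonomyDictionary`, its methods, the case-insensitive lookup, and the weighting scheme are all assumptions and are not part of the disclosed method.

```python
from collections import defaultdict


class FolksonomyDictionary:
    """Hypothetical cross-language tag dictionary: (src_lang, tag) -> dst_lang -> {target_tag: weight}."""

    def __init__(self):
        self._entries = defaultdict(lambda: defaultdict(dict))

    def add(self, src_lang, tag, dst_lang, target_tag, weight=1.0):
        # Source tags are case-folded so "Rock & Roll" and "rock & roll" share an entry (an assumption).
        bucket = self._entries[(src_lang, tag.casefold())][dst_lang]
        bucket[target_tag] = bucket.get(target_tag, 0.0) + weight

    def translate(self, src_lang, tag, dst_lang):
        """Return candidate target-language tags, highest weight first, or [] if unknown."""
        # .get() avoids creating empty entries for unknown tags.
        bucket = self._entries.get((src_lang, tag.casefold()), {}).get(dst_lang, {})
        return sorted(bucket, key=lambda t: (-bucket[t], t))
```

For example, after `add("en", "Rock & Roll", "de", "Amerikanische Musik", 5)`, calling `translate("en", "rock & roll", "de")` would list "Amerikanische Musik" first. Where the weights come from (for instance, harvested tag counts) is left open here.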

The steps for applying the disclosed method include:
1. Gathering the folksonomic information. In this step, the same or similar content is published (e.g., on the Internet) in multiple languages, on the same website or on separate websites. The content is either stored in multiple languages or translated on the fly. In one embodiment, a content management system (CMS) knows about both copies and identifies them as the "same" content in different languages. The content remains published and is tagged by its users.

For example, there is an English version and a German version of a phrase. The users of the English version ("English Users") tag the English version ("English Tags"), and the users of the German version ("German Users") tag the German version ("German Tags"). Nothing in the system enforces that the English Tags are in English or that the German Tags are in German. The tags are whatever the users choose, which is what makes this a folksonomy. The system harvests the tags, meaning it takes the tag information and stores it in a database. Harvesting can happen after some period of time, once the number of tags reaches some threshold, or according to other criteria. It can also happen in real time, each time a user makes a tag entry. Information stored in the database for the tags could include the number of times each one was chosen or it could be richer information including th...
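Here is a minimal sketch of the gathering and harvesting step, assuming a single SQLite table. A shared `content_id` stands in for the CMS identifying the language versions as the "same" content. Tag entries are buffered until a threshold is reached, and a threshold of 1 corresponds to real-time harvesting. Only per-tag counts are kept. `TagHarvester`, the table layout, and the default threshold are hypothetical.

```python
import sqlite3


class TagHarvester:
    """Hypothetical harvester: buffers raw tag entries and stores per-tag counts in SQLite."""

    def __init__(self, conn: sqlite3.Connection, threshold: int = 100):
        self.conn = conn
        self.threshold = threshold  # 1 means harvest on every entry (real time)
        self.pending = []  # (content_id, lang, tag) tuples not yet harvested
        # content_id is shared by all language versions of the "same" content;
        # lang records which language version the user was tagging.
        conn.execute(
            """CREATE TABLE IF NOT EXISTS tag_counts (
                   content_id TEXT NOT NULL,
                   lang       TEXT NOT NULL,
                   tag        TEXT NOT NULL,
                   n          INTEGER NOT NULL,
                   PRIMARY KEY (content_id, lang, tag))"""
        )

    def record(self, content_id: str, lang: str, tag: str) -> None:
        # No check that `tag` is written in `lang`: users may tag however they like.
        self.pending.append((content_id, lang, tag))
        if len(self.pending) >= self.threshold:
            self.harvest()

    def harvest(self) -> None:
        """Move buffered entries into the database, incrementing each tag's count."""
        for row in self.pending:
            self.conn.execute(
                "INSERT OR IGNORE INTO tag_counts (content_id, lang, tag, n) "
                "VALUES (?, ?, ?, 0)", row)
            self.conn.execute(
                "UPDATE tag_counts SET n = n + 1 "
                "WHERE content_id = ? AND lang = ? AND tag = ?", row)
        self.conn.commit()
        self.pending.clear()

    def counts(self, content_id: str, lang: str) -> dict:
        """Harvested tag counts for one language version of one piece of content."""
        rows = self.conn.execute(
            "SELECT tag, n FROM tag_counts WHERE content_id = ? AND lang = ? "
            "ORDER BY n DESC, tag", (content_id, lang))
        return dict(rows.fetchall())
```

With `TagHarvester(sqlite3.connect(":memory:"), threshold=3)`, for example, two English "Rock & Roll" entries stay buffered. A third entry, such as a German "Amerikanische Musik", triggers a harvest of all three. `INSERT OR IGNORE` followed by `UPDATE` is used instead of an upsert so the sketch also works on older SQLite versions.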