Using distributed data on web 2.0 media to make inferences about the real world

IP.com Disclosure Number: IPCOM000193834D
Publication Date: 2010-Mar-10
Document File: 2 page(s) / 46K

Publishing Venue

The IP.com Prior Art Database

Abstract

There exists a great deal of information of interest within the wider world which is currently unavailable: for example, the movements of prominent people such as politicians. By parsing data freely available via social media such as Twitter, it is possible to work out the diaries/movements of such people. Data could be sourced from elsewhere, for example blogs, but Twitter and its kin are particularly pertinent, as the small quantities of data associated with each update are easy to analyse. This can currently be done 'by hand', but requires many laborious searches. We propose a tool to automate the process. The novelty of this idea comes from various facets: analysis of data's provenance based on its authorship; applying NLP to raw data sets; contacting authors for clarification or further information; working with live data.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 55% of the total text.

There exists a great deal of information of interest within the wider world which is currently unavailable: for example, the movements of prominent people such as politicians. By parsing data freely available via social media such as Twitter, it is possible to work out the diaries/movements of such people. Data could be sourced from elsewhere, for example blogs, but Twitter and its kin are particularly pertinent, as the small quantities of data associated with each update are easy to analyse.

    This can currently be done 'by hand', but doing so requires many laborious searches. A tool is proposed to automate the process. The novelty of this idea comes from several facets:
* The provenance of data is analysed based on its authorship
* Natural Language Processing (NLP) is applied to the raw data sets
* Authors can be contacted for clarification or further information
* Live data is used

    As an example, the Foreign Secretary's diary is not published in advance. However, if people publish microblog entries on Twitter about meeting him, it is possible, by tracking those Tweets, to work out where he has been and for what reasons.

    By using the location data of people publishing microblog entries, and the content of those entries, it is possible to infer the prior location of the people being discussed -- and thus plot the paths of those people.
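    The path-plotting idea can be illustrated with a minimal sketch. It assumes each relevant microblog entry has already been reduced to a time and a place; the Sighting record, the infer_path function and the sample dates are all hypothetical, introduced only for illustration, and the use of timestamps is an assumption not stated above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sighting:
    """Hypothetical record: when and where an author reported meeting the person."""
    when: datetime
    place: str


def infer_path(sightings):
    """Order sightings by time and collapse consecutive reports from the same place."""
    path = []
    for s in sorted(sightings, key=lambda s: s.when):
        if not path or path[-1].place != s.place:
            path.append(s)
    return path


# Made-up sample data, deliberately out of order.
sightings = [
    Sighting(datetime(2010, 3, 1, 14, 0), "Manchester"),
    Sighting(datetime(2010, 3, 1, 9, 0), "London"),
    Sighting(datetime(2010, 3, 1, 9, 30), "London"),
]

path = infer_path(sightings)
for s in path:
    print(s.when.isoformat(), s.place)
```

    Collapsing consecutive reports from the same place keeps the plotted path readable when several people post about the same event.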

    Note that location is only one example: another might be to consider relationship data (who is with whom at the moment?) or data on people's activities (is the Foreign Secretary attending a business lunch, or is he working on his laptop?). However, location is an easy example to work with and demonstrate here, as locations tend to have easily-identifiable names.
1) The user types the name of the person they wish to track (e.g. 'John Smith'), and any alternative names or titles for that person (e.g. 'Prime Minister')
2) The system monitors Twitter (and similar systems) for any Tweets that include these phrases. Of course, not every Tweet that mentions 'John Smith' will say 'I met John Smith!': indeed, only a minority will. However, Tweets that do contain these phrases will be parsed for appropriate words ('saw', 'met', 'spoke with', 'spoke to', etc.). A degree of NLP will be applied so that, for example, 'John Smith spoke with' can be discarded, while 'I spoke with John Smith' is flagged as relevant.
3) The system notes the user profile of anyone who has produced a flagged Tweet, including their stated location
4) The system outputs the flagged Tweets alongside the associated user profiles
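
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the Tweet record and its field names are hypothetical (they do not reflect any real Twitter API), the Tweets are supplied as an in-memory list rather than a live feed, and a regular expression that looks for a first-person subject ('I' or 'we') before the verb stands in for the NLP step.

```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    """Hypothetical record; field names are illustrative only."""
    author: str
    author_location: str  # the location stated in the author's profile
    text: str


# Verbs taken from step 2 of the description.
CONTACT_VERBS = ["saw", "met", "spoke with", "spoke to"]


def build_pattern(names):
    """Match 'I'/'we', up to two filler words, a contact verb, then a tracked name."""
    verbs = "|".join(re.escape(v) for v in CONTACT_VERBS)
    targets = "|".join(re.escape(n) for n in names)
    return re.compile(
        rf"\b(?:i|we)\b(?:\s+\w+){{0,2}}?\s+(?:{verbs})\s+(?:the\s+)?(?:{targets})\b",
        re.IGNORECASE,
    )


def flag_tweets(tweets, names):
    """Return only the Tweets in which the author claims contact with the person."""
    pattern = build_pattern(names)
    return [t for t in tweets if pattern.search(t.text)]


# Step 1: the tracked name plus an alternative title (made-up sample data below).
names = ["John Smith", "Prime Minister"]
tweets = [
    Tweet("alice", "Leeds", "I spoke with John Smith this morning"),
    Tweet("bob", "London", "John Smith spoke with reporters"),
    Tweet("carol", "York", "We just met the Prime Minister!"),
]

# Steps 2 to 4: flag relevant Tweets and output them with the author's profile.
flagged = flag_tweets(tweets, names)
for t in flagged:
    print(f"{t.author} ({t.author_location}): {t.text}")
```

In this sample the Tweets from alice and carol are flagged, while bob's is discarded because the tracked person, not the author, is the subject of the verb. A pattern this simple will miss Tweets that omit the subject (e.g. 'Just met John Smith'), which is one reason a fuller NLP approach would be preferable.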
From th...