
Information Gap Prediction for Ingested Media

IP.com Disclosure Number: IPCOM000241713D
Publication Date: 2015-May-26
Document File: 3 page(s) / 80K

Publishing Venue

The IP.com Prior Art Database

Abstract

A system and method for information gap prediction for ingested media in a system capable of answering questions is disclosed.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.


Disclosed is a system and method for information gap prediction for ingested media in a system capable of answering questions.

In a question-answering system, or in an unstructured information system generally, the information sources are the critical component for obtaining reliable results. These sources are produced by one or more authors who are continually writing or preparing content to post.

For academics or researchers in a particular field, such as computer science, these updates and posts relate to a set number of subjects or contexts. When there is a gap in their contributions, there is a gap in the knowledge available for making decisions. These gaps may be due to the academic working on new material, vacations, or various other reasons.

For consumers and users, such as bloggers or Twitter users, these updates and posts often relate to an indeterminate number of subjects and contexts. When there is a gap in their contributions, the insight into the user during that time is highly variable. These gaps may be due to vacations or dissatisfaction with the service provider (for example, the user stops tweeting).

An example scenario:

Alice is the author.

Alice posts 10 tweets spread over one week, and this pattern repeats for 10 weeks.

Alice does not post her usual 10 tweets in the 11th week.

The result is a lack of information for that week. There is therefore a clear need to identify gaps in the available information when analyzing a corpus.

The disclosed method predicts and weights information gaps and spikes by:

a. Identifying the author of each document in a corpus.
b. Establishing a timeline of the author's documents.
c. Determining gaps and spikes in the author's documents.
d. Acting upon the gaps or spikes.

Each document may be a text document, paper, mail message, news article, comment, tweet, video, podcast, or wiki page. The disclosed method identifies at least one author of each document. Because of an information gap, the method may also identify the need to acquire more data related to an author.
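Steps c and d can be illustrated with a minimal Python sketch. It assumes each author's timeline has already been reduced to a list of post counts per period (for example, per week) and compares each period with the mean of the author's earlier periods. The thresholds (below 0.5x the baseline is a gap, above 2x is a spike) and the names `classify_periods` and `act_on` are hypothetical choices for illustration, not values or functions given in the disclosure.

```python
from statistics import mean

# Hypothetical thresholds, chosen only for illustration.
GAP_RATIO = 0.5    # below 50% of the baseline counts as a gap
SPIKE_RATIO = 2.0  # above 200% of the baseline counts as a spike


def classify_periods(counts):
    """Label each period after the first as 'gap', 'spike', or 'normal'.

    The baseline for period i is the mean count of periods 0..i-1.
    Returns a list of (1-based period number, label) tuples.
    """
    labels = []
    for i in range(1, len(counts)):
        baseline = mean(counts[:i])
        if counts[i] < GAP_RATIO * baseline:
            label = "gap"
        elif counts[i] > SPIKE_RATIO * baseline:
            label = "spike"
        else:
            label = "normal"
        labels.append((i + 1, label))
    return labels


def act_on(labels):
    """Map each detected gap or spike to a hypothetical follow-up action."""
    actions = {"gap": "acquire more data", "spike": "normalize ingestion"}
    return [(period, actions[label]) for period, label in labels if label in actions]


# Alice: 10 tweets per week for 10 weeks, then none in week 11.
alice = [10] * 10 + [0]
print(act_on(classify_periods(alice)))  # [(11, 'acquire more data')]
```

Using a running baseline of earlier periods means the first period is never classified; a deployed system would more likely use a rolling window so that an author's long-term drift does not mask recent gaps.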

In an example embodiment, Alice is the administrator for a corpus.

Document     Author  Length   Date
Document A   Bob     1 page   1/2/2010
Document B   Bob     2 pages  1/1/2010
Document C   Bob     1 page   1/1/2010

The system loads Documents A, B, and C.

The system extracts the authorship data and adds Bob to the lookup table.
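The extraction step can be sketched as building a lookup table that maps each author to that author's documents. This minimal Python sketch assumes authorship is already available as a metadata field on each document (a hypothetical `authors` tuple, since a document may have more than one author) and reads the dates as month/day/year. Real extraction from papers, tweets, or media would need a format-specific parser, which is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    name: str
    authors: tuple  # a document may have more than one author
    pages: int
    published: date


def build_author_table(documents):
    """Map each author to the list of documents they wrote, in load order."""
    table = defaultdict(list)
    for doc in documents:
        for author in doc.authors:
            table[author].append(doc)
    return dict(table)


corpus = [
    Document("Document A", ("Bob",), 1, date(2010, 1, 2)),
    Document("Document B", ("Bob",), 2, date(2010, 1, 1)),
    Document("Document C", ("Bob",), 1, date(2010, 1, 1)),
]
table = build_author_table(corpus)
print({author: [d.name for d in docs] for author, docs in table.items()})
# {'Bob': ['Document A', 'Document B', 'Document C']}
```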



The system plots the timeline for Bob.

3 pages - 1/1
1 page - 1/2
The system detects a spike on 1/1.
The system normalizes the spike in writing on 1/1 and ingests only half of the documents.
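The timeline, spike detection, and normalization for Bob can be sketched in Python. Several assumptions here are not specified by the disclosure: the timeline sums pages per date, a date counts as a spike when its page count exceeds twice the average of the other dates, and "half of the documents" is read as half (rounded up) of the spike date's documents, chosen by name so the result is deterministic. The function names are hypothetical.

```python
from collections import Counter
from datetime import date
from math import ceil

# (name, pages, date) for Bob's documents, with dates read as month/day/year.
bob_docs = [
    ("Document A", 1, date(2010, 1, 2)),
    ("Document B", 2, date(2010, 1, 1)),
    ("Document C", 1, date(2010, 1, 1)),
]


def page_timeline(docs):
    """Total pages written per date, in chronological order."""
    pages = Counter()
    for _, n_pages, day in docs:
        pages[day] += n_pages
    return dict(sorted(pages.items()))


def find_spikes(timeline, factor=2.0):
    """Dates whose page count exceeds factor x the mean of the other dates."""
    spikes = []
    for day, pages in timeline.items():
        others = [p for d, p in timeline.items() if d != day]
        if others and pages > factor * (sum(others) / len(others)):
            spikes.append(day)
    return spikes


def normalize(docs, spikes):
    """Keep every document, except on spike dates keep only half (rounded up)."""
    kept = []
    for day in sorted({d for _, _, d in docs}):
        same_day = sorted(doc for doc in docs if doc[2] == day)
        if day in spikes:
            same_day = same_day[:ceil(len(same_day) / 2)]
        kept.extend(same_day)
    return kept


timeline = page_timeline(bob_docs)  # 3 pages on 1/1, 1 page on 1/2
spikes = find_spikes(timeline)      # the 1/1 spike
print([name for name, _, _ in normalize(bob_docs, spikes)])
# ['Document B', 'Document A']
```

Dropping documents is only one way to normalize; an alternative consistent with "weighting" would keep all documents and scale down the weight of those on spike dates.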

Extending the gap so that the documents are dated 1/1 and 1/3, the timeline shows a gap between the documents, and the system flags that more should be added from the give...