
System that detects correlations within unstructured information
Disclosure Number: IPCOM000236605D
Publication Date: 2014-May-05
Document File: 2 page(s) / 138K

Publishing Venue

The Prior Art Database


This article shows how to identify correlations in unstructured data when questions are posed to a question answering system. The system determines the correlations, then presents the answer to the user in graph form appropriate for the type of question asked.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.


System that detects correlations within unstructured information
Disclosed is a method that takes a natural language question as input and returns a set of candidate answers to that question, in the form of text. Because a system such as IBM Watson* can work on natural language input and on large corpora of data, it could be applied to discover new correlations hidden in natural language and to present them visually. Discovering these correlations would be very useful in research and analysis of any kind: medical research on a new drug, fact-finding research for a news story, source research for a term paper, etc. The system could be applied (with some direction) to discover new correlations within the data in its corpus. This method looks for new insights and relationships, not known facts.

Currently there is no known solution that can sift through unstructured data using Natural Language Processing (NLP) and then find data points that are to be graphed. Wolfram Alpha** can do this with structured information. One example of this would be: cy

Currently Watson returns answers in textual form. In some cases, however, text may not be the best way to present the answer, or may not even be what the asker is looking for. If the user expects the result of the question to be a set of data, then returning the information in graphical form could be far more beneficial.

Consider the following question/answer flow:
User inputs a question: 'What is the relationship between a person's weight and their life-span?'





Some sample questions that would be best represented visually as a graph:
What is the relationship between a person's weight and their life-span?
How does Derek Jeter's hits per year compare to Pete Rose?
What kind of drug interactions occur when using Drug A, B, and C?
What was the average price of gas in the US in 2011?
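As a rough illustration of choosing a graph form from the type of question, the sketch below maps the sample questions to chart types with simple keyword rules. The rule table, the chart names, and the function `choose_chart` are hypothetical and are not taken from the disclosure, which does not state how question types are classified; a real system would presumably rely on the NLP pipeline's own question analysis rather than regular expressions.

```python
import re

# Hypothetical keyword rules (an assumption for illustration only):
# each pattern hints at a question type, paired with a plausible chart form.
CHART_RULES = [
    (re.compile(r"\brelationship between\b", re.I), "scatter"),
    (re.compile(r"\bcompare[sd]?\b", re.I), "multi-line"),
    (re.compile(r"\binteractions?\b", re.I), "network"),
    (re.compile(r"\baverage\b", re.I), "line"),
]


def choose_chart(question: str) -> str:
    """Return the chart type of the first matching rule, or 'text' if none match."""
    for pattern, chart in CHART_RULES:
        if pattern.search(question):
            return chart
    # No rule matched: fall back to a plain textual answer.
    return "text"


if __name__ == "__main__":
    samples = [
        "What is the relationship between a person's weight and their life-span?",
        "How does Derek Jeter's hits per year compare to Pete Rose?",
        "What kind of drug interactions occur when using Drug A, B, and C?",
        "What was the average price of gas in the US in 2011?",
    ]
    for q in samples:
        print(choose_chart(q), "<-", q)
```

Rules are checked in order, so a question that matches several patterns gets the chart of the first matching rule; questions that match none keep the textual answer.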

The heart of the invention is the component that searches through the unstructured information, identifies each candidate answer, and transforms it into a visual representation. This component relies heavily on an NLP pipeline such as IBM Watson and can be implemented with simple extensions.
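To make the idea of turning unstructured passages into graphable data points more concrete, here is a minimal sketch for the weight versus life-span question. It is a stand-in under stated assumptions, not the disclosed component: plain regular expressions replace the NLP pipeline, the patterns `WEIGHT` and `AGE` and the function `extract_points` are hypothetical, and the example passages are invented purely for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical patterns (assumptions): a weight in kg and an age in years.
WEIGHT = re.compile(r"(\d+(?:\.\d+)?)\s*kg", re.I)
AGE = re.compile(r"(\d+(?:\.\d+)?)\s*years", re.I)


def extract_points(passages: List[str]) -> List[Tuple[float, float]]:
    """Collect (weight, age) pairs from sentences that mention both values."""
    points = []
    for passage in passages:
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", passage):
            weight = WEIGHT.search(sentence)
            age = AGE.search(sentence)
            if weight and age:
                points.append((float(weight.group(1)), float(age.group(1))))
    return points


# Invented example passages, used only to exercise the sketch.
passages = [
    "A subject weighing 70 kg lived to 82 years. The study ran for a decade.",
    "Another participant, at 110 kg, died at 64 years of age.",
]
points = extract_points(passages)
print(points)
```

The resulting list of pairs could then be handed to any plotting library as the x and y values of a scatter plot. Sentences that mention only one of the two quantities are skipped, which keeps unrelated numbers from being paired by accident.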

The extensions can be implemented as a set of new annotators which search specifically and indirectly f...