Method of Computing Relevancy Score in a Question and Answering System
Publication Date: 2012-Oct-02
The IP.com Prior Art Database
Disclosed is a method for computing the relevancy of answers in a question and answering system.
The advent of new technologies like those present in the IBM Watson* solution is enabling a new class of solutions that can derive knowledge from unstructured data and use that knowledge to advise on matters spanning a broad set of use cases across many industries. Systems such as IBM Watson work by generating a large number of hypotheses (candidate answers for a given question) and then scoring how likely each hypothesis is to be the correct or best answer, using a variety of natural language processing techniques. These scoring techniques are typically an assortment of natural language processing algorithms that examine, for example, how well the terms found in an answer match those in the question, whether the candidate answer is expressed in the same logical form as the question, whether the candidate answer is of the lexical answer type the question expects (e.g., if the question asks for a disease, is the answer a disease?), and other analyses of unstructured text. Finally, machine learning techniques compute a final confidence score for each candidate answer from the individual scores of each technique.
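The disclosure does not say which machine learning technique combines the individual scores. As an illustration only, the sketch below assumes a logistic-regression-style model: each scorer's output is a feature, and a weighted sum passed through a sigmoid yields a confidence in (0, 1). The feature names, weights, bias, and candidates are hypothetical; in a real system the weights would be learned from labelled question/answer pairs.

```python
import math
from typing import Dict

# Hypothetical learned weights, one per scoring technique (assumption, not from the disclosure).
WEIGHTS: Dict[str, float] = {
    "term_match": 2.0,    # how well answer terms match question terms
    "logical_form": 1.0,  # whether the answer matches the question's logical form
    "answer_type": 1.5,   # whether the answer has the expected lexical answer type
}
BIAS = -2.0


def confidence(feature_scores: Dict[str, float]) -> float:
    """Combine individual scorer outputs into one confidence via a sigmoid."""
    z = BIAS + sum(WEIGHTS[name] * feature_scores.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical candidate answers with made-up per-technique scores.
candidates = {
    "breast cancer": {"term_match": 0.9, "logical_form": 0.7, "answer_type": 1.0},
    "hepatitis": {"term_match": 0.2, "logical_form": 0.4, "answer_type": 1.0},
}
ranked = sorted(candidates, key=lambda c: confidence(candidates[c]), reverse=True)
print(ranked)
```

A linear model over scorer features keeps each technique's contribution inspectable; any other classifier that outputs a probability could be substituted.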
In many solutions built on a question and answering system such as IBM Watson, context data is available in addition to the question being asked, and this data can and should be used to help score, and potentially filter, candidate answers. The context data helps determine how relevant each candidate answer is to the question being asked.
As an example, a question and answering system may be designed to determine whether a particular medical procedure is medically necessary for a patient. In this case, the evidence used to determine the answer would be a set of clinical guidelines and medical policies. Medical policies contain criteria that must be met for a procedure to be medically necessary and are often organized by type of procedure and/or diagnosis. The context data in this case would be the requested procedure and/or the patient's primary diagnosis.
This invention proposes a new technique for scoring candidate answers in a question and answering system based on the relevancy of the context data to each candidate answer. The relevancy score would be calculated by identifying concepts in the candidate answer text and comparing them to the concepts in the context data. The comparison could be implemented by:
Exact or fuzzy text match between context data and candidate answer text
Concept matching between context data and candidate answer text
How closely the concepts are related in a given ontology
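The three comparison strategies listed above can be sketched as follows. This is a minimal illustration under several assumptions not stated in the disclosure: the ontology is a hypothetical four-concept graph, concept extraction is a naive dictionary lookup standing in for a real concept annotator, fuzzy text matching uses Python's `difflib.SequenceMatcher`, concept matching uses Jaccard overlap, ontology relatedness is `1 / (1 + shortest-path distance)`, and the combination weights and filtering threshold are made up.

```python
import difflib
import math
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology (undirected): concept -> directly related concepts.
ONTOLOGY: Dict[str, List[str]] = {
    "neoplastic disease": ["breast cancer", "secondary liver tumor"],
    "breast cancer": ["neoplastic disease"],
    "secondary liver tumor": ["neoplastic disease", "hepatic metastasis"],
    "hepatic metastasis": ["secondary liver tumor"],
}


def extract_concepts(text: str) -> Set[str]:
    """Naive annotator: any ontology term appearing verbatim in the text."""
    lowered = text.lower()
    return {term for term in ONTOLOGY if term in lowered}


def text_match_score(context: str, answer: str) -> float:
    """Fuzzy text similarity in [0, 1]; 1.0 for an exact (case-insensitive) match."""
    return difflib.SequenceMatcher(None, context.lower(), answer.lower()).ratio()


def concept_match_score(ctx: Set[str], ans: Set[str]) -> float:
    """Jaccard overlap between context concepts and answer concepts."""
    if not ctx or not ans:
        return 0.0
    return len(ctx & ans) / len(ctx | ans)


def ontology_distance(a: str, b: str) -> float:
    """Shortest-path length between two concepts (breadth-first); inf if unconnected."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in ONTOLOGY.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return math.inf


def ontology_score(ctx: Set[str], ans: Set[str]) -> float:
    """Best 1 / (1 + distance) over all concept pairs; 0.0 if nothing is connected."""
    best = 0.0
    for c in ctx:
        for a in ans:
            best = max(best, 1.0 / (1.0 + ontology_distance(c, a)))
    return best


def relevancy_score(context: str, answer: str,
                    weights: Tuple[float, float, float] = (0.2, 0.4, 0.4)) -> float:
    """Weighted sum of the text, concept, and ontology comparisons (hypothetical weights)."""
    ctx, ans = extract_concepts(context), extract_concepts(answer)
    w_text, w_concept, w_onto = weights
    return (w_text * text_match_score(context, answer)
            + w_concept * concept_match_score(ctx, ans)
            + w_onto * ontology_score(ctx, ans))


# Hypothetical context data and candidate answers (policy criteria).
context = "Requested procedure: wedge resection; primary diagnosis: hepatic metastasis"
answers = [
    "Wedge resection is medically necessary for hepatic metastasis or a "
    "secondary liver tumor confined to the liver.",
    "Breast reconstruction is medically necessary following mastectomy for breast cancer.",
]
THRESHOLD = 0.5  # hypothetical cut-off for filtering out irrelevant candidates
relevant = [a for a in answers if relevancy_score(context, a) >= THRESHOLD]
```

Because the relevancy score is just another per-candidate number, it could either filter candidates with a threshold, as in the last line, or be fed to the final confidence model alongside the other scorers' outputs.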
Below is an example question that could be sent to a question and answering system. In this case, a specific treatment is being requested and the correct answer is that the treatment is "Medically Necessary."
Coverage of laparoscopic wedge resection of hepatic metastasis from breast cancer has been req...