System T and Analysis of Questions to Produce Answer Units

IP.com Disclosure Number: IPCOM000243663D
Publication Date: 2015-Oct-08
Document File: 1 page(s) / 21K

Publishing Venue

The IP.com Prior Art Database

Abstract

Described is a method to automatically isolate and return an answer snippet, given a question and a document that contains the answer. Rather than require a Subject Matter Expert to parse through every relevant document by hand, this method automatically processes the question and finds the sentences containing the answer in the document based on the expected answer structure.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 72% of the total text.

Given a set of questions and a document that contains relevant answers to them, manually matching questions with answers is often cumbersome. Not only does it require a great deal of time from a Subject Matter Expert (SME) to identify appropriate answers, but the documents must also be reformatted so that every section has a header and can be recognized as a potential answer unit. As a corpus grows beyond a handful of documents, the complexity and time required grow non-linearly, and so does the likelihood that SMEs will miss answers spread across multiple documents.

    Disclosed is a method that processes the questions and document text to automatically find answers to questions within the corpus. A representative set of words and phrases that indicate question intent (who, what, where, when, why, how, can) is pre-collected, and partially formed questions are categorized into the appropriate grouping. For example, 'symptoms of the flu' is a partially formed version of 'what are the symptoms of the flu' and would be filed into the 'what' group.
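
The grouping step described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation and does not use System T's own rule language. The intent words come from the list in the text; the cue phrases in PARTIAL_CUES and the function name classify_question are assumptions made up for this example.

```python
import re

# Intent words taken from the list in the text.
INTENT_WORDS = ("who", "what", "where", "when", "why", "how", "can")

# Hypothetical cue phrases that map a partially formed question to a group.
# The disclosure does not list these; they are illustrative assumptions.
PARTIAL_CUES = {
    "what": ("symptoms of", "definition of", "causes of"),
    "how": ("steps to", "way to"),
    "when": ("date of", "deadline for"),
    "where": ("location of",),
}


def classify_question(text):
    """Return the intent group for a full or partially formed question.

    Returns None when no intent word or cue phrase is recognized.
    """
    normalized = text.strip().lower()
    words = re.findall(r"[a-z']+", normalized)

    # Fully formed question: the leading word names the group directly.
    if words and words[0] in INTENT_WORDS:
        return words[0]

    # Partially formed question: fall back to the pre-collected cue phrases.
    for group, cues in PARTIAL_CUES.items():
        if any(normalized.startswith(cue) for cue in cues):
            return group
    return None
```

Under these assumptions, classify_question("symptoms of the flu") and classify_question("What are the symptoms of the flu?") both land in the 'what' group, while an input with no recognizable cue is left unclassified rather than guessed.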

    These groupings are further partitioned based on the 'answer trigger' that indicates a good match to the question. Because a 'what is a...' question inherently seeks a different answer type than a 'how many months...' question, building the extractors around the expected part of speech, count, or unit element provides a more reliable process to isolate the appropriate...
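
As a rough illustration of answer triggers, the following Python sketch builds a pattern for the expected answer structure and returns the sentences that match it. It is a sketch under stated assumptions, not the disclosed extractors: the regular expressions, the naive sentence splitter, and the names NUMBER_WORDS, answer_trigger, and find_answer_sentences are all hypothetical, and only the two question shapes named in the text are handled.

```python
import re

# Small spelled-out number list; an assumption for this example only.
NUMBER_WORDS = r"one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve"


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def answer_trigger(question):
    """Build a regex for the expected answer structure, or None if unsupported."""
    q = question.strip().lower().rstrip("?")

    # 'how many <unit>...' expects a count followed by that unit.
    m = re.match(r"how many (\w+)", q)
    if m:
        unit = m.group(1)
        singular = unit[:-1] if unit.endswith("s") else unit
        return re.compile(
            r"\b(\d+|" + NUMBER_WORDS + r")\s+" + re.escape(singular) + r"s?\b",
            re.IGNORECASE,
        )

    # 'what is a <term>' expects a definition of the form '<term> is a ...'.
    m = re.match(r"what is (?:an?|the) (.+)", q)
    if m:
        term = m.group(1)
        return re.compile(
            r"\b" + re.escape(term) + r"\s+is\s+(?:an?|the)\b", re.IGNORECASE
        )
    return None


def find_answer_sentences(question, document):
    """Return the sentences in the document that match the answer trigger."""
    trigger = answer_trigger(question)
    if trigger is None:
        return []
    return [s for s in split_sentences(document) if trigger.search(s)]
```

With a made-up three-sentence document such as "A fever is a temporary rise in body temperature. Recovery usually takes 3 months. See a doctor if symptoms persist.", the question "What is a fever?" selects only the first sentence and "How many months does recovery take?" selects only the second, because each trigger looks for a different structure (a definition versus a count plus unit).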