
Enhanced generation of question variants for improved training of a statistical question answering system

IP.com Disclosure Number: IPCOM000244347D
Publication Date: 2015-Dec-03
Document File: 2 page(s) / 26K

Publishing Venue

The IP.com Prior Art Database

Abstract

Described is a method for rapidly and automatically generating question-answer pairs for training a statistical question answering system. Taking a seed question-answer pair as input, the method generates meaning-preserving variants of the question through syntactic transformation and lexical substitution.

This is the abbreviated version, containing approximately 52% of the total text.


Disclosed is a method for generating question-answer pairs suitable for training a statistical question answering system. Statistical question answering systems require large amounts of training data in the form of question-answer pairs in order to develop robust and accurate models. Generating these question-answer pairs by hand is expensive. The disclosed method generates multiple variants of a question while leaving the answer unchanged, which allows a large collection of training pairs to be generated from a small seed set. The variation stems from two sources: lexical substitution, which replaces question terms with synonymous terms, and syntactic transformation, which modifies the sentence structure in meaning-preserving ways.

A) Lexical substitution: Replacing question terms with synonymous terms.

    1) Q: Who wrote the book 'No God but God'? A: Reza Aslan
    2) Q: Who authored the book 'No God but God'? A: Reza Aslan
    3) Q: Who penned the book 'No God but God'? A: Reza Aslan

Starting from the question-answer pair in (1), we can replace the verb 'wrote' with the synonymous terms 'authored' and 'penned', resulting in the Q/A pairs in (2) and (3). Synonyms such as these can be sourced from a published thesaurus like Roget's or from a machine-readable semantic network like WordNet.
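
The substitution step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SYNONYMS table is a hypothetical, hand-written stand-in for a thesaurus or WordNet lookup, and the function name lexical_variants is not part of the disclosure. The sketch swaps one whitespace-delimited token at a time and leaves the answer untouched.

```python
# Hypothetical, hand-written synonym table (an assumption for this sketch);
# a real system might draw these from Roget's Thesaurus or WordNet instead.
SYNONYMS = {
    "wrote": ["authored", "penned"],
}


def lexical_variants(question, answer, synonyms=SYNONYMS):
    """Return the seed pair plus one new pair per single-token synonym swap."""
    pairs = [(question, answer)]
    tokens = question.split(" ")
    for i, token in enumerate(tokens):
        # Look up each token case-insensitively; unknown tokens yield no variants.
        for synonym in synonyms.get(token.lower(), []):
            variant = tokens[:i] + [synonym] + tokens[i + 1:]
            pairs.append((" ".join(variant), answer))
    return pairs


seed = ("Who wrote the book 'No God but God'?", "Reza Aslan")
for q, a in lexical_variants(*seed):
    print(f"Q: {q} A: {a}")
```

Run on the seed pair in (1), the sketch yields the seed itself followed by the 'authored' and 'penned' variants in (2) and (3). Matching on whitespace tokens keeps the example short; it would miss a term with punctuation attached to it.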

B) Transforming the syntax of the input question in meaning-preserving ways.

    This requires parsing the input and transforming it according to a fixed set of meaning-preserving syntactic transformations. In some cases, these are purely syntactic transformations such as passivization, where the direct object of a verb becomes its subject and the original subject is introduced by the preposition 'by'. The passivization rule is shown in (4). Applying it yields the Q/A pairs in (5)-(7).

4) Who VERBED OBJECT? ==> OBJECT was VERB-passive by whom?

    5) Q: 'No God but God' was written by whom? A: Reza Aslan
    6) Q: 'No God but God' was authored by whom? A: Reza Aslan
    7) Q: 'No God but God' was penned by whom...
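
As a rough illustration of rule (4), the following Python sketch applies the passivization pattern to a question string. It rests on several assumptions that are not in the disclosure: a regular expression stands in for the parser the method calls for, the PAST_PARTICIPLES table is a hypothetical lookup covering only the three verbs used above, and the input is the shortened question "Who wrote 'No God but God'?" so that the object is just the title.

```python
import re

# Hypothetical lookup from simple-past verbs to past participles (an
# assumption for this sketch); a real system would rely on a parser or
# a morphological lexicon instead.
PAST_PARTICIPLES = {
    "wrote": "written",
    "authored": "authored",
    "penned": "penned",
}

# Surface pattern standing in for the left-hand side of rule (4):
# "Who VERBED OBJECT?"
WHO_QUESTION = re.compile(r"^Who (\w+) (.+)\?$")


def passivize(question):
    """Apply rule (4); return None when the question does not match it."""
    match = WHO_QUESTION.match(question)
    if match is None:
        return None
    verb, obj = match.groups()
    participle = PAST_PARTICIPLES.get(verb)
    if participle is None:
        return None
    # Right-hand side of rule (4): "OBJECT was VERB-passive by whom?"
    return f"{obj} was {participle} by whom?"


for verb in ("wrote", "authored", "penned"):
    print(passivize(f"Who {verb} 'No God but God'?"))
```

Combining this with lexical substitution gives the three passive questions shown in (5)-(7), each paired with the unchanged answer. Returning None for unmatched input makes it easy to skip questions the rule does not cover.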