Test Coverage of a Natural Language Corpus
Publication Date: 2014-Apr-29
The IP.com Prior Art Database
Disclosed is a method for selecting an effective subset of a corpus for testing a natural language processing (NLP) pipeline. Each passage in the corpus is scored against a vector of attributes of interest, and a Combinatorial Test Design (CTD) style reduction is then applied to those scores to identify the optimal subset of documents to include in the test.
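The selection idea can be illustrated with a minimal sketch. It rests on assumptions that are not part of the disclosure: the three attributes (length bucket, presence of digits, question versus statement) are hypothetical stand-ins for whatever attributes a tester finds interesting, the function names are invented for illustration, and the CTD style reduction is approximated by a simple greedy pairwise cover rather than any particular CTD tool.

```python
from itertools import combinations

# Hypothetical example corpus; the disclosure does not supply sample data.
EXAMPLE_CORPUS = {
    "d1": "How many patients were enrolled in the 2012 trial overall?",
    "d2": "The drug was approved.",
    "d3": "The drug was rejected.",
    "d4": "Is it safe?",
}

def score_passage(text):
    """Score a passage against a vector of (assumed) attributes of interest."""
    words = text.split()
    return (
        "short" if len(words) < 8 else "long",
        "digits" if any(c.isdigit() for c in text) else "no_digits",
        "question" if text.rstrip().endswith("?") else "statement",
    )

def covered_pairs(vector):
    """Return every pair of (attribute index, value) entries found in a vector."""
    return set(combinations(enumerate(vector), 2))

def select_subset(corpus):
    """Greedily pick documents until every observed attribute pair is covered."""
    vectors = {doc_id: score_passage(text) for doc_id, text in corpus.items()}
    uncovered = set().union(*(covered_pairs(v) for v in vectors.values()))
    chosen = []
    while uncovered:
        # Take the document covering the most still-uncovered pairs;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(vectors),
                   key=lambda d: len(covered_pairs(vectors[d]) & uncovered))
        chosen.append(best)
        uncovered -= covered_pairs(vectors[best])
    return chosen

if __name__ == "__main__":
    print(select_subset(EXAMPLE_CORPUS))
```

In this sketch, passages that share an identical attribute vector contribute nothing new once one of them is selected, so duplicates such as "d3" drop out of the test set. A greedy cover is not guaranteed to be the smallest possible subset; a dedicated CTD tool may produce a tighter result.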