Test Coverage of a Natural Language Corpus

IP.com Disclosure Number: IPCOM000236478D
Publication Date: 2014-Apr-29

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for selecting an effective subset of a corpus for testing a natural language processing (NLP) pipeline. Each passage in the corpus is scored against a vector of attributes of interest, so that a Combinatorial Test Design (CTD) style reduction can then be applied to identify the optimum subset of passages to include in the test.
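The reduction step can be illustrated with a minimal sketch. It is not the disclosed implementation: the abstract does not name the attributes, the interaction level, or the reduction algorithm, so the code below assumes pairwise coverage and a greedy heuristic, which yields a small subset but is not guaranteed to be the optimum one. The attribute names (`has_negation`, `has_date`, `length`), the passage identifiers, and the function names are hypothetical.

```python
from itertools import combinations


def pairs_covered(attrs):
    """Return every pair of (attribute, value) settings a passage exhibits."""
    items = sorted(attrs.items())  # attribute names are unique, so order is stable
    return set(combinations(items, 2))


def select_subset(scored):
    """Greedy pairwise reduction (an assumed stand-in for CTD-style reduction).

    `scored` maps a passage id to its attribute vector. Passages are added
    one at a time, each time taking the one that covers the most
    attribute-value pairs not yet covered, until every pair that occurs
    anywhere in the corpus is covered by the chosen subset.
    """
    remaining = set()
    for attrs in scored.values():
        remaining |= pairs_covered(attrs)

    chosen = []
    while remaining:
        # Ties are broken by passage id so the result is deterministic.
        best = max(
            sorted(scored),
            key=lambda pid: len(pairs_covered(scored[pid]) & remaining),
        )
        chosen.append(best)
        remaining -= pairs_covered(scored[best])
    return chosen


# Hypothetical attribute vectors for four passages; p2 duplicates p1.
scored = {
    "p1": {"has_negation": True, "has_date": True, "length": "long"},
    "p2": {"has_negation": True, "has_date": True, "length": "long"},
    "p3": {"has_negation": False, "has_date": True, "length": "short"},
    "p4": {"has_negation": False, "has_date": False, "length": "short"},
}

subset = select_subset(scored)
print(subset)
```

In this toy corpus the passage `p2` adds no attribute-value pair that `p1` does not already cover, so it is left out of the subset while the pairwise coverage of the full corpus is preserved.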