
System and Method of Corpus-Specific Optimization of Rule Sets from a Common Rule Set Based on Accuracy Preferences

IP.com Disclosure Number: IPCOM000249303D
Publication Date: 2017-Feb-16
Document File: 3 page(s) / 651K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed are a system and method of dynamically generating an optimized rule set for Natural Language Processing, based on accuracy preferences, for a given corpus from a common, aggregate set of rules. Consumers can optimize by recall, precision, or F1 (the harmonic mean of recall and precision).



System and Method of Corpus-Specific Optimization of Rule Sets from a Common Rule Set Based on Accuracy Preferences

Customizing Natural Language Processing (NLP) rule sets for a given domain/corpus is a challenge. Unless the rule sets are forked or a user creates a new set of rules for each domain/corpus, a common rule set is typically applied to numerous corpora. A one-size-fits-all approach is often ineffective in addressing the variances in linguistic subtleties and nuances between corpora. A rule written to improve accuracy in one or more corpora may well adversely impact the accuracy of one or more other corpora, and it is difficult to both measure and manage the impact of a given rule or set of rules across a broad distribution of corpora. For these reasons, rule-based NLP assets can be viewed as brittle.

The novel solution is a system and method of dynamically generating an optimized rule set, based on accuracy preferences, for a given corpus from a common, aggregate set of rules. Consumers can optimize by recall, precision, or F1 (the harmonic mean of recall and precision).

The solution comprises methods for:

· Generating rule subsets from a common rule set to optimize designated accuracy preferences (recall, precision, F1)
· Generating alterations within rules from a common rule set to optimize designated accuracy preferences
· Exhaustively generating variations of the first and second methods, above, to derive rule sets that are optimized to each corpus in accordance with designated accuracy preferences (a sketch of this exhaustive search follows below)
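Purely for illustration (this sketch is not part of the disclosure), the following Python code shows how non-empty rule subsets of a common rule set could be exhaustively enumerated and scored against a designated accuracy preference. The helper evaluate_rules is an assumption: it is taken to run a rule subset against a corpus's ground truth and return precision and recall.

    from itertools import combinations

    def f1(precision, recall):
        # Harmonic mean of precision and recall; 0.0 when both are 0.
        return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

    def best_rule_subset(common_rules, corpus, ground_truth, preference="f1"):
        # Exhaustively enumerate every non-empty subset of the common rule set
        # and keep the subset that maximizes the designated accuracy preference.
        best_score, best_subset = -1.0, ()
        for size in range(1, len(common_rules) + 1):
            for subset in combinations(common_rules, size):
                precision, recall = evaluate_rules(subset, corpus, ground_truth)  # assumed helper
                score = {"precision": precision, "recall": recall, "f1": f1(precision, recall)}[preference]
                if score > best_score:
                    best_score, best_subset = score, subset
        return best_subset, best_score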

Figure 1: Overview diagram

1. Common Rule Set

The common rule set is a single NLP rule set into which the method can aggregate rules to address disparate accuracy issues across a multitude of corpora. The process optimizes rule sets for each corpus, which reduces concern over a rule added to address an issue in one corpus adversely impacting one or more other corpora. The optimization engine exhaustively adjusts/filters the set of rules to optimize a rule set for the designated accuracy preferences for a given corpus.
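As a minimal sketch of what such a shared rule set and its per-corpus accuracy preferences might look like (all names and structures here are hypothetical, not taken from the disclosure):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Rule:
        rule_id: str
        pattern: str  # e.g. "{Entity A} {Entity B}" in the disclosure's rule notation

    # One common, aggregate rule set shared across all corpora.
    COMMON_RULE_SET = [
        Rule("r1", "{Entity A} {Entity B}"),
        Rule("r2", "{Entity C}"),
    ]

    # Accuracy preference consulted per corpus: "recall", "precision", or "f1".
    ACCURACY_PREFERENCES = {
        "corpus_clinical_notes": "recall",
        "corpus_contracts": "precision",
    }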

2. Rule Set Optimization Engine

The optimization engine consults the accuracy preferences for a given corpus (i.e., favor recall, precision, or F1) and iteratively executes the rule sets against the ground truth, toggling which rules to execute, generating variations of the canonical rules, and ultimately defining/generating the rule set yielding the best results in accordance with the designated accuracy preferences for a given corpus.
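One possible reading of the scoring step in this loop is sketched below, assuming each candidate rule set produces extracted spans that can be compared with ground-truth annotation spans (the span representation is an assumption, not part of the disclosure). The engine would retain whichever toggled or varied rule set maximizes the metric named in the corpus's accuracy preference.

    # Hypothetical scoring step: compare the spans extracted by a candidate rule
    # set with the ground-truth annotations for a corpus.
    def score_candidate(extracted_spans, ground_truth_spans):
        extracted, truth = set(extracted_spans), set(ground_truth_spans)
        true_positives = len(extracted & truth)
        precision = true_positives / len(extracted) if extracted else 0.0
        recall = true_positives / len(truth) if truth else 0.0
        f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
        return {"precision": precision, "recall": recall, "f1": f1}

    # Example: two extracted spans, one of which matches the single ground-truth span.
    metrics = score_candidate({("doc1", 0, 5), ("doc1", 10, 14)}, {("doc1", 0, 5)})
    # metrics["precision"] == 0.5, metrics["recall"] == 1.0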

Sample Generated Rule Variations

1. Modify the rule quantifiers. For example: {Entity A} --> {Entity A}0-1 // Make Entity A optional for a given rule.
2. Modify the order of rule components. For example: {E...