Cognitive Recommendation Engine to Compare Datasets, Algorithms, and Performance Metrics Mined from Multiple Data Sources

IP.com Disclosure Number: IPCOM000250199D
Publication Date: 2017-Jun-09
Document File: 4 page(s) / 183K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a recommendation engine to compare datasets, algorithms, and performance metrics. The cognitive method and associated system automatically analyze a given dataset and provide solutions for effective analysis based on both knowledge from the Machine Learning (ML) literature and empirical results of different algorithms over similar datasets.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 46% of the total text.


Data analytics is an area of intense activity: different solutions are constantly applied to diverse datasets and problems across a large number of domains. Given a dataset, however, it is often unclear how best to analyze it to obtain useful, actionable knowledge.

Proposed herein is a cognitive method and associated system to automatically analyze the dataset and provide solutions based on both knowledge from the Machine Learning (ML) literature and empirical results of different algorithms over similar datasets.

The core novelty is a recommendation engine that compares datasets, algorithms, and performance metrics. A user can query the cognitive toolkit for the expected performance of different algorithms on a given dataset (based on the literature or on empirical evidence), or for the limitations of a dataset when a given algorithm is applied (e.g., expect low performance due to class imbalance).

Specifically, the method and system recommend the most suitable algorithms to apply, as well as dataset transformations that can lead to performance improvements. The first output of the system is a list of algorithms and parameter settings, along with the expected performance on different metrics for the given dataset. Some of these algorithms are tested empirically; in other cases, the information is derived from the ML literature. In addition to recommending algorithms, the system suggests transformations of the provided dataset (e.g., increasing the number of instances, subsampling the training data to reduce class imbalance, or performing feature filtering).
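The disclosure does not specify an implementation of the recommendation step. As a minimal sketch, it could combine simple dataset metadata with rules distilled from the knowledge base; all function names, thresholds, and recommendation strings below are hypothetical illustrations, not part of the disclosed system.

```python
# Hypothetical sketch of the recommendation step: inspect dataset metadata,
# then emit algorithm suggestions (with expected-performance notes) and
# dataset-transformation suggestions. Thresholds are illustrative only.
from collections import Counter

def dataset_metadata(instances, labels):
    """Extract simple metadata of the kind described in the disclosure."""
    counts = Counter(labels)
    return {
        "n_instances": len(instances),
        "n_features": len(instances[0]),
        "features_per_instance": len(instances[0]) / len(instances),
        "majority_class_ratio": max(counts.values()) / len(labels),
    }

def recommend(meta):
    """Rule-based stand-in for the cognitive recommendation engine."""
    algorithms, transformations = [], []
    if meta["features_per_instance"] > 0.1:
        # Few instances relative to features: favor regularized models.
        algorithms.append(("linear model with strong regularization",
                           "expect moderate accuracy; risk of overfitting"))
    else:
        algorithms.append(("gradient-boosted trees",
                           "expect high accuracy on tabular data"))
    if meta["majority_class_ratio"] > 0.8:
        transformations.append("subsample majority class to reduce imbalance")
        algorithms.append(("any classifier",
                           "expect low minority-class recall (imbalance)"))
    if meta["n_instances"] < 1000:
        transformations.append("increase the number of instances")
    return {"algorithms": algorithms, "transformations": transformations}

# Toy usage: 100 instances, 20 features, 90/10 class split.
X = [[0.0] * 20 for _ in range(100)]
y = [0] * 90 + [1] * 10
rec = recommend(dataset_metadata(X, y))
```

In the disclosed system, the rules and expected-performance notes would instead be derived from the mined literature and from empirical runs over similar datasets.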

The method and system automatically extract the following metadata from Machine Learning literature:

• Dataset Metadata: features relevant to the application of ML algorithms, such as the ratio of features per instance, the distribution of classes, the size relative to publicly available datasets, the split of the data into training/development/testing sets, etc. The tool applies Natural Language Processing (NLP) techniques to obtain this metadata from written text.

• Algorithm Metadata: features describing the characteristics of the algorithms referred to, such as the methods used to tune parameters, cost functions, feature selection methods, etc. The tool applies NLP techniques to obtain this metadata from written text.

• Algorithm Performance Metadata: features describing the performance metrics mentioned in the literature, such as precision, recall, accuracy, area under the curve (AUC), etc. The tool applies NLP techniques to obtain this metadata from written text.

• Linked Dataset-Algorithm-Performance Metadata from the ML literature: the tool builds on the first three sets of metadata and applies NLP techniques to find associations among the relevant entities discussed in the literature. The resulting relations are stored in a Knowledge Base...
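The disclosure does not describe the structure of the Knowledge Base. One plausible sketch stores each mined association as a dataset-algorithm-performance record and answers queries by retrieving results reported on datasets similar to a new one; the class names, fields, and similarity rule below are hypothetical.

```python
# Hypothetical sketch of the Knowledge Base of linked metadata: each record
# ties mined dataset metadata to an algorithm and a reported performance
# figure; queries retrieve results for datasets similar to a query dataset.
from dataclasses import dataclass, field

@dataclass
class LinkedRecord:
    dataset_meta: dict   # e.g. {"n_instances": 1200}
    algorithm: str       # algorithm name mined from the paper
    metric: str          # e.g. "accuracy", "AUC"
    value: float         # reported score
    source: str          # citation of the mined paper

@dataclass
class KnowledgeBase:
    records: list = field(default_factory=list)

    def add(self, record):
        self.records.append(record)

    def similar(self, meta, tolerance=0.5):
        """Return records whose dataset size is within `tolerance`
        (relative difference) of the query dataset's size, best first."""
        out = []
        for r in self.records:
            a, b = r.dataset_meta["n_instances"], meta["n_instances"]
            if abs(a - b) / max(a, b) <= tolerance:
                out.append(r)
        return sorted(out, key=lambda r: r.value, reverse=True)

# Toy usage: two mined results, queried with a ~1000-instance dataset.
kb = KnowledgeBase()
kb.add(LinkedRecord({"n_instances": 1200}, "random forest",
                    "accuracy", 0.91, "paper A"))
kb.add(LinkedRecord({"n_instances": 50000}, "deep neural network",
                    "accuracy", 0.95, "paper B"))
best = kb.similar({"n_instances": 1000})
```

A real implementation would compare richer metadata (class distribution, feature counts, domain) rather than dataset size alone, but the query pattern — match similar datasets, then rank by reported performance — is the same.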