
Building Highly Accurate Non-Parametric Tree-Based Predictive Data Models

IP.com Disclosure Number: IPCOM000021713D
Publication Date: 2004-Feb-04
Document File: 2 page(s) / 38K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method that speeds up the gradient boosting tree (GBT) algorithm very significantly (sometimes by more than 10x, depending on the number of inputs) without sacrificing predictive accuracy. Moreover, the proposed method often learns more accurate models.



Background

A decision tree (DT) is a set of rules that predicts the value of the response variable from the values of the predictor variables. The most important element of a decision tree is the “split”, an atomic rule defined on one predictor variable. An example of a split is x > 10 for a numeric variable, or x ∈ {“red”, “orange”, “magenta”} for a categorical variable. Each split has a measure of predictive power called the split weight. Learning an optimal decision tree is intractable. The well-known approximation by Breiman et al., the Classification and Regression Tree (CART) algorithm, is a simple iterative search. A single iteration constructs the split with maximum weight on each of the predictor variables, compares their weights, and chooses the one with the highest value. Being the simplest form of decision tree learning, this algorithm is known for constructing DTs with poor predictive power.
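
To make the single-iteration search concrete, the following Python sketch finds the best split of the form x > t for each numeric predictor and then keeps the strongest split across predictors. It is only an illustration of the CART-style exhaustive search described above, not the disclosed method; the choice of variance (sum-of-squares) reduction as the split weight and all function names are assumptions.

```python
# Illustrative sketch of one CART iteration for regression: for each predictor,
# find the threshold split with the highest weight (here, impurity decrease
# measured as sum-of-squares reduction), then keep the best split overall.
import numpy as np

def best_split_for_predictor(x, y):
    """Return (weight, threshold) of the best split 'x > t' for one numeric predictor."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    parent_sse = np.sum((y - y.mean()) ** 2)
    best_weight, best_threshold = -np.inf, None
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no valid threshold between equal values
        left, right = y_sorted[:i], y_sorted[i:]
        child_sse = (np.sum((left - left.mean()) ** 2)
                     + np.sum((right - right.mean()) ** 2))
        weight = parent_sse - child_sse  # split weight: impurity decrease
        if weight > best_weight:
            best_weight = weight
            best_threshold = (x_sorted[i] + x_sorted[i - 1]) / 2.0
    return best_weight, best_threshold

def best_split(X, y):
    """Exhaustive search: compare the best splits of all predictors, keep the strongest."""
    candidates = [(j, *best_split_for_predictor(X[:, j], y)) for j in range(X.shape[1])]
    return max(candidates, key=lambda c: c[1])  # (predictor index, weight, threshold)

# Usage: toy data with 3 predictors; only the first one drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0.5).astype(float) + 0.1 * rng.normal(size=200)
print(best_split(X, y))
```

The search loops over every predictor and every candidate threshold, which is exactly the exhaustive scan whose cost becomes the bottleneck in the GBT setting discussed next.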

A substantial improvement over CART is the GBT algorithm. Instead of building one large DT, it iteratively builds a set of DTs that work as a committee of experts. The nth DT is learned by explaining the error of the committee consisting of the n-1 previously built DTs. The resulting committee (itself a tree-based model) possesses much stronger predictive power than a single DT learned with CART. However, GBT performs an exhaustive search over all variables for every tree in the ensemble, which can be very computationally expensive, since in real applications the number of predictor variables can be in the hundreds or even thousands. Also, a typical ensemble size can...
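
The committee idea can be illustrated with a short least-squares boosting loop: each new tree is fit to the residual error of the trees built so far, and its shrunken prediction is added to the committee. This is only a sketch of standard GBT, not the disclosed speed-up; the base learner (scikit-learn's DecisionTreeRegressor) and all parameter values are illustrative assumptions.

```python
# Sketch of a GBT committee for squared-error regression: the nth tree is fit to
# the residual (error) of the n-1 trees built before it. Parameter values are
# illustrative, not taken from the disclosure.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbt(X, y, n_trees=100, max_depth=3, learning_rate=0.1):
    trees = []
    prediction = np.full(len(y), y.mean())   # start from the constant model
    for _ in range(n_trees):
        residual = y - prediction            # error of the current committee
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return y.mean(), learning_rate, trees

def predict_gbt(model, X):
    base, learning_rate, trees = model
    return base + learning_rate * sum(t.predict(X) for t in trees)

# Usage: the committee steadily reduces training error on a toy regression problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)
model = fit_gbt(X, y)
print("train MSE:", np.mean((predict_gbt(model, X) - y) ** 2))
```

Note that every call to fit() in the loop repeats the exhaustive split search over all predictor variables, and the loop runs once per tree in the ensemble; this is the cost that grows with hundreds or thousands of predictors and that the disclosed method aims to reduce.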