Application of Netezza's Bulk Linear Models for prediction in leaf nodes of regression trees.
Publication Date: 2014-Jun-24
The IP.com Prior Art Database
Disclosed is a method for distributing data across nodes according to a regression tree structure. Data prepared in this manner can be used by, e.g., Netezza’s Bulk Linear Models to build a large number of linear models efficiently and in parallel.
Regression trees are a common technique in predictive analysis. The leaf nodes of such trees usually contain either constant-value predictors or linear models. The second approach is more accurate; however, it requires more computational effort, because as many linear models must be built as there are leaf nodes in the regression tree.
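To make the difference between the two kinds of leaf predictors concrete, the following minimal sketch fits both to the rows of a single leaf. The data, the single input attribute, and all function names are illustrative assumptions and are not part of the disclosure.

```python
from statistics import fmean

# Hypothetical rows (x, y) that ended up in one leaf; here y = 1 + 2x exactly.
leaf_rows = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def constant_predictor(rows):
    """Constant-value leaf: always predict the mean target of the leaf."""
    mean_y = fmean(y for _, y in rows)
    return lambda x: mean_y


def linear_predictor(rows):
    """Linear-model leaf: ordinary least squares for y = a + b * x."""
    mx = fmean(x for x, _ in rows)
    my = fmean(y for _, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    b = sxy / sxx if sxx else 0.0  # fall back to a constant if x never varies
    a = my - b * mx
    return lambda x: a + b * x


def squared_error(predictor, rows):
    """Sum of squared residuals of a predictor on the given rows."""
    return sum((predictor(x) - y) ** 2 for x, y in rows)


constant_error = squared_error(constant_predictor(leaf_rows), leaf_rows)
linear_error = squared_error(linear_predictor(leaf_rows), leaf_rows)
```

On this toy leaf the linear model reproduces the targets, while the constant predictor cannot follow the trend inside the leaf. The price is one least-squares fit per leaf, which is the cost the disclosure aims to reduce.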
We propose to use Netezza's Bulk Linear Models technique for building a large number of linear models on data that was distributed according to a regression tree structure. Our approach consists of the following steps:
1. Build a regression tree with a specified target attribute.
2. Redistribute the data according to the tree structure, so that the rows associated with each leaf node are stored within a single node.
3. For each data chunk associated with one leaf, assign a separate id_task value for the Bulk Linear Models algorithm.
4. Execute Bulk Linear Models in parallel on each data chunk.
5. Store each linear model in its corresponding leaf.
In this way, a set of linear models is built efficiently and in parallel, using a data distribution based on the regression tree topology.
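The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: it uses a single input attribute, a small hand-written tree builder, and a sequential loop that stands in for the parallel execution. It does not call Netezza or Bulk Linear Models, and every function name here is hypothetical; only the name id_task comes from the disclosure.

```python
from collections import defaultdict
from statistics import fmean


def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(rows, depth=0, max_depth=2, min_leaf=4, counter=None):
    """Step 1: grow a small regression tree on (x, y) rows.

    Leaves are {"leaf_id": n}; inner nodes hold "threshold", "left", "right".
    """
    if counter is None:
        counter = [0]  # shared counter that numbers the leaves
    best = None
    if depth < max_depth and len(rows) >= 2 * min_leaf:
        rows = sorted(rows)
        for i in range(min_leaf, len(rows) - min_leaf + 1):
            if rows[i - 1][0] == rows[i][0]:
                continue  # cannot split between equal x values
            cost = sse([y for _, y in rows[:i]]) + sse([y for _, y in rows[i:]])
            if best is None or cost < best[0]:
                best = (cost, i)
    if best is None:
        leaf = {"leaf_id": counter[0]}
        counter[0] += 1
        return leaf
    i = best[1]
    return {
        "threshold": (rows[i - 1][0] + rows[i][0]) / 2,
        "left": build_tree(rows[:i], depth + 1, max_depth, min_leaf, counter),
        "right": build_tree(rows[i:], depth + 1, max_depth, min_leaf, counter),
    }


def leaf_id(tree, x):
    """Route one x value down the tree and return the id of its leaf."""
    while "leaf_id" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf_id"]


def partition_by_leaf(tree, rows):
    """Steps 2-3: group rows by leaf; the leaf id plays the role of id_task."""
    chunks = defaultdict(list)
    for x, y in rows:
        chunks[leaf_id(tree, x)].append((x, y))
    return dict(chunks)


def fit_line(chunk):
    """Ordinary least squares for y = a + b * x on one data chunk."""
    mx = fmean(x for x, _ in chunk)
    my = fmean(y for _, y in chunk)
    sxx = sum((x - mx) ** 2 for x, _ in chunk)
    sxy = sum((x - mx) * (y - my) for x, y in chunk)
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def fit_leaf_models(chunks):
    """Steps 4-5: one model per id_task (sequential stand-in for parallel)."""
    return {id_task: fit_line(chunk) for id_task, chunk in chunks.items()}


def predict(tree, models, x):
    """Predict with the linear model stored for the leaf that x falls into."""
    a, b = models[leaf_id(tree, x)]
    return a + b * x


# Hypothetical piecewise-linear data: y = 2x for x < 10, y = 50 - x otherwise.
rows = [(float(x), 2.0 * x) for x in range(10)]
rows += [(float(x), 50.0 - x) for x in range(10, 20)]

tree = build_tree(rows)
chunks = partition_by_leaf(tree, rows)
models = fit_leaf_models(chunks)
```

Because every chunk is keyed by its own id_task and shares no rows with any other chunk, the per-chunk fits are independent of each other. That independence is what allows them to be executed in parallel once each chunk is stored together.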