
# Application of Netezza's Bulk Linear Models for prediction in leaf nodes of regression trees.

IP.com Disclosure Number: IPCOM000237557D
Publication Date: 2014-Jun-24
Document File: 1 page(s) / 20K

## Publishing Venue

The IP.com Prior Art Database

## Abstract

Disclosed is a method for distributing data across nodes according to a regression tree structure. Data prepared in this manner may be used by, e.g., Netezza's Bulk Linear Models to build a large number of linear models efficiently and in parallel.



Regression trees are a common technique in predictive analysis. The leaf nodes of such trees usually contain either constant-value predictors or linear models. The second approach is more accurate, but it requires more computational effort, because as many linear models must be built as there are leaf nodes in the regression tree.

We propose to use Netezza's Bulk Linear Models technique for building a large number of linear models on data that was distributed according to a regression tree structure. Our approach consists of the following steps:

1. Build a regression tree with a specified target attribute.
2. Redistribute the data according to the tree structure, so that the rows associated with each leaf node are stored within a single node.
3. For each data chunk associated with one leaf, assign a separate id_task value for the Bulk Linear Models algorithm.
4. Execute Bulk Linear Models in parallel on each data chunk.
5. Store each linear model in its corresponding leaf.

In this way, a set of linear models is built efficiently and in parallel, using a data distribution based on the regression tree topology.
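The five steps can be illustrated with a minimal, single-machine sketch in plain Python. It is not Netezza's API: the single numeric feature, the variance-reduction split rule, the thread pool standing in for Bulk Linear Models, and all function names (`build_tree`, `fit_line`, `fit_tree_with_linear_leaves`, `predict`) are assumptions made for illustration. Only the id_task concept comes from the disclosure.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import count


def sse(ys):
    """Squared error of a constant (mean) predictor on ys."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(rows, min_rows):
    """Threshold on x that minimises the summed SSE of both sides, or None."""
    xs = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        if len(left) < min_rows or len(right) < min_rows:
            continue
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t)
    return None if best is None else best[1]


def build_tree(rows, max_depth, min_rows, ids):
    """Step 1: grow a regression tree; each leaf receives its own id_task."""
    t = best_split(rows, min_rows) if max_depth > 0 else None
    if t is None:
        return {"id_task": next(ids), "model": None}
    left = [r for r in rows if r[0] <= t]
    right = [r for r in rows if r[0] > t]
    return {"threshold": t,
            "left": build_tree(left, max_depth - 1, min_rows, ids),
            "right": build_tree(right, max_depth - 1, min_rows, ids)}


def find_leaf(tree, x):
    """Walk from the root to the leaf responsible for x."""
    while "threshold" in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree


def fit_line(chunk):
    """Ordinary least squares fit of y = a + b * x on one chunk."""
    n = len(chunk)
    mx = sum(x for x, _ in chunk) / n
    my = sum(y for _, y in chunk) / n
    sxx = sum((x - mx) ** 2 for x, _ in chunk)
    sxy = sum((x - mx) * (y - my) for x, y in chunk)
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def attach_models(node, models):
    """Step 5: store each fitted model in its corresponding leaf."""
    if "threshold" in node:
        attach_models(node["left"], models)
        attach_models(node["right"], models)
    else:
        node["model"] = models[node["id_task"]]


def fit_tree_with_linear_leaves(rows, max_depth=1, min_rows=2):
    tree = build_tree(rows, max_depth, min_rows, count())
    # Steps 2-3: group rows by the id_task of their leaf. On a cluster this
    # grouping would be the physical redistribution of rows across nodes.
    chunks = defaultdict(list)
    for row in rows:
        chunks[find_leaf(tree, row[0])["id_task"]].append(row)
    # Step 4: chunks are independent, so the fits can run concurrently.
    # The thread pool is only a stand-in for a bulk model-building routine.
    task_ids = sorted(chunks)
    with ThreadPoolExecutor() as pool:
        fitted = pool.map(fit_line, (chunks[k] for k in task_ids))
        models = dict(zip(task_ids, fitted))
    attach_models(tree, models)
    return tree


def predict(tree, x):
    a, b = find_leaf(tree, x)["model"]
    return a + b * x


# Toy data: two linear regimes with a jump between x = 4 and x = 5.
rows = [(x, 2 * x + 1) for x in range(5)] + [(x, 100 - x) for x in range(5, 10)]
tree = fit_tree_with_linear_leaves(rows)
print(predict(tree, 2), predict(tree, 7))
```

On this toy data a single split separates the two regimes, so each leaf model can recover its own line, whereas a constant-value leaf could only return the mean of its chunk. Because every chunk is keyed by one id_task and shares no state with the others, the fitting step parallelizes without coordination.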
