Browse Prior Art Database

Construction of Optimal Binary Split Trees

IP.com Disclosure Number: IPCOM000041436D
Original Publication Date: 1984-Jan-01
Included in the Prior Art Database: 2005-Feb-02
Document File: 3 page(s) / 42K

IBM

Related People

Huang, SH: AUTHOR [+2]

Abstract

Split trees are a new technique for storing records with skewed frequency distribution, where the conflicting functions of frequency ordering and subtree construction are decoupled. A polynomial time algorithm to construct optimal binary split trees is described below. The crucial idea in such trees is that two values are stored in each node, one of them being the key, the other being a split value. The split value is used as a guide for further search in the tree if the key value is not equal to the search value. Note that the record associated with the split value is not stored in the node but in one of its subtrees. Given a set of n keys k1, ..., kn and their access probabilities p(k1), ...

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 54% of the total text.

Page 1 of 3

Construction of Optimal Binary Split Trees

Split trees are a new technique for storing records with skewed frequency distribution, where the conflicting functions of frequency ordering and subtree construction are decoupled. A polynomial time algorithm to construct optimal binary split trees is described below. The crucial idea in such trees is that two values are stored in each node, one of them being the key, the other being a split value. The split value is used as a guide for further search in the tree if the key value is not equal to the search value. Note that the record associated with the split value is not stored in the node but in one of its subtrees. Given a set of n keys k1, ..., kn and their access probabilities p(k1), ..., p(kn), we can define a binary split tree formally as a set T on n nodes such that (1) each node N contains two values from S, the key, key(N) and the split value, split(N); (2) there is a specially designated node R, called the root such that p(key(R)) > p(ki) for i = 1, ..., n; (3) the remaining nodes (excluding the root) are partitioned into 2 disjoint sets TP and Tr and each of

these sets in turn is a binary split tree; and (4) for all nodes NP e TP and Nr e Tr, key (NP) < split(R) < key(Nr). An optimal binary split tree of S is a binary split tree that minimizes the internal path length of all binary split trees of S. The main advantage of separating the split value from the key value is the ability of "decoupling" the conflicting functions of frequency ordering and subtree construction 1 . Searching a value v in a split tree can be done in a similar way as in an ordinary binary tree.

(Image Omitted)

By separating the split value from the key value, it is possible to store the most frequent key in the root of a tree without putting constraint on the structure of the subtrees. It is assumed that the access frequencies are distinct so that the selection of the most frequent key in each subtree is unique. There are several ways the split value can be selected. Different choices of the split value will result in different tree structures. An optimal split tree with the 31 most frequently used English words is given in the figure. As in the case of binary search trees, optimal split trees are trees with minimum weighted path lengths. The algorit...