Browse Prior Art Database

# Dynamic Binary Tree Space Allocation for a Sort

IP.com Disclosure Number: IPCOM000035773D
Original Publication Date: 1989-Aug-01
Included in the Prior Art Database: 2005-Jan-28
Document File: 2 page(s) / 13K

IBM

## Related People

Chang, PY: AUTHOR [+3]

## Abstract

Many sorts build a pointer structure to data being sorted. It is important to have a good ratio between the amount of space for the pointer structure and amount of space for the data records.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.

Page 1 of 2

Dynamic Binary Tree Space Allocation for a Sort

Many sorts build a pointer structure to data being sorted. It is important to have a good ratio between the amount of space for the pointer structure and amount of space for the data records.

The OS/2* Extended Edition Database Manager Sort/List Component requires two buffer areas during insert phase. The first contains the binary tree that is built as records are received. Entries in the tree, called nodes, are of fixed length and contain pointers to rows of data. They are used to track the order of the data. The second buffer area contains the rows of data in the sequential order they were received. The number of data rows that will fit in a given size of buffer is variable if the rows contain any variable length columns. The ideal is to have exactly enough node space to allow for one node for each row it takes to fill the data buffer area. Since it requires excessive overhead to get the space as it is needed, and there is no exact method to know the length of the rows that will be received, an approximation algorithm is required to determine the amount of node space to preallocate.

This approximation algorithm is as follows. When Sort/List services is called, it has information as to the number of 64K segments available for buffers, and column definitions of the data to be sorted. The critical unknown value is the average length of the variable part(s) of the data. The actual data length for variable fields can vary from less than one tenth the maximum length defined up to the maximum. For this algorithm an average of 40 percent is used to minimize the chance of running out of node space prior to use of all buffer space. Optimization in favor of node space is chosen since a row of data is generally much larger than the node related to it. This means an underestimation of node space will result in more space being wasted than a comparable underestimation of data space requirements.

The equation to calculate node and buffer space uses the...