Browse Prior Art Database

# Speeding the Application of Bit-Packed Decision Trees

IP.com Disclosure Number: IPCOM000104957D
Original Publication Date: 1993-Jun-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 2 page(s) / 67K

IBM

## Related People

Epstein, M: AUTHOR

## Abstract

In the Tangora[1], a best-first search algorithm is used to generate a phonetic baseform for a word given its spelling and one utterance. This uses a bit-packed decision tree to apply the spelling-to-sound rules[2]. This invention discloses a way to implement the search so as to avoid repeated bit operations.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Speeding the Application of Bit-Packed Decision Trees

In  the Tangora[1], a best-first search algorithm is used to
generate a phonetic baseform for a word given  its  spelling and  one
utterance.  This uses a bit-packed decision tree to apply the
spelling-to-sound rules[2].  This invention discloses a way to
implement the search so as to avoid repeated bit operations.

The decision trees in the Tangora addword procedure have the
following format:

4 bytes       4 bytes      1 byte    5-17 bytes

The idea behind this representation is that the question refers
to a position in the spelling of the word, or the phones generated so
far.  The value at that position is used to index into the bit
string.  If the bit is 0, then the question at the false offset is
used next.   If the bit is 1, then the question at the true offset is
used next.  If the offset is negative, then this indicates
termination, and the  negative offset indexes into a probability
distribution for how the lett should be pronounced.

This leads to the following, inefficient algorithm:

1.  Set offset = the starting question
2.  array[1..5]  = 5 previous letters
3.  array[6..10]  = 5 following letters
4.  array[11..15]  = 5 previous phones
5.  While offset >= 0

q = tree[offset].question