Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Speeding the Application of Bit-Packed Decision Trees

IP.com Disclosure Number: IPCOM000104957D
Original Publication Date: 1993-Jun-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 2 page(s) / 67K

Publishing Venue

IBM

Related People

Epstein, M: AUTHOR

Abstract

In the Tangora[1], a best-first search algorithm is used to generate a phonetic baseform for a word given its spelling and one utterance. This uses a bit-packed decision tree to apply the spelling-to-sound rules[2]. This invention discloses a way to implement the search so as to avoid repeated bit operations.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Speeding the Application of Bit-Packed Decision Trees

      In  the Tangora[1], a best-first search algorithm is used to
generate a phonetic baseform for a word given  its  spelling and  one
utterance.  This uses a bit-packed decision tree to apply the
spelling-to-sound rules[2].  This invention discloses a way to
implement the search so as to avoid repeated bit operations.

      The decision trees in the Tangora addword procedure have the
following format:
true_offset   false_offset  question    answers

4 bytes       4 bytes      1 byte    5-17 bytes

      The idea behind this representation is that the question refers
to a position in the spelling of the word, or the phones generated so
far.  The value at that position is used to index into the bit
string.  If the bit is 0, then the question at the false offset is
used next.   If the bit is 1, then the question at the true offset is
used next.  If the offset is negative, then this indicates
termination, and the  negative offset indexes into a probability
distribution for how the lett should be pronounced.

      This leads to the following, inefficient algorithm:

1.  Set offset = the starting question
2.  array[1..5]  = 5 previous letters
3.  array[6..10]  = 5 following letters
4.  array[11..15]  = 5 previous phones
5.  While offset >= 0

        q = tree[offset].question
        if (BITSET(tree[offset].answers,array[q]))
               offset = tree[offset].true_offset
        else offset = tree[offset].false_offset

6.  return(-offset)

      While  this code is very compact, this was found to use about
12.5%  of the total algorithm,  mainly because the decision trees are
very deep.  The inefficiency comes from the repeated call to the
BITSET macro, which must determine which word to use, and the bit
mask to be used in the logical AND.

      For de...