Browse Prior Art Database

# Algorithm for the Most Informative Binary Question in the Two-Class Problem

IP.com Disclosure Number: IPCOM000101022D
Original Publication Date: 1990-Jun-01
Included in the Prior Art Database: 2005-Mar-16
Document File: 2 page(s) / 60K

IBM

## Related People

Kanevsky, D: AUTHOR [+3]

## Abstract

This article describes an algorithm for finding the most or near most informative 2-way questions (or equivalently -2- split of a finite measurement space) in a two-class problem. The BFOS (1) algorithm requires consideration of m subsets where m is the number of elements in the measurement space. This method requires considering 0(log m) possibilities. The notation and statement of the problem follows: (Image Omitted) Let X = {1,2,...,m}, Y = {1,2}. Let (X,Y) be a random variable, X e X, Y e Y and let Pi = Prob(Y = 1¯X = i). Let S = {S1,S2} be partition of X, i.e., S1 S2 = X, S1 S2 is empty. Finally, let E(S) = - Prob(Y = j, X Si)log Prob(Y = j¯X Si). The problem is to find the S that minimizes the expression E(S).

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Algorithm for the Most Informative Binary Question in the Two-Class Problem

or near most informative 2-way questions (or equivalently -2- split
of a finite measurement space) in a two-class problem. The BFOS (1)
algorithm requires consideration of m subsets where m is the number
of elements in the measurement space. This method requires
considering 0(log m) possibilities.
The notation and statement of the problem follows:

(Image Omitted)

Let X = {1,2,...,m}, Y = {1,2}.
Let (X,Y) be a random variable, X e X, Y e Y and let Pi
= Prob(Y = 1¯X = i).
Let S = {S1,S2} be partition of X, i.e., S1   S2 = X, S1 S2
is empty. Finally, let E(S) = -        Prob(Y = j, X  Si)log Prob(Y =
j¯X  Si).
The problem is to find the S that minimizes the expression E(S).

The BFOS algorithm provides a way for finding two best
informative questions in a two-class problem that requires
consideration of m subsets.  In applications this algorithm is used
iteratively to find two suboptimal questions in general
classification problem and in construction of a search tree [2].  So
for the values of m of practical interest in Speech Recognition this
algorithm is too slow. It was observed that the values of informative
splits E(S) for subsets in the BFOS algorithm form an almost concave