Browse Prior Art Database

Algorithm for the Most Informative Binary Question in the Two-Class Problem

IP.com Disclosure Number: IPCOM000101022D
Original Publication Date: 1990-Jun-01
Included in the Prior Art Database: 2005-Mar-16
Document File: 2 page(s) / 60K

Publishing Venue

IBM

Related People

Kanevsky, D: AUTHOR [+3]

Abstract

This article describes an algorithm for finding the most or near most informative 2-way questions (or equivalently -2- split of a finite measurement space) in a two-class problem. The BFOS (1) algorithm requires consideration of m subsets where m is the number of elements in the measurement space. This method requires considering 0(log m) possibilities. The notation and statement of the problem follows: (Image Omitted) Let X = {1,2,...,m}, Y = {1,2}. Let (X,Y) be a random variable, X e X, Y e Y and let Pi = Prob(Y = 1¯X = i). Let S = {S1,S2} be partition of X, i.e., S1 S2 = X, S1 S2 is empty. Finally, let E(S) = - Prob(Y = j, X Si)log Prob(Y = j¯X Si). The problem is to find the S that minimizes the expression E(S).

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Algorithm for the Most Informative Binary Question in the Two-Class Problem

       This article describes an algorithm for finding the most
or near most informative 2-way questions (or equivalently -2- split
of a finite measurement space) in a two-class problem. The BFOS (1)
algorithm requires consideration of m subsets where m is the number
of elements in the measurement space. This method requires
considering 0(log m) possibilities.
The notation and statement of the problem follows:

                            (Image Omitted)

 Let X = {1,2,...,m}, Y = {1,2}.
Let (X,Y) be a random variable, X e X, Y e Y and let Pi
= Prob(Y = 1¯X = i).
Let S = {S1,S2} be partition of X, i.e., S1   S2 = X, S1 S2
is empty. Finally, let E(S) = -        Prob(Y = j, X  Si)log Prob(Y =
j¯X  Si).
The problem is to find the S that minimizes the expression E(S).

      The BFOS algorithm provides a way for finding two best
informative questions in a two-class problem that requires
consideration of m subsets.  In applications this algorithm is used
iteratively to find two suboptimal questions in general
classification problem and in construction of a search tree [2].  So
for the values of m of practical interest in Speech Recognition this
algorithm is too slow. It was observed that the values of informative
splits E(S) for subsets in the BFOS algorithm form an almost concave
plot.  In this article this fact is used for constructing the
algorithm that finds the most...