
Feature Selection by Encoding Length

IP.com Disclosure Number: IPCOM000037123D
Original Publication Date: 1989-Nov-01
Included in the Prior Art Database: 2005-Jan-29
Document File: 3 page(s) / 39K

Publishing Venue

IBM

Related People

Dom, BE: AUTHOR [+3]

Abstract

This invention is a technique for selecting the optimum feature subset for pattern recognition. It uses information measures based on the code length required to encode the class assignments of a set of objects under encoding schemes derived from probability models.


Feature Selection by Encoding Length

In the following discussion, the term "complexity" refers to an idealized code length, which differs from a real code length in that it need not be integer valued. Also, r is a parameter vector that specifies the probability model to be used for a given feature subset; the subset itself is specified by a subset index k.

A set of feature measurements is given that may be applied to a set of objects for the purpose of classifying those objects.

Applying these measurements to a given object produces a feature vector, x, whose components are the feature measurements for that object. It is desired to construct a classifier with optimum performance based on these features and a model class described by the probability densities P(x|r), P(r|k), and P(k). The densities P(r|k) and P(k) will be treated as priors, though in fact they may represent real prior knowledge or may be just computational devices. To accomplish this task, a training sample {X, C} is provided, where X is a set of feature vectors corresponding to the objects in the sample and C is the corresponding set of class assignments.

All the methods described here have the following structure:

Initialize: Imin = very large number
For all feature subsets, k:
    compute the complexity (code length) of the class assignments
    given the feature vectors: I = I(C|X)
    If I < Imin then Imin = I and kmin = k
end
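As a concrete illustration of this structure (not part of the original disclosure), the following Python sketch enumerates all non-empty feature subsets and keeps the one with the smallest code length. The callback complexity_of is hypothetical; the disclosure's methods supply its specific forms.

    from itertools import chain, combinations

    def all_feature_subsets(n_features):
        # Every non-empty subset of the feature indices 0 .. n_features-1.
        idx = range(n_features)
        return chain.from_iterable(
            combinations(idx, r) for r in range(1, n_features + 1))

    def select_features(X, C, complexity_of):
        # complexity_of(C, X, k) returns the code length I(C|X) of the class
        # assignments when only the features indexed by k are used.
        I_min, k_min = float("inf"), None
        for k in all_feature_subsets(len(X[0])):
            I = complexity_of(C, X, k)
            if I < I_min:
                I_min, k_min = I, k
        return k_min, I_min

Exhaustive enumeration is exponential in the number of features, so in practice the same loop is often run over a restricted family of candidate subsets.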

At the completion of this procedure, kmin defines the optimum feature subset. The various methods claimed correspond to different forms of this complexity.

General Method 1: For General Method 1 the complexity to be minimized is associated with the probability P(C,r,k|X) and is written as:

I1(C|X,k) = -log P(C,r(k),k|X) = -log { [P(X|C,r) P(C) P(r|k) P(k)] / P(X|r) }   (1)

where r(k) is the r value that minimizes the complexity -log P(C,r,k|X) for a given k. Equation (1) follows by writing P(C,r,k|X) = P(C|X,r) P(r|k) P(k), with the prior P(r|k) P(k) taken to be independent of X, and expanding P(C|X,r) via Bayes' rule as P(X|C,r) P(C) / P(X|r).
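As an illustration of equation (1) only (again not code from the disclosure), the criterion can be evaluated from its five factors once r(k) has been estimated for the subset k; all names below are hypothetical:

    import math

    def general_method_1_complexity(p_X_given_Cr, p_C, p_r_given_k, p_k,
                                    p_X_given_r):
        # I1(C|X,k): negative log of the ratio in equation (1), with each
        # argument being the corresponding density or probability evaluated
        # at the minimizing parameter vector r(k).
        return -(math.log(p_X_given_Cr) + math.log(p_C)
                 + math.log(p_r_given_k) + math.log(p_k)
                 - math.log(p_X_given_r))

In practice r(k) must itself be found by minimizing -log P(C,r,k|X) over r for each candidate k, e.g., with a numerical optimizer, before the resulting values are compared across subsets.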

Specific Method 1: This method uses General Method 1 with a specific form for P(r|k). For P(k) a uniform distribution is used. The form used for P(r|k) is a "universal" prior derived in [1] from coding-theoretic considerations. For the case of N vectors in the training sample and k parameters, this...
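The disclosure's exact expression for this prior is cut off above. Purely as an editorial illustration: universal priors of this kind typically contribute a parameter-description cost on the order of (k/2) log N, as in Rissanen's minimum description length (MDL) formulations (here k denotes the number of model parameters and N the sample size). A minimal sketch under that assumption, not the disclosure's own formula:

    import math

    def mdl_style_score(neg_log_likelihood, n_params, n_samples):
        # Data code length plus an assumed (k/2) * log N parameter cost.
        # The penalty form is the familiar MDL term and is NOT necessarily
        # the exact universal-prior expression used in the truncated text.
        return neg_log_likelihood + 0.5 * n_params * math.log(n_samples)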