TECHNIQUE FOR CLASSIFICATION OF SPARSELY LABELED DATA

IP.com Disclosure Number: IPCOM000238490D
Publication Date: 2014-Aug-28
Document File: 6 page(s) / 68K

Publishing Venue

The IP.com Prior Art Database

Abstract

The invention proposes a technique for classifying sparsely labeled data. The technique uses a multi-instance learning (MIL) algorithm to classify a group of instances rather than classifying at the instance level. A group of instances is referred to as a bag. Classification at the bag level is applied in medical imaging, where an expert radiologist marks an image as pathological rather than marking each infected region. The MIL algorithm is also used when the problem is one of classifying a group of related events, for instance, predicting whether or not a sequence of gas turbine startups results in a trip start. For classification at the bag level, features that discriminate between the different bag classes are extracted. Sparsely labeled data makes it challenging to extract discriminative features at the bag level. Features are therefore extracted at the instance level; the bag-level labeling is accurate, as each bag is a set of instances. A bag is labeled positive when at least one instance in the bag is positive, and negative otherwise. Each bag contains more than one multi-dimensional feature vector. To overcome this, a representation in an intermediate feature space with a single feature vector per bag is built. To classify bags, the new features are used at the bag level and are classified using an ensemble of classifiers.
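As a rough illustration of the bag-level scheme summarized above, the following Python sketch labels a bag from its instance labels, maps a bag of instance vectors to a single feature vector, and classifies that vector by majority vote. The prototype-distance embedding, the prototypes themselves, and the threshold classifiers are assumptions made for illustration only; the abbreviated text does not specify how the intermediate feature space or the ensemble is constructed.

```python
from math import dist  # Euclidean distance, Python 3.8+


def bag_label(instance_labels):
    """MIL rule: a bag is positive (1) if any instance is positive."""
    return int(any(instance_labels))


def embed_bag(bag, prototypes):
    """Map a bag (list of instance vectors) to ONE feature vector.

    Component j is the distance from prototype j to the closest
    instance in the bag. The prototypes are an assumption here,
    e.g. cluster centres computed over all training instances.
    """
    return [min(dist(x, p) for x in bag) for p in prototypes]


def majority_vote(classifiers, bag_vector):
    """Ensemble prediction: 1 if more than half of the classifiers vote 1."""
    votes = sum(clf(bag_vector) for clf in classifiers)
    return int(2 * votes > len(classifiers))


# Toy data: two bags of 2-D instance vectors (hypothetical values).
prototypes = [(0.0, 0.0), (5.0, 5.0)]
bag_a = [(0.1, 0.0), (0.2, 0.1)]   # every instance lies near prototype 0
bag_b = [(0.0, 0.2), (5.0, 5.1)]   # one instance lies near prototype 1

vec_a = embed_bag(bag_a, prototypes)
vec_b = embed_bag(bag_b, prototypes)

# Hypothetical weak classifiers, each thresholding the bag-level vector.
classifiers = [
    lambda v: int(v[1] < 1.0),   # bag has an instance close to prototype 1
    lambda v: int(v[1] < 2.0),
    lambda v: int(v[0] > 3.0),   # bag has no instance close to prototype 0
]

pred_a = majority_vote(classifiers, vec_a)
pred_b = majority_vote(classifiers, vec_b)
```

However many instances a bag holds, `embed_bag` returns a vector whose length equals the number of prototypes, so any ordinary classifier can be trained on bags directly.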

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 39% of the total text.

KEYWORDS

Labeled data, clustering algorithm, bag, Extract, Model


DETAILED DESCRIPTION

            Generally, instance-level labels are available in some cases but cannot always be obtained, owing to certain difficulties. One difficulty is acquiring precisely labeled instances, which is time-consuming and suffers from operator variability, whereas precise labeling at a lower level of granularity is easily acquired for a large sample of instances. This applies to computer-aided detection problems in medical imaging, where an expert labels an image as having one or more abnormalities that indicate a certain disease. The specific regions containing the abnormalities are either diffuse in character or are not specifically marked out.

            Another difficulty arises when the problem is one of classifying a bag or sequence of instances rather than a single instance. For example, a gas turbine advisor must predict whether or not a sequence of gas turbine startups results in a trip start. Although the historical data has precise labels indicating which startups tripped and which did not, the problem is one of classifying a sequence or bag of startups. The data on which a model is built or deployed consists of sequences of consecutive startups for a particular unit, which results in missing data for the startups leading to the trip.
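The grouping of startups into bags can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the record layout, the window length, the feature values, and the `make_bags` helper are all hypothetical, and a bag is labeled positive when any startup in it tripped.

```python
from collections import defaultdict


def make_bags(startups, window=3):
    """Form bags of `window` consecutive startups for each unit.

    Each startup record is (unit_id, startup_index, features, tripped).
    Returns a list of (unit_id, instances, bag_label), where bag_label
    is 1 if any startup in the bag tripped and 0 otherwise.
    """
    by_unit = defaultdict(list)
    for unit_id, idx, features, tripped in startups:
        by_unit[unit_id].append((idx, features, tripped))

    bags = []
    for unit_id, records in by_unit.items():
        records.sort(key=lambda r: r[0])  # chronological order per unit
        for start in range(len(records) - window + 1):
            chunk = records[start:start + window]
            instances = [features for _, features, _ in chunk]
            label = int(any(tripped for _, _, tripped in chunk))
            bags.append((unit_id, instances, label))
    return bags


# Hypothetical startup history: two features per startup.
startups = [
    ("unit_A", 1, [0.2, 410.0], False),
    ("unit_A", 2, [0.3, 415.0], False),
    ("unit_A", 3, [0.9, 480.0], True),
    ("unit_A", 4, [0.2, 405.0], False),
    ("unit_B", 1, [0.1, 400.0], False),
    ("unit_B", 2, [0.2, 402.0], False),
    ("unit_B", 3, [0.2, 401.0], False),
]

bags = make_bags(startups, window=3)
labels = [label for _, _, label in bags]
```

Here the per-startup trip flags play the role of instance labels, while the model is trained and evaluated on the bag labels.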

            A conventional technique uses binary features to classify the labeled data. However, the conventional technique does not allow the data to be clustered and classified.

            Therefore, there...