TRANSFER LEARNING WITH ONE-CLASS DATA

IP.com Disclosure Number: IPCOM000218910D
Publication Date: 2012-Jun-11

Publishing Venue

The IP.com Prior Art Database

Abstract

When training and testing data are drawn from different domains or distributions, most statistical models must be rebuilt using the newly collected testing data. Transfer learning is a family of algorithms that improve learning in a new target domain by transferring knowledge from a different training domain. In this paper, we consider a new setting of transfer learning in which only a few negative data are available in the target domain. We introduce a regression-based negative-data transfer learning algorithm to address this problem. Accordingly, in contrast to traditional discriminative feature selection, which seeks the best classification performance on the training data, we propose a new framework to learn the most transferable discriminative features, i.e., those best suited for transfer learning. The method demonstrates improved transfer learning performance in the context of facial expression recognition.

TRANSFER LEARNING WITH ONE-CLASS DATA

1.         BACKGROUND

This publication is related to the field of machine learning. Transfer learning aims to extract knowledge from one or more source domains and improve learning in the target domain. It has been applied to a wide variety of applications, such as object recognition [20], sign language recognition [7] and text classification [17]. Let us denote the source domain data as D_S = {(x_S,1, y_S,1), …, (x_S,N_S, y_S,N_S)} and the target domain data as D_T = {(x_T,1, y_T,1), …, (x_T,N_T, y_T,N_T)}, where x ∈ X is in the feature space and y ∈ {−1, +1} is the binary label. The goal is to learn the target classifier f_T: x_T → y_T for the target data D_T. Current transfer learning algorithms can be categorized under three settings [13]: inductive transfer learning, transductive transfer learning [6] and unsupervised transfer learning. In inductive transfer learning [4, 20, 18, 11], both the source data D_S and the target data D_T are available. In transductive transfer learning, the source data D_S and the target data D_T are available, but the target labels y_T are not. Finally, unsupervised transfer learning, such as [5], is applied to unsupervised learning tasks, such as clustering and dimensionality reduction, when neither the target labels nor the source labels are available.
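
To make this notation concrete, the following Python sketch instantiates hypothetical source and target domains as NumPy arrays; the sizes, the particular distribution shift, and all variable names are illustrative assumptions rather than values from the disclosure:

```python
import numpy as np

# Hypothetical sizes; the disclosure does not fix these values.
N_S, N_T, d = 200, 50, 16      # source size, target size, feature dimension
rng = np.random.default_rng(0)

# Source domain D_S = {(x_S,i, y_S,i)}: features x in X = R^d,
# binary labels y in {-1, +1}.
X_S = rng.normal(size=(N_S, d))
y_S = rng.choice([-1, +1], size=N_S)

# Target domain D_T, drawn from a shifted distribution to mimic the
# domain change between training and testing.
X_T = rng.normal(loc=0.5, size=(N_T, d))
y_T = rng.choice([-1, +1], size=N_T)

# Goal: learn the target classifier f_T : x_T -> y_T, exploiting D_S
# despite the distribution mismatch between the two domains.
```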

In this paper, we consider a new setting of transfer learning in which only the negative data in the target domain are available. This scenario differs from inductive transfer learning, where the target classifier can be learned from positive and negative target data. It also differs from transductive transfer learning, where the negative and positive data can be re-weighted together based on the marginal distribution. In the new setting, the target data are extremely unbalanced (no positive data), so none of the above transfer learning algorithms can be applied. However, this setting is not uncommon in applications. For instance, in object detection, background data for a new scene are easy to collect. Can we update the object/background classifier for this new scene using only the background data? In facial expression recognition, the neutral face of a new subject is relatively easy to capture. Can we predict his/her smile expression based only on his/her neutral face? To our knowledge, this special negative-data transfer learning problem has never been addressed in the literature.
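
As a minimal illustration of what is and is not observed in this setting (array shapes and variable names are again our assumptions), compare the inductive case with the new one-class case:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16  # assumed feature dimension

# Inductive transfer learning: target data carry both positive and
# negative labels, so f_T can be trained on D_T directly.
X_T_inductive = rng.normal(size=(40, d))
y_T_inductive = rng.choice([-1, +1], size=40)

# New one-class setting: only negative target examples are observed
# (e.g. neutral faces of a new subject); no positive (smile) samples
# exist, so standard supervised or re-weighting approaches break down.
X_T_negative = rng.normal(size=(20, d))
y_T_negative = -np.ones(20)          # every available label is -1
```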

2.         ADABOOST CLASSIFIER WITH DISCRIMINATIVE FEATURES

In this section, we introduce the learning of an AdaBoost classifier, as shown in Algorithm 1 below, from the source training data D_S. Here, the source data are labeled as negative and positive data. The positive and negative distributions are modeled as Gaussians: p_+ = N(µ_+, σ_+) and p_− = N(µ_−, σ_−). Notice that each weak classifier is learned from the positive/negative distribution of a single feature. Thus, the selected weak classifiers a...
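
Algorithm 1 itself is among the non-text objects suppressed in this extract, so it cannot be reproduced here. Purely as a minimal sketch of the construction described above, the following Python code implements standard discrete AdaBoost in which each weak classifier is a likelihood-ratio test between the per-feature Gaussians p_+ = N(µ_+, σ_+) and p_− = N(µ_−, σ_−); the function names, the NumPy dependency, and the boosting details are our assumptions, not part of the disclosure.

```python
import numpy as np

def fit_gaussian(x, w):
    """Weighted Gaussian fit (mu, sigma) of one feature column."""
    mu = np.average(x, weights=w)
    sigma = np.sqrt(np.average((x - mu) ** 2, weights=w)) + 1e-12
    return mu, sigma

def log_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def adaboost_gaussian(X, y, n_rounds=10):
    """Discrete AdaBoost; each weak learner compares the weighted
    per-feature Gaussians p+ and p- via their log-likelihoods."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                     # sample weights
    ensemble = []                               # (feature, p+, p-, alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            # Fit one Gaussian per class to feature j under current weights.
            p_pos = fit_gaussian(X[y == +1, j], w[y == +1])
            p_neg = fit_gaussian(X[y == -1, j], w[y == -1])
            h = np.where(log_pdf(X[:, j], *p_pos) > log_pdf(X[:, j], *p_neg), 1, -1)
            err = w[h != y].sum()
            if best is None or err < best[0]:
                best = (err, j, p_pos, p_neg, h)
        err, j, p_pos, p_neg, h = best
        err = np.clip(err, 1e-12, 1.0 - 1e-12)  # guard degenerate errors
        alpha = 0.5 * np.log((1.0 - err) / err)
        w = w * np.exp(-alpha * y * h)          # standard exponential update
        w /= w.sum()
        ensemble.append((j, p_pos, p_neg, alpha))
    return ensemble

def predict(ensemble, X):
    """Sign of the alpha-weighted vote of all selected weak classifiers."""
    score = np.zeros(X.shape[0])
    for j, p_pos, p_neg, alpha in ensemble:
        h = np.where(log_pdf(X[:, j], *p_pos) > log_pdf(X[:, j], *p_neg), 1, -1)
        score += alpha * h
    return np.sign(score)
```

On the synthetic source data X_S, y_S from the earlier sketch, ensemble = adaboost_gaussian(X_S, y_S) selects one feature per boosting round, consistent with the statement that each weak classifier is tied to the positive/negative distributions of a single feature.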