
Semi-Supervised Inductive Classification

IP.com Disclosure Number: IPCOM000125653D
Original Publication Date: 2005-Jul-10
Included in the Prior Art Database: 2005-Jul-10
Document File: 4 page(s) / 128K

Publishing Venue

Siemens

Related People

Juergen Carstens: CONTACT

Abstract

Semi-supervised learning differs from traditional supervised learning by additionally exploiting the information in unlabelled examples. In many applications, such as text categorization, collecting labeled examples requires human effort, while vast amounts of unlabelled data are often readily available and offer additional information. This is the situation in which semi-supervised learning becomes very useful. In this paradigm, the function of interest is regularized to be a priori consistent with the inherent structure of the input density p(x). Several advances have recently been achieved, such as Markov random walks, cluster kernels, Gaussian random fields, and regularization on graphs. However, a disadvantage of many existing methods is that they do not generalize to unseen inputs. Another problem of semi-supervised transduction is its computational complexity: since an n × n matrix needs to be either inverted or diagonalized, semi-supervised transduction scales as O(n³). As a potentially vast number of unlabelled points is involved, the computational cost becomes prohibitive.



Semi-Supervised Inductive Classification

Idea: Kai Yu, DE-Muenchen; Volker Tresp, DE-Muenchen

Semi-supervised learning differs from traditional supervised learning by additionally exploiting the information in unlabelled examples. In many applications, such as text categorization, collecting labeled examples requires human effort, while vast amounts of unlabelled data are often readily available and offer additional information. This is the situation in which semi-supervised learning becomes very useful. In this paradigm, the function of interest is regularized to be a priori consistent with the inherent structure of the input density p(x). Several advances have recently been achieved, such as Markov random walks, cluster kernels, Gaussian random fields, and regularization on graphs. However, a disadvantage of many existing methods is that they do not generalize to unseen inputs. Another problem of semi-supervised transduction is its computational complexity: since an n × n matrix needs to be either inverted or diagonalized, semi-supervised transduction scales as O(n³). As a potentially vast number of unlabelled points is involved, the computational cost becomes prohibitive.
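For intuition only, the following sketch illustrates a graph-based transductive step in the spirit of the Gaussian random field methods mentioned above; the dense linear system over the unlabelled points is what makes such approaches scale as O(n³). The function and variable names are illustrative assumptions, not part of this disclosure.

    import numpy as np

    def harmonic_transduction(W, y_labeled, labeled_idx, unlabeled_idx):
        # W: (n, n) symmetric affinity matrix over all labeled and unlabelled points.
        D = np.diag(W.sum(axis=1))
        L = D - W                                      # combinatorial graph Laplacian
        L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
        W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
        # Solving this dense system costs O(n^3) in the worst case,
        # which is the computational bottleneck discussed above.
        return np.linalg.solve(L_uu, W_ul @ y_labeled)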

In the following, learning methods are proposed that effectively make use of both labeled and unlabelled data to build predictive functions, which are defined not just on the seen inputs but on the whole input space. As a nice property, the proposed method allows efficient training and can easily handle new test points. We validate the method on both toy data and real-world data sets.

We introduce a semi-supervised inductive algorithm. The idea is to use basis function expansion to form a regularizer induced by the normalized graph Laplacian. The limit of this regularizer is connected to the expected squared gradient of the function weighted by p(x), giving rise to a natural density-dependent smoothness penalty. The connection clarifies which representation of functions is suitable for a good approximation.
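As a minimal sketch of the construction just described (assuming Gaussian basis functions and the standard normalized graph Laplacian; the helper names below are illustrative, not from the disclosure), the density-dependent penalty on an expansion f = Φw can be written as wᵀΦᵀLΦw:

    import numpy as np

    def rbf_basis(X, centers, width=1.0):
        # Gaussian basis functions phi_j(x); any other basis could be substituted.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * width ** 2))

    def normalized_laplacian(W):
        # L = I - D^{-1/2} W D^{-1/2}, the normalized graph Laplacian of the affinity matrix W.
        d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
        return np.eye(W.shape[0]) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]

    def graph_penalty(Phi, L_norm, w):
        # Regularizer induced on the expansion f = Phi @ w:  w^T Phi^T L Phi w.
        f = Phi @ w
        return float(f @ (L_norm @ f))

Note that the penalty is computed over all n inputs, labeled and unlabelled alike, so the expansion coefficients w are influenced by the input density even though only the labeled points carry targets.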

Suppose that, given n inputs $\{x_i\}_{i=1}^{n}$ sampled i.i.d. from a density p(x), one observes responses $\{y_i\}_{i=1}^{l}$ of an underlying function f(x) on the first $l \le n$ inputs (without loss of generality), plus some stationary additive noise. The goal is to estimate the underlying function f. Supervised induction ignores the existence of the unlabelled data $\{x_i\}_{i=l+1}^{n}$ and seeks an f that minimizes the cost

$Q(f) = \sum_{i=1}^{l} \left( y_i - f(x_i) \right)^2 + \lambda \, \|f\|_H^2$    (1)

where $\|f\|_H^2$ is the norm of f in a Hilbert space H. The first term of Q(f) enforces f to be close to the observations. The second term, called the regularizer, ensures the smoothness of f. A reasonable assumption behind the regularizer is that close inputs should have similar function values (as in ridge regression). However, the notion of closeness between two inputs usually does not take the input density into account: for example, the two points might be separated by a low-density region.
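For comparison, a minimal sketch of purely supervised induction minimizing a cost of the form (1) is given below, assuming a squared loss and an RKHS norm penalty, in which case the minimizer is kernel ridge regression; the Gaussian kernel and the regularization weight are illustrative choices, not taken from the disclosure.

    import numpy as np

    def rbf_kernel(A, B, width=1.0):
        # Gaussian kernel k(a, b) between the rows of A and B.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * width ** 2))

    def supervised_induction(X_labeled, y, lam=0.1, width=1.0):
        # Minimizes sum_i (y_i - f(x_i))^2 + lam * ||f||_H^2 over the RKHS of the kernel:
        # the representer theorem gives f(x) = sum_i alpha_i k(x, x_i) with
        # alpha = (K + lam * I)^{-1} y.
        K = rbf_kernel(X_labeled, X_labeled, width)
        alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
        # The returned predictor is defined on the whole input space,
        # but it never uses the unlabelled inputs.
        return lambda X_new: rbf_kernel(X_new, X_labeled, width) @ alpha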

Semi-supervised learning employs a different assumption, in wh...