Browse Prior Art Database

System for Scaling Multitask Learning (MLT) by Expressing MLT as a Factorization Problem

IP.com Disclosure Number: IPCOM000200154D
Publication Date: 2010-Oct-01
Document File: 5 page(s) / 490K

Publishing Venue

The IP.com Prior Art Database

Related People

Deepayan Chakrabarti: INVENTOR [+4]

Abstract

A system for scaling Multitask Learning (MLT) by expressing MLT as a factorization problem is provided. The expression of MLT as a factorization problem allows applying Stochastic Gradient Descent (SGD), for linearly scaling MTL with respect to number of input examples.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 41% of the total text.

System for Scaling Multitask Learning (MLT) by Expressing MLT as a Factorization Problem

Abstract

A system for scaling Multitask Learning (MTL) by expressing MLT as a factorization problem is provided.  The expression of MLT as a factorization problem allows applying Stochastic Gradient Descent (SGD), for linearly scaling MTL with respect to number of input examples.

Description

Disclosed is a system for scaling Multitask Learning (MTL) by expressing MLT as a factorization model.

The system establishes a link between MTL problem and Collaborative Filtering (CF) problem.  CF between users and items may be expressed by denoting the user index "i" and set x as "itemID", thus forming a matrix "M" of connections between individual users and items. Then an analog of R[f] for CF may be defined as:

(1)

Using a subset of ratings Z: = {… ((i,j), Yij) …} consisting of (user, movie) pairs and their associated scores the analogous empirical risk functional is obtained as follows:

 (2)

Thus, the CF problem solves the following objective (as shown below) with Rreg [f;Z] replaced by Rreg[M;Z]:

(3)

The CF problem may use the matrix "M" directly.  However, for MTL to use the matrix "M", task "i" and functions "fi" are assumed to be elements of a linear function space.  More specifically, we assume that they can be written as:

(4)

Further it is assumed that there exists "d" functions gi (x):= <vi, Ø(x)> which constitutes the basis of the estimation problems.  Based on this analogy the following variant of R reg [f;Z] is obtained via regularized risk minimization:

(5)

Therefore, the CF problem and the MTL problem may be expressed as the same optimization problem under different definitions of the function "f", shown in equation 3.

For CF, the optimization problem is expressed as the user-item matrix "M", while for MTL the optimization problem is expressed as a weighting matrix "W" that relates the tasks to the feature vector of data.  Thus, the CF problem and MTL problem may be written in terms of the same objective function.  

Further, the connection between CF problem and MTL problem may be shown to be equivalent using distinct regularizers.  Consider matrix "M" with singular values σi.  A common regularizer for matrix "M" is its trace norm (or, the Ky-Fan norm): ||M||KF := Σii| The trace norm may be obtained as a following lemma:

(6)

The squared Ky-Fan norm ||M||2KF is the solution of the optimization problem.  The squared Ky-Fan norm is obtained as a following lemma:

(7)

Assuming that , Lagrange function associated with the squared Ky-Fan norm equation is given by:

(8)

The condition  turns out to be automatically satisfied due to the form of the objective function. At optimality we have:

(9)

Equivalently, as "K" has a full rank, "Z" may be expressed as follows:

Z = λK2 (10)

The lemmas shown in equations 6 and 7, establishes that both factorization models and multitask regularization models lead to the same objective.  The main differe...