Hetero-Labeled Latent Dirichlet Allocation (hLLDA) with Heterogeneous Labels

IP.com Disclosure Number: IPCOM000238828D
Publication Date: 2014-Sep-19
Document File: 3 page(s) / 136K

Publishing Venue

The IP.com Prior Art Database

Abstract

A partially supervised topic model with heterogeneous labels, Hetero-Labeled Latent Dirichlet Allocation (hLLDA), is disclosed.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

Existing semi-supervised learning methods have two major limitations related to partial labels. First, the partial labels incorporate only one type of domain knowledge, such as, but not limited to, document labels or feature labels. Second, the methods assume that the provided labels cover all classes in the problem space. These limitations restrict applicability in real-life situations, where domain knowledge for labeling comes in different forms from different groups of domain experts and some classes may have no labels.

Disclosed is a partially supervised topic model, Hetero-Labeled Latent Dirichlet Allocation (hLLDA), with heterogeneous labels. The topic model learns from multiple types of labels, such as document labels and feature labels. It also accommodates labels for only a subset of classes (that is, partial labels), thereby addressing the two major limitations. In addition, the topic model resolves both the label heterogeneity and label partialness problems in a unified generative process.
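One common way to fold two kinds of labels into a single generative model is to express both as asymmetric Dirichlet priors. The following is a minimal sketch under that assumption, not the disclosed hLLDA algorithm: the function name `build_priors`, the `boost` factor, and the default values of `alpha` and `beta` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple


def build_priors(
    num_topics: int,
    vocab: List[str],
    doc_labels: Dict[int, Set[int]],   # doc id -> labeled topic ids (only some docs)
    word_labels: Dict[int, Set[str]],  # topic id -> labeled words (only some topics)
    num_docs: int,
    alpha: float = 0.1,    # assumed base document-topic concentration
    beta: float = 0.01,    # assumed base topic-word concentration
    boost: float = 10.0,   # assumed multiplier for labeled entries
) -> Tuple[List[List[float]], List[List[float]]]:
    """Turn partial, heterogeneous labels into Dirichlet prior parameters."""
    # Documents without labels keep a symmetric prior over all topics.
    doc_topic_prior = [[alpha] * num_topics for _ in range(num_docs)]
    for d, topics in doc_labels.items():
        for k in topics:
            doc_topic_prior[d][k] = alpha * boost

    # Topics without word labels keep a symmetric prior over the vocabulary.
    word_index = {w: i for i, w in enumerate(vocab)}
    topic_word_prior = [[beta] * len(vocab) for _ in range(num_topics)]
    for k, words in word_labels.items():
        for w in words:
            topic_word_prior[k][word_index[w]] = beta * boost

    return doc_topic_prior, topic_word_prior


# Example: 2 documents, 3 topics; only document 0 and topic 2 carry labels.
dtp, twp = build_priors(
    num_topics=3,
    vocab=["engine", "goal", "vote"],
    doc_labels={0: {1}},
    word_labels={2: {"vote"}},
    num_docs=2,
)
```

In this sketch, unlabeled documents and unlabeled topics fall back to symmetric priors, so partial labels need no special handling.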

The topic model learns more semantically coherent topics in a document collection by incorporating different types of domain knowledge, such as document labels and word labels. In many real-world situations, domain knowledge is provided as document labels or feature labels. The model uses an algorithm, shown in Figure 1, that supports real-world applications in which three types of easily acquired side information (domain knowledge) exist.

Hetero-Labeled Latent Dirichlet Allocation (hLLDA)

Figure 1

hLLDA provides a unified framework that discovers topics from data that is partially labeled with heterogeneous labels. The hLLDA model provides heterogeneous supervision, in which multiple types of supervision are assumed to be provided in the training data. For instance, a subset of the training data is assigned document labels, and a subset of the topics are associated wit...
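To make the idea of partial, heterogeneous supervision concrete, the sketch below shows one possible container for such labels, together with helpers that report which documents and topics receive no supervision at all. The class name `PartialSupervision` and its fields are hypothetical and do not come from the disclosure; word labels attached to topics are used here only as an illustrative second label type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PartialSupervision:
    """Hypothetical container for heterogeneous, partial labels."""
    num_docs: int
    num_topics: int
    # Document labels: doc id -> topic ids, given for only some documents.
    doc_labels: Dict[int, Set[int]] = field(default_factory=dict)
    # Word labels: topic id -> words, given for only some topics.
    word_labels: Dict[int, Set[str]] = field(default_factory=dict)

    def unlabeled_docs(self) -> List[int]:
        """Documents that carry no document label."""
        return [d for d in range(self.num_docs) if not self.doc_labels.get(d)]

    def unsupervised_topics(self) -> List[int]:
        """Topics touched by neither a document label nor a word label."""
        covered = {k for k, words in self.word_labels.items() if words}
        for topics in self.doc_labels.values():
            covered |= topics
        return [k for k in range(self.num_topics) if k not in covered]


# Example: 4 documents and 3 topics; one labeled document, one labeled topic.
sup = PartialSupervision(
    num_docs=4,
    num_topics=3,
    doc_labels={0: {1}},
    word_labels={1: {"goal"}},
)
print(sup.unlabeled_docs())
print(sup.unsupervised_topics())
```

A model trained on such input would have to discover the unsupervised topics from the data alone, which is the label partialness setting described above.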