
Managing Topic Model Transition Via Constraints

IP.com Disclosure Number: IPCOM000236095D
Publication Date: 2014-Apr-04
Document File: 3 page(s) / 61K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a novel constrained topic modeling framework to enable the smooth transition of topic models.

This is the abbreviated version, containing approximately 52% of the total text.


Because it can uncover hidden semantics and themes in unstructured data, Latent Dirichlet Allocation (LDA) is the most common topic-modeling tool currently in use. LDA is applied in many areas, including text analysis (e.g., topic analysis for emails, news articles, instant messages, and scientific papers), image processing, social network analysis, and bioinformatics.

In practice, however, there is no method for updating an LDA-based system, especially one that is already deployed and in use. Updating a topic model requires not only refreshing it with new content, but also ensuring that the transition is smooth and minimizes disruption to end users.

In an example of the problem, User A has been using an LDA model trained on last year's Database and Logic Programming (DBLP) publications to organize favorite papers into several topics (e.g., Natural Language Processing (NLP), Speech Processing, Information Management, and Web Technology). A year later, User A wants to refresh the topic model so that it includes this year's publications as well. If a classic LDA method is used, it is possible that, after the update, the topics in the old topic model will have changed or even disappeared in the new topic model. This can be very confusing to User A, since some papers previously categorized as "NLP" are now under "Information Management" or, worse, under an incoherent topic in the new topic model.
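The disruption described in this example can be made measurable. The following is a minimal sketch, not part of the disclosure, that assumes each topic is available as a word-to-probability mapping and compares old and new topics by cosine similarity. The function names, the sample distributions, and the 0.5 threshold are all hypothetical choices for illustration.

```python
import math

def cosine(p, q):
    """Cosine similarity between two sparse word -> weight dicts."""
    dot = sum(w * q.get(word, 0.0) for word, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)

def match_topics(old_topics, new_topics, threshold=0.5):
    """For each old topic, find its closest new topic.

    Returns old_name -> (best_new_id, similarity, survived), where
    `survived` is True when the similarity reaches `threshold`
    (an assumed cutoff, not a value from the disclosure).
    """
    report = {}
    for old_name, old_dist in old_topics.items():
        best_id, best_sim = None, 0.0
        for new_id, new_dist in new_topics.items():
            sim = cosine(old_dist, new_dist)
            if sim > best_sim:
                best_id, best_sim = new_id, sim
        report[old_name] = (best_id, best_sim, best_sim >= threshold)
    return report

# Hypothetical topic-word distributions before and after retraining.
old_topics = {
    "NLP": {"parsing": 0.5, "translation": 0.3, "grammar": 0.2},
    "Speech Processing": {"acoustic": 0.6, "recognition": 0.4},
}
new_topics = {
    0: {"parsing": 0.4, "translation": 0.4, "grammar": 0.2},
    1: {"query": 0.5, "index": 0.3, "recognition": 0.2},
}

report = match_topics(old_topics, new_topics)
for name, (new_id, sim, survived) in report.items():
    status = "kept" if survived else "disrupted"
    print(f"{name}: closest new topic {new_id}, similarity {sim:.2f}, {status}")
```

In this toy data, "NLP" has a close counterpart in the retrained model, while "Speech Processing" has none. That is the kind of silent change that would confuse User A.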

Existing solutions for updating a topic model frequently employ an online LDA algorithm; however, this approach cannot be applied to existing systems that do not already use online LDA.

The solution presented herein is a novel constrained topic-modeling framework that enables the smooth transition of topic models. The proposed algorithm keeps users informed and allows them to guide the topic model transition. Referring to the example above: since User A cares about only a few topics, such as NLP, Speech Processing, and Information Management, it is important that these topics not be disrupted during the transition. It is, however, acceptable for other topics, especially incoherent ones, to be disrupted. Unlike the typical online LDA algorithm, the new solution is able to take user input into account.
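The abbreviated text does not spell out the constraint mechanism, so the following is only an illustrative sketch of how user-protected topics might be checked against a candidate new model. It assumes document-to-topic assignments are available for both models. The names `retention` and `violated` and the 0.8 threshold are hypothetical. The check uses only assignments, so it does not depend on which topic-modeling algorithm produced them.

```python
from collections import Counter

def retention(old_assign, new_assign, protected):
    """Share of each protected topic's documents that stay together.

    For every protected old topic, find the new topic that receives most
    of its documents and return that share (1.0 means fully preserved).
    """
    scores = {}
    for topic in protected:
        docs = [d for d, t in old_assign.items() if t == topic]
        if not docs:
            scores[topic] = 0.0
            continue
        counts = Counter(new_assign[d] for d in docs if d in new_assign)
        top = counts.most_common(1)
        scores[topic] = top[0][1] / len(docs) if top else 0.0
    return scores

def violated(scores, min_retention=0.8):
    """Protected topics whose retention falls below the assumed threshold."""
    return sorted(t for t, s in scores.items() if s < min_retention)

# Hypothetical assignments: paper id -> topic under the old and new models.
old_assign = {
    "p1": "NLP", "p2": "NLP", "p3": "NLP",
    "p4": "Speech Processing", "p5": "Speech Processing",
    "p6": "Web Technology",
}
new_assign = {"p1": 0, "p2": 0, "p3": 2, "p4": 1, "p5": 1, "p6": 2}

# User A marks only the topics that matter; others may be disrupted.
protected = {"NLP", "Speech Processing"}

scores = retention(old_assign, new_assign, protected)
print(scores)
print("Topics needing user attention:", violated(scores))
```

A system could use such a report to keep the user informed before switching models, and to decide which topics need explicit constraints during retraining.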

The novel approach for managing topic model transition is independent of the topic-modeling algorithm used in...