MedLDA: Maximum Margin Supervised Topic Models

Zhu, Jun; Ahmed, Amr; Xing, Eric P

doi:10.1184/R1/6475874.v1

journal contribution

posted on 2012-08-01, 00:00 authored by Jun Zhu, Amr Ahmed, Eric P Xing

A supervised topic model can use side information, such as ratings or labels associated with documents or images, to discover more predictive low-dimensional topical representations of the data. However, existing supervised topic models predominantly employ likelihood-driven objective functions for learning and inference, leaving the popular and potentially powerful max-margin principle unexploited for seeking predictive representations of data and more discriminative topic bases for the corpus. In this paper, we propose the maximum entropy discrimination latent Dirichlet allocation (MedLDA) model, which integrates the mechanism behind max-margin prediction models (e.g., SVMs) with the mechanism behind hierarchical Bayesian topic models (e.g., LDA) under a unified constrained optimization framework. It yields latent topical representations that are more discriminative and more suitable for prediction tasks such as document classification or regression. The principle underlying the MedLDA formalism is quite general and can be applied to joint max-margin and maximum likelihood learning of directed or undirected topic models when supervising side information is available. Efficient variational methods for posterior inference and parameter estimation are derived, and extensive empirical studies on several real data sets are provided. Our experimental results demonstrate qualitatively and quantitatively that MedLDA can: 1) discover sparse and highly discriminative topical representations; 2) achieve state-of-the-art prediction performance; and 3) be more efficient than existing supervised topic models, especially for classification.
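
As a rough illustration of what a "unified constrained optimization framework" of this kind can look like, the classification case may be sketched as a topic-model objective with SVM-style margin constraints. The notation below is assumed for illustration and does not come from this record: q is a variational posterior, \mathcal{L} is a variational bound on the negative log-likelihood of the topic model (taken here to absorb any prior term on the classifier weights \eta), \xi_d are slack variables, C is a cost constant, and \Delta f_d(y) is the difference between the feature vectors of the true label y_d and a competing label y for document d. The paper itself is the authoritative statement of the model.

```latex
\begin{aligned}
\min_{q,\,\alpha,\,\beta,\,\xi}\quad
  & \mathcal{L}(q;\alpha,\beta) \;+\; C \sum_{d=1}^{D} \xi_d \\
\text{s.t.}\quad
  & \mathbb{E}_{q}\!\left[\eta^{\top} \Delta f_d(y)\right] \;\ge\; 1 - \xi_d,
    \qquad \xi_d \ge 0,
    \qquad \forall d,\ \forall y \neq y_d .
\end{aligned}
```

Read this way, the first term plays the role of the likelihood-driven topic model, while the slack penalty and the expected-margin constraints play the role of the max-margin classifier. Solving for both together is what couples the learned topics to the prediction task.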

Date

2012-08-01

Keywords

supervised topic models; max-margin learning; maximum entropy discrimination; latent Dirichlet allocation; support vector machines

Licence

In Copyright
