Carnegie Mellon University

Making the Most of Bag of Words: Sentence Regularization with Alternating Direction Method of Multipliers

journal contribution
posted on 2014-06-01, authored by Dani Yogatama and Noah A. Smith

In many high-dimensional learning problems, only some parts of an observation are important to the prediction task; for example, the cues to correctly categorizing a document may lie in a handful of its sentences. We introduce a learning algorithm that exploits this intuition by encoding it in a regularizer. Specifically, we apply the sparse overlapping group lasso with one group for every bundle of features occurring together in a training-data sentence, which yields thousands to millions of overlapping groups. We show how to efficiently solve the resulting optimization problem using the alternating direction method of multipliers (ADMM). We find that the resulting method significantly outperforms competitive baselines (standard ridge, lasso, and elastic net regularizers) on a suite of real-world text categorization problems.
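
As a rough illustration of the regularizer described above, the sketch below builds one feature group per training sentence and evaluates a generic sparse group lasso penalty (an L1 term plus a sum of per-group L2 norms). The function names, the toy vocabulary, the unweighted groups, and the two strength parameters `lam_lasso` and `lam_group` are assumptions made for this example; the paper's exact formulation (for instance, any per-group weighting) may differ, and the ADMM solver itself is not shown.

```python
import numpy as np

def sentence_groups(documents, vocab):
    """Build one group per training sentence.

    Each group holds the vocabulary indices of the distinct words in that
    sentence, so groups overlap whenever sentences share words.
    (Hypothetical helper, not taken from the paper.)
    """
    groups = []
    for doc in documents:
        for sentence in doc:
            idx = sorted({vocab[w] for w in sentence if w in vocab})
            if idx:
                groups.append(np.array(idx))
    return groups

def sparse_group_lasso_penalty(w, groups, lam_lasso, lam_group):
    """Generic sparse group lasso penalty with unweighted groups (an assumption):
    lam_lasso * ||w||_1 + lam_group * sum over groups g of ||w_g||_2.
    """
    l1_term = lam_lasso * np.abs(w).sum()
    group_term = lam_group * sum(np.linalg.norm(w[g]) for g in groups)
    return l1_term + group_term

# Toy data: two documents, each a list of tokenized sentences.
docs = [[["good", "movie"], ["bad", "plot"]], [["good", "plot"]]]
vocab = {"good": 0, "movie": 1, "bad": 2, "plot": 3}

groups = sentence_groups(docs, vocab)   # index sets {0,1}, {2,3}, {0,3}
w = np.array([3.0, 4.0, 0.0, 0.0])      # one weight per vocabulary word
penalty = sparse_group_lasso_penalty(w, groups, lam_lasso=1.0, lam_group=1.0)
```

Because the word "good" appears in two sentences, its weight falls in two groups; this overlap is what makes the penalty hard to minimize directly and motivates a splitting method such as ADMM.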

Publisher Statement

Copyright 2014 by the author(s).

Date

2014-06-01
