Carnegie Mellon University

Discovering Sociolinguistic Associations with Structured Sparsity

Journal contribution, posted on 2011-06-01
Authors: Jacob Eisenstein, Noah A. Smith, Eric P. Xing

We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors' geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
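
As a rough illustration of the structured sparsity mentioned in the abstract, the sketch below computes an ℓ1,∞ penalty on a coefficient matrix: the sum, over rows, of the largest absolute coefficient in each row. Because a row contributes nothing only when every entry in it is zero, penalizing this quantity tends to switch off whole rows (one input feature across all outputs) at once. The function names, the tolerance, and the matrix values are hypothetical and chosen only for illustration; this is not the authors' implementation, and the loss function and optimizer are not shown.

```python
def l1_inf_penalty(coefs):
    """Sum over rows of the largest absolute coefficient in each row."""
    return sum(max(abs(v) for v in row) for row in coefs)


def active_rows(coefs, tol=1e-12):
    """Indices of rows that have at least one non-negligible coefficient."""
    return [i for i, row in enumerate(coefs) if max(abs(v) for v in row) > tol]


# Hypothetical coefficient matrix: 3 input features (rows) x 2 outputs (columns).
# The values are made up for illustration and do not come from the paper.
B = [
    [0.5, -2.0],  # feature 0: used by both outputs
    [0.0, 0.0],   # feature 1: entire row is zero, so the feature is dropped
    [1.5, 1.0],   # feature 2: used by both outputs
]

penalty = l1_inf_penalty(B)  # row maxima are 2.0, 0.0 and 1.5
selected = active_rows(B)    # features that survive

print(penalty, selected)
```

In a full regression, this penalty would be added to a data-fit term and scaled by a regularization weight; larger weights would be expected to leave fewer active rows.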

Publisher Statement

Copyright 2011 Association for Computational Linguistics

Date

2011-06-01
