Discovering Sociolinguistic Associations with Structured Sparsity

Eisenstein, Jacob; Smith, Noah A.; Xing, Eric P.

doi:10.1184/R1/6475556.v1

journal contribution

posted on 2011-06-01, 00:00 authored by Jacob Eisenstein, Noah A. Smith, Eric P. Xing

We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors' geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ_{1,∞} regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
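To illustrate how a regularizer of this kind can zero out entire rows of a coefficient matrix, here is a minimal sketch of multi-output regression with a row-wise max-norm penalty. It is not the authors' implementation. The squared-error loss, the proximal-gradient solver, the synthetic data, and the names `project_l1_ball`, `prox_linf`, and `fit_row_sparse` are all assumptions made for this example.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball of the given radius (sort-based)."""
    abs_v = np.abs(v)
    if abs_v.sum() <= radius:
        return v.copy()
    u = np.sort(abs_v)[::-1]                 # magnitudes in decreasing order
    cumsum = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * ks > cumsum - radius)[0][-1]
    theta = (cumsum[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(abs_v - theta, 0.0)

def prox_linf(v, t):
    """Proximal operator of t * ||v||_inf, via the Moreau decomposition.

    If ||v||_1 <= t the result is exactly zero, which is what removes whole rows.
    """
    return v - project_l1_ball(v, t)

def fit_row_sparse(X, Y, lam, n_iter=500):
    """Minimize 0.5 * ||Y - X B||_F^2 + lam * sum_i max_j |B[i, j]| (assumed objective)."""
    B = np.zeros((X.shape[1], Y.shape[1]))
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        Z = B - step * (X.T @ (X @ B - Y))    # gradient step on the squared error
        B = np.vstack([prox_linf(z, step * lam) for z in Z])  # row-wise prox
    return B

# Synthetic check: only the first two of ten predictors influence the three outputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
B_true = np.zeros((10, 3))
B_true[0] = [2.0, -1.5, 1.0]
B_true[1] = [-1.0, 2.0, 1.5]
Y = X @ B_true + 0.05 * rng.standard_normal((200, 3))

B_hat = fit_row_sparse(X, Y, lam=40.0)
active_rows = np.nonzero(np.abs(B_hat).max(axis=1) > 0)[0]
print("rows with nonzero coefficients:", active_rows)
```

Each row of `B` holds one predictor's coefficients for all outputs, so a predictor is either kept for every output or dropped for every output. That row-level selection is the structured sparsity the abstract describes.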

Date

2011-06-01

Keywords

Machine Learning

Licence

In Copyright
