Distributed Representations of Geographically Situated Language

Bamman, David; Dyer, Chris; Smith, Noah A.

doi:10.1184/R1/6473324.v1

journal contribution

Posted on 2014-06-01. Authored by David Bamman, Chris Dyer, and Noah A. Smith.

We introduce a model for incorporating contextual information (such as geography) in learning vector-space representations of situated language. In contrast to approaches to multimodal representation learning that have used properties of the object being described (such as its color), our model includes information about the subject (i.e., the speaker), allowing us to learn the contours of a word’s meaning that are shaped by the context in which it is uttered. In a quantitative evaluation on the task of judging geographically informed semantic similarity between representations learned from 1.1 billion words of geo-located tweets, our joint model outperforms comparable independent models that learn meaning in isolation.
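The abstract does not spell out how contextual information enters the representation, so the following is only a minimal sketch of one plausible reading: each word has a shared vector, and each region contributes an additive offset to it. The additive composition, the toy vocabulary, the region codes, the vector values, and the helper names `situated_vector` and `cosine` are all assumptions made for illustration and are not taken from the record above.

```python
import math

# Hypothetical toy data: a shared 3-dimensional vector for one word.
SHARED = {"soda": [1.0, 0.0, 0.0]}

# Hypothetical per-region offsets (assumed additive; not stated in the abstract).
OFFSETS = {
    "MA": {"soda": [0.0, 1.0, 0.0]},
    "KS": {"soda": [0.0, 0.0, 0.0]},
}


def situated_vector(word, region):
    """Return the shared vector for `word` plus the offset for `region`.

    Regions or words without a stored offset fall back to the shared vector.
    """
    base = SHARED[word]
    delta = OFFSETS.get(region, {}).get(word, [0.0] * len(base))
    return [b + d for b, d in zip(base, delta)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    ma = situated_vector("soda", "MA")
    ks = situated_vector("soda", "KS")
    print(ma, ks, cosine(ma, ks))
```

In this sketch, comparing the same word across two regions with `cosine` is a stand-in for the kind of geographically informed similarity judgment the abstract mentions; in a trained model the shared vectors and the offsets would be learned jointly from the geo-located text rather than written by hand.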