Diverse Context for Learning Word Representations
Word representations are mathematical objects that capture a word's meaning and its grammatical properties in a way that computers can read and understand. They map words into equivalence classes such that words with similar properties belong to the same class. Word representations are either constructed manually by humans (in the form of word lexicons, dictionaries, etc.) or obtained automatically using unsupervised learning algorithms. Since manual construction is expensive and does not scale, obtaining word representations automatically is desirable.
Traditionally, automatic learning of word representations has relied on the distributional hypothesis, which states that the meaning of a word is evidenced by the words that occur in its context (Harris, 1954). Thus, existing word representation learning algorithms such as latent semantic analysis (Deerwester et al., 1990; Landauer and Dumais, 1997) derive word meaning from aggregated co-occurrence counts of words extracted from unlabeled monolingual corpora.
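To make the co-occurrence view concrete, the following is a minimal sketch (not taken from the thesis) that builds a window-based co-occurrence matrix over a toy corpus and reduces it with SVD, in the spirit of latent semantic analysis; the corpus, window size, and dimensionality are illustrative assumptions.

```python
# Minimal sketch of distributional word representations: count word co-occurrences
# within a fixed window over a toy corpus, then reduce the count matrix with SVD.
import numpy as np

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog played",
]
window = 2  # assumed context window size

# Build vocabulary and co-occurrence counts.
tokens = [sent.split() for sent in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))

for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[index[w], index[sent[j]]] += 1

# Reduce dimensionality with a truncated SVD (as in LSA); rows are word vectors.
k = 5  # assumed number of latent dimensions
U, S, _ = np.linalg.svd(counts, full_matrices=False)
word_vectors = U[:, :k] * S[:k]
print("cat", word_vectors[index["cat"]])
```

In this sketch, words that occur in similar windows receive similar rows of the count matrix, and the SVD compresses those rows into low-dimensional vectors.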
In this thesis, we diversify the notion of context to include information beyond the monolingual distributional context. We show that information about word meaning is present in other contexts, such as neighboring words in a semantic lexicon, the word's context across different languages, and the word's morphological structure. We show that, in addition to monolingual distributional context, these sources provide complementary information about word meaning, which can substantially improve the quality of word representations. We present methods to augment existing models of word representations to incorporate these knowledge sources.
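As one illustration of how a semantic lexicon can augment distributionally learned vectors, here is a hedged sketch (not the thesis's exact algorithm) that nudges each word's vector toward the vectors of its lexicon neighbors while keeping it close to its original distributional estimate; the toy lexicon, the weights alpha and beta, and the iteration count are assumptions made for the example.

```python
# Hedged illustration: refine pre-trained word vectors so that words linked in a
# semantic lexicon end up closer together, while each vector stays near its
# original distributional estimate.
import numpy as np

# Toy inputs: assumed pre-trained vectors and a tiny synonym lexicon.
vectors = {
    "happy": np.array([1.0, 0.0]),
    "glad":  np.array([0.0, 1.0]),
    "sad":   np.array([-1.0, 0.0]),
}
lexicon = {"happy": ["glad"], "glad": ["happy"], "sad": []}

alpha, beta, iterations = 1.0, 1.0, 10  # assumed weights for the two objectives
refined = {w: v.copy() for w, v in vectors.items()}

for _ in range(iterations):
    for word, neighbors in lexicon.items():
        if not neighbors:
            continue
        # The updated vector balances the original vector and the neighbors'
        # current vectors, weighted by alpha and beta respectively.
        neighbor_sum = sum(refined[n] for n in neighbors)
        refined[word] = (alpha * vectors[word] + beta * neighbor_sum) / (
            alpha + beta * len(neighbors)
        )

print(refined["happy"], refined["glad"])
```

After a few iterations, "happy" and "glad" move toward each other because the lexicon links them, while "sad" is left at its original position.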
Date
- 2016-05-03
Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)