Carnegie Mellon University

Improving Vector Space Word Representations Using Multilingual Correlation

journal contribution
Posted on 2014-04-01. Authored by Manaal Faruqui and Chris Dyer.

The distributional hypothesis of Harris (1954), according to which the meaning of words is evidenced by the contexts they occur in, has motivated several effective techniques for obtaining vector space semantic representations of words using unannotated text corpora. This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique based on canonical correlation analysis (CCA) for incorporating multilingual evidence into vectors generated monolingually. We evaluate the resulting word representations on standard lexical semantic evaluation tasks and show that our method produces substantially better semantic representations than monolingual techniques.
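The abstract names CCA but does not spell out the procedure, so the following is only a minimal, generic sketch of linear CCA in NumPy, not the authors' implementation. It assumes two row-aligned matrices of monolingual word vectors (row i of `x` and row i of `y` standing for a translation pair); the function names `inv_sqrt` and `cca`, the regularization constant `reg`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def inv_sqrt(c, eps=1e-8):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(c)
    w = np.maximum(w, eps)          # guard against tiny or negative eigenvalues
    return (v / np.sqrt(w)) @ v.T   # V diag(1/sqrt(w)) V^T


def cca(x, y, k, reg=1e-6):
    """Return projections (a, b) and the top-k canonical correlations.

    x: (n, d1) vectors for one language; y: (n, d2) vectors for the other.
    Rows are assumed to be aligned pairs. `reg` is a hypothetical ridge term
    added for numerical stability.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(wx @ cxy @ wy, full_matrices=False)
    a = wx @ u[:, :k]
    b = wy @ vt.T[:, :k]
    return a, b, s[:k]


# Synthetic stand-in for aligned word vectors: both views share a 3-d latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 3))
x = z @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(500, 10))
y = z @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(500, 8))

a, b, corr = cca(x, y, k=3)
x_proj = (x - x.mean(axis=0)) @ a   # vectors of the first view in the shared space
y_proj = (y - y.mean(axis=0)) @ b   # vectors of the second view in the shared space
```

In this sketch, `x_proj` plays the role of the monolingual vectors after multilingual evidence has been folded in; how many dimensions `k` to keep is a tuning choice the abstract does not specify.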

Publisher Statement

Copyright 2014 Association for Computational Linguistics

Date

2014-04-01
