Carnegie Mellon University

Generative Models of Monolingual and Bilingual Gappy Patterns

journal contribution
posted on 2011-07-01, 00:00 authored by Kevin Gimpel, Noah A. Smith

A growing body of machine translation research aims to exploit lexical patterns (e.g., n-grams and phrase pairs) with gaps (Simard et al., 2005; Chiang, 2005; Xiong et al., 2011). Typically, these “gappy patterns” are discovered using heuristics based on word alignments or local statistics such as mutual information. In this paper, we develop generative models of monolingual and parallel text that build sentences using gappy patterns of arbitrary length and with arbitrarily many gaps. We exploit Bayesian nonparametrics and collapsed Gibbs sampling to discover salient patterns in a corpus. We evaluate the patterns qualitatively and also add them as features to an MT system, reporting promising preliminary results.
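To make the notion of a gappy pattern concrete, here is a minimal sketch of what it could mean for a monolingual pattern such as "either __ or" to occur in a sentence. This is an illustration only, not the generative model or the sampler from the paper. The representation (a pattern as a list of contiguous word chunks separated by gaps), the requirement that each gap cover at least one token, and the function names `find_chunk` and `matches_gappy` are all assumptions made for this example.

```python
from typing import List


def find_chunk(tokens: List[str], chunk: List[str], start: int) -> int:
    """Return the first index >= start where `chunk` occurs contiguously, or -1."""
    n = len(chunk)
    for i in range(start, len(tokens) - n + 1):
        if tokens[i:i + n] == chunk:
            return i
    return -1


def matches_gappy(tokens: List[str], chunks: List[List[str]]) -> bool:
    """Check whether a gappy pattern occurs in a tokenized sentence.

    Assumed (hypothetical) representation: `chunks` lists the contiguous
    word groups of the pattern in order, with one gap between each pair
    of consecutive groups. Each gap must cover at least one token.
    """
    pos = 0  # index just past the previously matched chunk
    for k, chunk in enumerate(chunks):
        # After the first chunk, skip one token to force a non-empty gap.
        start = pos if k == 0 else pos + 1
        i = find_chunk(tokens, chunk, start)
        if i < 0:
            return False
        pos = i + len(chunk)
    return True


sentence = "you can either walk or take the bus".split()
either_or = [["either"], ["or"]]  # the pattern "either __ or"
print(matches_gappy(sentence, either_or))
```

Matching each chunk at its leftmost admissible position is sufficient here, because an earlier match never rules out a later chunk. A pattern with several gaps is written by listing more chunks, for example `[["not", "only"], ["but"], ["as", "well"]]`.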

History

Publisher Statement

Copyright 2011 ACL

Date

2011-07-01
