Posted on 2012-05-01. Authored by Chris Dyer, Noah A. Smith, Graham Morehead, Phil Blunsom, Abby Levenberg.
The core of our system is the hierarchical phrase-based translation model (Chiang, 2007), as implemented by the cdec decoder (Dyer et al., 2010). A 4-gram language model estimated using modified Kneser-Ney smoothing was included (Chen and Goodman, 1999). Translation model features include the log relative frequency log f(e | k), the log counts of k and of the pair (e, k), the log "lexical translation" probabilities in both directions, and an indicator feature for rules with a count of 1. Translation model parameters were tuned using the dynamic programming variant of minimum error rate training for hypergraphs to maximize the BLEU score on a held-out development set with a single reference translation (Kumar et al., 2009; Papineni et al., 2002).
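The count-based translation model features described above can be sketched as follows. This is an illustrative reconstruction, not the paper's actual feature extraction code: the rule table, feature names, and counts are invented for the example, and the feature set is simplified (the lexical translation probabilities, which require word-alignment statistics, are omitted).

```python
import math
from collections import defaultdict

# Hypothetical rule counts (source phrase, target phrase) -> count.
# These numbers are made up for illustration only.
rule_counts = {
    ("maison", "house"): 9,
    ("maison", "home"): 3,
    ("bleu", "blue"): 1,
}

# Marginal count of each source phrase k, summed over its translations.
src_counts = defaultdict(int)
for (src, tgt), c in rule_counts.items():
    src_counts[src] += c

def rule_features(src, tgt):
    """Count-derived features for one rule, in log space."""
    c = rule_counts[(src, tgt)]
    return {
        # log relative frequency log f(e | k) = log(count(e, k) / count(k))
        "LogRelFreq": math.log(c / src_counts[src]),
        # log counts of the source phrase k and of the pair (e, k)
        "LogSrcCount": math.log(src_counts[src]),
        "LogRuleCount": math.log(c),
        # indicator feature for rules seen exactly once
        "Singleton": 1.0 if c == 1 else 0.0,
    }
```

In a linear model such as cdec's, the decoder scores each derivation as the dot product of these feature values with weights, which MERT then tunes against BLEU on the development set.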