P10-1047.pdf (1.36 MB)
Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish
journal contribution
posted on 2010-07-11, 00:00 authored by Reyyan Yeniterzi, Kemal Oflazer
We present a novel scheme to apply factored
phrase-based SMT to a language pair
with very disparate morphological structures.
Our approach relies on syntactic
analysis on the source side (English)
and then encodes a wide variety of local
and non-local syntactic structures as complex
structural tags which appear as additional
factors in the training data. On
the target side (Turkish), we only perform
morphological analysis and disambiguation
but treat the complete complex
morphological tag as a factor, instead of
separating morphemes. We incrementally
explore capturing various syntactic substructures
as complex tags on the English
side, and evaluate how our translations
improve in BLEU scores. Our maximal
set of source and target side transformations,
coupled with some additional
techniques, provides a 39% relative improvement
from a baseline 17.08 to 23.78
BLEU, all averaged over 10 training and
test sets. Now that the syntactic analysis
on the English side is available, we
also experiment with more long-distance
constituent reordering to bring the English
constituent order close to Turkish, but find
that these transformations do not provide
any additional consistent tangible gains
when averaged over the 10 sets.
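
The relative improvement quoted in the abstract can be checked from the two averaged BLEU scores. The following is a minimal sketch, assuming the usual definition of relative improvement, (new - baseline) / baseline; the function name `relative_improvement` is illustrative and does not come from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# BLEU scores reported in the abstract, averaged over 10 training and test sets.
gain = relative_improvement(17.08, 23.78)
print(f"Relative BLEU improvement: {gain:.1%}")
```

Under this definition the gain rounds to the 39% stated in the abstract; the absolute gain is 6.70 BLEU points.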