10.1184/R1/6368090.v1
Reyyan Yeniterzi
Kemal Oflazer
Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish
Carnegie Mellon University
2010
Statistical Machine Translation
Morphology
Turkish
2010-07-11 00:00:00
Journal contribution
https://kilthub.cmu.edu/articles/journal_contribution/Syntax-to-Morphology_Mapping_in_Factored_Phrase-Based_Statistical_Machine_Translation_from_English_to_Turkish/6368090
We present a novel scheme to apply factored
phrase-based SMT to a language pair
with very disparate morphological structures.
Our approach relies on syntactic
analysis on the source side (English)
and then encodes a wide variety of local
and non-local syntactic structures as complex
structural tags which appear as additional
factors in the training data. On
the target side (Turkish), we only perform
morphological analysis and disambiguation
but treat the complete complex
morphological tag as a factor, instead of
separating morphemes. We incrementally
explore capturing various syntactic substructures
as complex tags on the English
side, and evaluate how our translations
improve in BLEU scores. Our maximal
set of source- and target-side transformations,
coupled with some additional
techniques, provides a 39% relative improvement
from a baseline of 17.08 to 23.78
BLEU, all averaged over 10 training and
test sets. Now that the syntactic analysis
on the English side is available, we
also experiment with more long-distance
constituent reordering to bring the English
constituent order closer to that of Turkish, but find
that these transformations do not provide
any additional consistent tangible gains
when averaged over the 10 sets.
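
The relative improvement quoted in the abstract can be checked from the two reported BLEU averages. The following is a minimal sketch of that arithmetic only. The function name `relative_improvement` is an illustrative assumption and does not come from the paper; the two scores are the averages reported above.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# BLEU averages reported in the abstract (over 10 training and test sets).
baseline_bleu = 17.08
best_bleu = 23.78

gain = relative_improvement(baseline_bleu, best_bleu)
print(f"absolute gain: {best_bleu - baseline_bleu:.2f} BLEU")
print(f"relative gain: {gain:.1f}%")
```

The absolute gain is 6.70 BLEU points, which is about 39% of the 17.08 baseline, in agreement with the figure stated in the abstract.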