Posted on 2014-10-25, 00:00, authored by Serena Jeblee, Weston Freely, Houda Bouamor, Alon Lavie, Nizar Habash, Kemal Oflazer
In this paper, we present a statistical machine translation system for English to Dialectal Arabic (DA), using Modern Standard Arabic (MSA) as a pivot. We create a core system to translate from English to MSA using a large bilingual parallel corpus. Then, we design two separate pathways for translation from MSA into DA: a two-step domain and dialect adaptation system and a one-step simultaneous domain and dialect adaptation system. Both variants of the adaptation systems are trained on a 100k-sentence tri-parallel corpus of English, MSA, and Egyptian Arabic generated by a rule-based transformation. We test our systems on a held-out Egyptian Arabic test set from the 100k-sentence corpus and achieve our best performance using the two-step domain and dialect adaptation system, with a BLEU score of 42.9.
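
The two pathways can be read as two ways of composing translation stages after the shared English-to-MSA core. The sketch below is only an illustration of that composition, not the authors' implementation: `toy_system`, `compose`, the tag names, and the assumed stage order (domain adaptation before dialect adaptation) are all hypothetical, and each "system" merely relabels tokens where a trained SMT model would translate them.

```python
from typing import Callable

Translator = Callable[[str], str]


def compose(*stages: Translator) -> Translator:
    """Chain translation stages so each stage's output feeds the next."""
    def pipeline(sentence: str) -> str:
        for stage in stages:
            sentence = stage(sentence)
        return sentence
    return pipeline


def toy_system(src: str, tgt: str) -> Translator:
    """Hypothetical stand-in for a trained SMT system.

    Tokens look like "word/tag"; the stage checks the source tag and
    relabels it with the target tag instead of really translating.
    """
    def translate(sentence: str) -> str:
        out = []
        for tok in sentence.split():
            word, _, tag = tok.rpartition("/")
            if tag != src:
                raise ValueError(f"expected tag {src!r}, got {tag!r}")
            out.append(f"{word}/{tgt}")
        return " ".join(out)
    return translate


# Shared core system: English -> MSA.
core = toy_system("en", "msa")

# Pathway 1 (two-step): domain adaptation, then dialect adaptation.
# The order of the two steps is an assumption made for this sketch.
domain_adapt = toy_system("msa", "msa-dom")
dialect_adapt = toy_system("msa-dom", "da")
two_step = compose(core, domain_adapt, dialect_adapt)

# Pathway 2 (one-step): a single system adapts domain and dialect together.
joint_adapt = toy_system("msa", "da")
one_step = compose(core, joint_adapt)

print(two_step("hello/en world/en"))
print(one_step("hello/en world/en"))
```

Both pipelines end in the same target tag here; in a real setup they would differ in which models are trained and how errors propagate through the extra intermediate step.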
Publisher Statement
Published in Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), pages 196–206, October 25, 2014, Doha, Qatar.