W14-3627.pdf (257.39 kB)

Domain and Dialect Adaptation for Machine Translation into Egyptian Arabic

journal contribution
posted on 25.10.2014 by Serena Jeblee, Weston Feely, Houda Bouamor, Alon Lavie, Nizar Habash, Kemal Oflazer
In this paper, we present a statistical machine translation system for English to Dialectal Arabic (DA), using Modern Standard Arabic (MSA) as a pivot. We create a core system to translate from English to MSA using a large bilingual parallel corpus. Then, we design two separate pathways for translation from MSA into DA: a two-step domain and dialect adaptation system and a one-step simultaneous domain and dialect adaptation system. Both variants of the adaptation system are trained on a 100k-sentence tri-parallel corpus of English, MSA, and Egyptian Arabic generated by a rule-based transformation. We test our systems on a held-out Egyptian Arabic test set drawn from the 100k-sentence corpus, and we achieve our best performance with the two-step domain and dialect adaptation system, which reaches a BLEU score of 42.9.
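The pivot architecture described in the abstract can be pictured as a composition of translation stages. The Python sketch below only illustrates how the two MSA-to-DA pathways chain onto the shared English-to-MSA core. It is not the authors' implementation. Each "system" is a hypothetical word-lookup stand-in for a trained SMT model, and the token names (`msa_he`, `egy_tea`, and so on) are placeholders, not real Arabic. The order of steps in the two-step pathway (domain adaptation first, then dialect adaptation) is an assumption based on the system's name.

```python
from typing import Callable, Dict

# A translation "system" is modeled as a plain function from one sentence to another.
Translator = Callable[[str], str]


def word_table_translator(table: Dict[str, str]) -> Translator:
    """Hypothetical stand-in for a trained SMT system: word-by-word lookup,
    leaving tokens that are missing from the table unchanged."""
    def translate(sentence: str) -> str:
        return " ".join(table.get(tok, tok) for tok in sentence.split())
    return translate


def compose(*stages: Translator) -> Translator:
    """Chain systems left to right: each stage's output is the next stage's input."""
    def run(sentence: str) -> str:
        for stage in stages:
            sentence = stage(sentence)
        return sentence
    return run


# Toy vocabularies with placeholder tokens (not real Arabic); all hypothetical.
en_to_msa = word_table_translator({"he": "msa_he", "wants": "msa_wants", "tea": "msa_tea"})
# Assumed step 1 of the two-step pathway: general MSA -> in-domain MSA.
msa_domain = word_table_translator({"msa_tea": "msa_tea_indomain"})
# Assumed step 2 of the two-step pathway: in-domain MSA -> Egyptian Arabic.
msa_to_egy = word_table_translator({"msa_he": "egy_he", "msa_wants": "egy_wants",
                                    "msa_tea_indomain": "egy_tea"})
# One-step pathway: general MSA -> in-domain Egyptian Arabic in a single system.
joint_adapt = word_table_translator({"msa_he": "egy_he", "msa_wants": "egy_wants",
                                     "msa_tea": "egy_tea"})

two_step = compose(en_to_msa, msa_domain, msa_to_egy)
one_step = compose(en_to_msa, joint_adapt)

if __name__ == "__main__":
    print(two_step("he wants tea"))  # egy_he egy_wants egy_tea
    print(one_step("he wants tea"))  # egy_he egy_wants egy_tea
```

Both pathways reuse the same English-to-MSA core and differ only in how many systems sit between MSA and the dialect output. Evaluation then compares the final outputs of each composed pipeline against the held-out Egyptian Arabic references.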


Publisher Statement

Published in Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), pages 196–206, October 25, 2014, Doha, Qatar.