Domain and Dialect Adaptation for Machine Translation into Egyptian Arabic

journal contribution
posted on 25.10.2014, 00:00 by Serena Jeblee, Weston Freely, Houda Bouamor, Alon Lavie, Nizar Habash, Kemal Oflazer
In this paper, we present a statistical machine translation system for English to Dialectal Arabic (DA), using Modern Standard Arabic (MSA) as a pivot. We create a core system to translate from English to MSA using a large bilingual parallel corpus. Then, we design two separate pathways for translation from MSA into DA: a two-step domain and dialect adaptation system and a one-step simultaneous domain and dialect adaptation system. Both variants of the adaptation systems are trained on a 100k-sentence tri-parallel corpus of English, MSA, and Egyptian Arabic generated by a rule-based transformation. We test our systems on a held-out Egyptian Arabic test set drawn from the same 100k-sentence corpus. The two-step domain and dialect adaptation system performs best, with a BLEU score of 42.9.
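
As a rough illustration of the pivot design described in the abstract, the two pathways can be viewed as different compositions of translation stages. The sketch below is not the authors' implementation: the stage names (`en_to_msa`, `msa_domain_adapt`, `msa_to_da`, `msa_to_da_one_step`), the toy lookup tables, and the placeholder tokens are all hypothetical, and the ordering of domain adaptation before dialect adaptation in the two-step pathway is an assumption, since the abstract does not state the order.

```python
from typing import Callable, Dict

# A translator is modeled as any function from a sentence to a sentence.
Translator = Callable[[str], str]


def make_lookup_translator(table: Dict[str, str]) -> Translator:
    """Return a toy word-by-word translator backed by a lookup table.

    This stands in for a trained statistical MT system; unknown tokens
    are passed through unchanged.
    """
    def translate(sentence: str) -> str:
        return " ".join(table.get(tok, tok) for tok in sentence.split())
    return translate


def compose(*stages: Translator) -> Translator:
    """Chain translation stages so each stage's output feeds the next."""
    def pipeline(sentence: str) -> str:
        for stage in stages:
            sentence = stage(sentence)
        return sentence
    return pipeline


# Hypothetical stages with placeholder tokens (not real Arabic).
en_to_msa = make_lookup_translator(
    {"hello": "msa_hello", "world": "msa_world"}
)
msa_domain_adapt = make_lookup_translator(
    {"msa_world": "msa_world_indomain"}
)
msa_to_da = make_lookup_translator(
    {"msa_hello": "da_hello", "msa_world_indomain": "da_world"}
)
msa_to_da_one_step = make_lookup_translator(
    {"msa_hello": "da_hello", "msa_world": "da_world"}
)

# Two-step pathway (assumed order): domain adaptation, then dialect adaptation.
two_step = compose(en_to_msa, msa_domain_adapt, msa_to_da)

# One-step pathway: a single stage handles domain and dialect together.
one_step = compose(en_to_msa, msa_to_da_one_step)

print(two_step("hello world"))
print(one_step("hello world"))
```

In both pathways English never maps to DA directly: the core English-to-MSA system is shared, and only the MSA-to-DA portion differs.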


Publisher Statement

Published in Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), pages 196–206, October 25, 2014, Doha, Qatar.