Domain and Dialect Adaptation for Machine Translation into Egyptian Arabic
Journal contribution posted on 25.10.2014 by Serena Jeblee, Weston Freely, Houda Bouamor, Alon Lavie, Nizar Habash, Kemal Oflazer
In this paper, we present a statistical machine translation system for English to Dialectal Arabic (DA) that uses Modern Standard Arabic (MSA) as a pivot. We first build a core system that translates from English to MSA, trained on a large bilingual parallel corpus. We then design two separate pathways for translation from MSA into DA: a two-step domain and dialect adaptation system, and a one-step system that performs domain and dialect adaptation simultaneously. Both adaptation systems are trained on a 100k-sentence tri-parallel corpus of English, MSA, and Egyptian Arabic generated by a rule-based transformation. We test our systems on a held-out Egyptian Arabic test set drawn from the same 100k-sentence corpus. The two-step domain and dialect adaptation system performs best, with a BLEU score of 42.9.
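The two pathways described in the abstract can be pictured as two different compositions of translation stages. The following is a minimal sketch of that wiring only, not the authors' implementation: `make_stage`, `compose`, and the `@label` token convention are hypothetical, each stage is a toy stand-in that merely relabels tokens instead of running a trained statistical system, and the ordering of the two-step pathway (domain adaptation first, then dialect adaptation) is an assumption.

```python
from typing import Callable

Translator = Callable[[str], str]


def make_stage(src: str, tgt: str) -> Translator:
    """Hypothetical stand-in for a trained system: relabels tokens from src to tgt."""
    def translate(sentence: str) -> str:
        out = []
        for token in sentence.split():
            word, _, label = token.rpartition("@")
            if label != src:
                # Guards against wiring stages together in the wrong order.
                raise ValueError(f"expected @{src} input, got {token!r}")
            out.append(f"{word}@{tgt}")
        return " ".join(out)
    return translate


def compose(*stages: Translator) -> Translator:
    """Chain stages left to right, feeding each output into the next stage."""
    def pipeline(sentence: str) -> str:
        for stage in stages:
            sentence = stage(sentence)
        return sentence
    return pipeline


# Core system: English -> MSA (the pivot language).
english_to_msa = make_stage("en", "msa")

# Two-step pathway (assumed order): domain adaptation, then dialect adaptation.
domain_adapt = make_stage("msa", "msa_indomain")
dialect_adapt = make_stage("msa_indomain", "egy_indomain")

# One-step pathway: domain and dialect adaptation performed simultaneously.
joint_adapt = make_stage("msa", "egy_indomain")

two_step = compose(english_to_msa, domain_adapt, dialect_adapt)
one_step = compose(english_to_msa, joint_adapt)

print(two_step("good@en morning@en"))
print(one_step("good@en morning@en"))
```

Both pipelines end at the same target label. In a real system they would differ in how many models the MSA output passes through, which is the difference the reported BLEU comparison measures.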