
Transforming Standard Arabic to Colloquial Arabic

journal contribution
Posted on 08.07.2012 by Emad Mohamed, Behrang Mohit, Kemal Oflazer
We present a method for generating Colloquial Egyptian Arabic (CEA) from morphologically disambiguated Modern Standard Arabic (MSA). When applied to POS tagging, this process improves accuracy on unseen CEA text from 73.24% to 86.84% and reduces the percentage of out-of-vocabulary words from 28.98% to 16.66%. The process holds promise for any NLP task targeting the dialectal varieties of Arabic; for example, it may offer a cheap way to leverage MSA data and morphological resources to build resources for colloquial Arabic to English machine translation. It can also considerably speed up the annotation of Arabic dialects.
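The record does not describe the transformation rules themselves, so the following is only a minimal sketch of the general idea under stated assumptions: each MSA token is assumed to arrive already disambiguated (prefixes, stem, lemma, POS), and a small hypothetical rule table rewrites it into a CEA-like form. The rules shown (hadha → da "this", madha → eh "what", kayfa → izzay "how", future prefix sa- → Ha-) are illustrative MSA/Egyptian correspondences chosen for this example, not the authors' rule set. The `Token` class, `LEMMA_RULES`, `PREFIX_RULES`, and all function names are hypothetical. The sketch also shows how an out-of-vocabulary rate like the one reported in the abstract is computed, using toy data (the numbers printed are not from the paper).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A disambiguated MSA token (hypothetical, simplified representation).

    prefixes: prefixes selected by the (assumed) morphological disambiguator
    stem:     the remaining form, in a simple ASCII transliteration
    lemma:    the MSA lemma
    pos:      part-of-speech tag
    """
    prefixes: tuple
    stem: str
    lemma: str
    pos: str

    @property
    def surface(self) -> str:
        return "".join(self.prefixes) + self.stem


# Illustrative MSA -> CEA lexical correspondences keyed by MSA lemma.
# Assumed for this sketch; not the rule set used in the paper.
LEMMA_RULES = {
    "hadha": "da",     # 'this' (masc.)
    "madha": "eh",     # 'what'
    "kayfa": "izzay",  # 'how'
}

# Illustrative prefix rewrite: the MSA future prefix sa- maps to CEA Ha-.
PREFIX_RULES = {"sa": "Ha"}


def transform_token(tok: Token) -> str:
    """Rewrite one MSA token into a CEA-like form using the rules above."""
    if tok.lemma in LEMMA_RULES:
        return LEMMA_RULES[tok.lemma]
    # Stem-internal vowel changes (e.g. yaktub -> yiktib) are not modeled here.
    prefixes = [PREFIX_RULES.get(p, p) for p in tok.prefixes]
    return "".join(prefixes) + tok.stem


def transform_sentence(tokens: list[Token]) -> list[str]:
    """Apply transform_token to every token of a sentence."""
    return [transform_token(t) for t in tokens]


def oov_rate(tokens: list[str], vocabulary: set[str]) -> float:
    """Percentage of tokens that do not appear in the vocabulary."""
    if not tokens:
        return 0.0
    unseen = sum(1 for t in tokens if t not in vocabulary)
    return 100.0 * unseen / len(tokens)


if __name__ == "__main__":
    msa = [
        Token((), "hadha", "hadha", "DEM"),
        Token(("sa",), "yaktub", "katab", "VERB"),
        Token((), "kitab", "kitab", "NOUN"),
    ]
    cea_test = ["da", "kitab", "eh"]  # toy "unseen CEA" tokens

    raw_vocab = {t.surface for t in msa}          # vocabulary of untouched MSA
    cea_vocab = set(transform_sentence(msa))      # vocabulary after rewriting
    print(transform_sentence(msa))                   # ['da', 'Hayaktub', 'kitab']
    print(round(oov_rate(cea_test, raw_vocab), 2))   # 66.67
    print(round(oov_rate(cea_test, cea_vocab), 2))   # 33.33
```

On this toy input, rewriting the MSA side lowers the OOV rate on the CEA tokens from 66.67% to 33.33%, which mirrors in miniature the direction of the effect the abstract reports; a real system would need far richer rules, including the stem-internal changes this sketch skips.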

History

Publisher Statement

Published in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 176–180, Jeju, Republic of Korea, 8–14 July 2012.

Date

08/07/2012
