Carnegie Mellon University

Transforming Standard Arabic to Colloquial Arabic

journal contribution
Posted on 2012-07-08. Authored by Emad Mohamed, Behrang Mohit, and Kemal Oflazer.
We present a method for generating Colloquial Egyptian Arabic (CEA) from morphologically disambiguated Modern Standard Arabic (MSA). When used in POS tagging, this process improves accuracy from 73.24% to 86.84% on unseen CEA text and reduces the percentage of out-of-vocabulary words from 28.98% to 16.66%. The process holds promise for any NLP task targeting the dialectal varieties of Arabic. For example, it may provide a cheap way to leverage MSA data and morphological resources to create resources for colloquial Arabic to English machine translation. It can also considerably speed up the annotation of Arabic dialects.
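
The out-of-vocabulary figure above is a coverage measure, and a minimal sketch of one common way to compute such a rate may make it concrete. This is an illustration only, not the authors' evaluation code: it assumes a token-level definition (the abstract does not say whether the rate is counted over tokens or types), the function name `oov_rate` is hypothetical, and the token lists are made-up placeholders rather than data from the paper.

```python
def oov_rate(train_tokens, test_tokens):
    """Percentage of test tokens whose surface form never occurs in training.

    Assumption: token-level counting; a type-level rate would deduplicate
    test_tokens first.
    """
    if not test_tokens:
        return 0.0
    vocab = set(train_tokens)
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return 100.0 * unseen / len(test_tokens)


# Placeholder tokens standing in for word forms (not real corpus data).
msa_train = ["shared_1", "shared_2", "msa_only_1", "msa_only_2"]
cea_test = ["shared_1", "cea_form_1", "cea_form_2", "shared_2"]

# Hypothetical colloquial forms generated from the standard-language data.
generated = ["cea_form_1"]

before = oov_rate(msa_train, cea_test)             # training on MSA only
after = oov_rate(msa_train + generated, cea_test)  # MSA plus generated forms
print(f"OOV before: {before:.2f}%  after: {after:.2f}%")
```

In this toy setup, adding generated colloquial forms to the training vocabulary lowers the share of unseen test tokens, which is the kind of effect the abstract reports at corpus scale.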

History

Publisher Statement

Published in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 176–180, Jeju, Republic of Korea, 8–14 July 2012.

Date

2012-07-08