10.1184/R1/6373124.v1
Ahmed Salama
Ahmed
Salama
Houda Bouamor
Houda
Bouamor
Behrang Mohit
Behrang
Mohit
Kemal Oflazer
Kemal
Oflazer
YouDACC: the Youtube Dialectal Arabic Commentary Corpus
Carnegie Mellon University
2018
Corpus Compilation
YouTube
Arabic
2018-05-01 00:00:00
Journal contribution
https://kilthub.cmu.edu/articles/journal_contribution/YouDACC_the_Youtube_Dialectal_Arabic_Commentary_Corpus/6373124
In the Arab world, Modern Standard Arabic is commonly used in formal written contexts, but on sites like YouTube,
people are increasingly using Dialectal Arabic, the language of everyday use, to comment on videos and interact
with the community. These user-contributed comments, along with the video and user attributes, offer a rich source
of multi-dialectal Arabic sentences and expressions from different countries in the Arab world. This paper presents
YouDACC, an automatically annotated, large-scale, multi-dialectal Arabic corpus collected from user comments on
YouTube videos. Our corpus covers five groups of dialects: Egyptian (EG), Gulf (GU), Iraqi (IQ), Maghrebi (MG),
and Levantine (LV). We perform an empirical analysis of the crawled corpus and demonstrate that our proposed
location-based method is effective for the task of dialect labeling.