
A Human Judgment Corpus and a Metric for Arabic MT Evaluation

journal contribution
posted on 2014-10-01 authored by Houda Bouamor, Hanan Alshikhabobakr, Behrang Mohit, Kemal Oflazer
We present a human judgments dataset and an adapted metric for the evaluation of Arabic machine translation. Our medium-scale dataset is the first of its kind for Arabic with high annotation quality. We use the dataset to adapt the BLEU score for Arabic. Our score (AL-BLEU) provides partial credit for stem and morphological matches between hypothesis and reference words. We evaluate BLEU, METEOR, and AL-BLEU on our human judgments corpus and show that AL-BLEU has the highest correlation with human judgments. We are releasing the dataset and software to the research community.
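The abstract does not spell out how partial credit is computed, but the core idea it describes (full credit for an exact surface match, partial credit for a stem match and for agreeing morphological features) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Token fields, the weights, and the feature names below are hypothetical, whereas AL-BLEU uses weights tuned against the human judgments corpus.

from dataclasses import dataclass, field

@dataclass
class Token:
    surface: str                 # surface form of the word
    stem: str                    # stem from a morphological analyzer
    feats: dict = field(default_factory=dict)  # e.g. {"pos": "NOUN", "num": "SG"}

# Hypothetical weights for illustration only.
STEM_WEIGHT = 0.6
FEATURE_WEIGHT = 0.1  # per agreeing morphological feature

def partial_match(hyp: Token, ref: Token) -> float:
    """Credit in [0, 1] for a hypothesis/reference token pair:
    1.0 for an exact surface match, otherwise partial credit
    for a shared stem plus each agreeing morphological feature."""
    if hyp.surface == ref.surface:
        return 1.0
    score = STEM_WEIGHT if hyp.stem == ref.stem else 0.0
    for feat, value in ref.feats.items():
        if hyp.feats.get(feat) == value:
            score += FEATURE_WEIGHT
    return min(score, 1.0)

if __name__ == "__main__":
    # Same stem and features, different surface form (e.g. a clitic difference):
    hyp = Token("kitAbuhu", "kitAb", {"pos": "NOUN", "num": "SG"})
    ref = Token("kitAbuhA", "kitAb", {"pos": "NOUN", "num": "SG"})
    print(partial_match(hyp, ref))  # 0.8 = stem (0.6) + two features (0.2)

In AL-BLEU these per-token credits would replace the binary match counts inside the standard BLEU n-gram precision, so near-miss inflections of the correct word are no longer scored the same as outright errors.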


Publisher Statement

This is the published version of Bouamor, H., Alshikhabobakr, H., Mohit, B., & Oflazer, K. (2014). A Human Judgment Corpus and a Metric for Arabic MT Evaluation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). doi:10.3115/v1/d14-1026 © 2014 Association for Computational Linguistics

Date

2014-10-01