<div>We present a human judgments dataset and an adapted metric for evaluating Arabic machine translation. Our medium-scale dataset is the first of its kind for Arabic with high annotation quality. We use the dataset to adapt the BLEU score for Arabic. Our score (AL-BLEU) gives partial credit for stem and morphological matches between hypothesis and reference words. We evaluate BLEU, METEOR, and AL-BLEU on our human judgments corpus and show that AL-BLEU has the highest correlation with human judgments. We are releasing the dataset and software to the research community.</div>
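The partial-credit idea behind AL-BLEU can be illustrated with a small token-level scoring sketch. This is not the authors' implementation: the weights, the `(surface, stem, morph_tags)` token representation, and the overlap formula are illustrative assumptions only.

```python
# Hedged sketch of partial-credit token matching in the spirit of AL-BLEU.
# The weights (w_stem, w_morph) and the token representation are
# hypothetical; the actual metric's parameters come from the paper.

def token_match(hyp, ref, w_stem=0.8, w_morph=0.6):
    """Score one hypothesis token against one reference token.

    Each token is a (surface, stem, morph_tags) triple, where
    morph_tags is a frozenset of morphological features (an assumed
    format). Returns 1.0 for an exact surface match, a partial credit
    for a stem match, a smaller credit proportional to morphological
    tag overlap, and 0.0 otherwise.
    """
    h_surf, h_stem, h_morph = hyp
    r_surf, r_stem, r_morph = ref
    if h_surf == r_surf:
        return 1.0                      # full credit: exact match
    if h_stem == r_stem:
        return w_stem                   # partial credit: same stem
    overlap = len(h_morph & r_morph)
    if overlap:                         # partial credit: shared morphology
        return w_morph * overlap / max(len(h_morph), len(r_morph))
    return 0.0
```

In a full metric these per-token scores would replace the binary match counts inside the modified n-gram precision of BLEU; here only the token-level scoring is sketched.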