Choosing the Right Evaluation for Machine Translation: an Examination of Annotator and Automatic Metric Performance on Human Judgment Tasks

Denkowski, Michael; Lavie, Alon

doi:10.1184/R1/6473108.v1

journal contribution

posted on 2010-10-01, 00:00 authored by Michael Denkowski, Alon Lavie

This paper examines the motivation, design, and practical results of several types of human evaluation tasks for machine translation. In addition to considering annotator performance and task informativeness over multiple evaluations, we explore the practicality of tuning automatic evaluation metrics to each judgment type in a comprehensive experiment using the METEOR-NEXT metric. We present results showing clear advantages of tuning to certain types of judgments and discuss causes of inconsistency when tuning to various judgment data, as well as sources of difficulty in the human evaluation tasks themselves.

Date

2010-10-01

Keywords

Language Technologies

Licence

In Copyright
