Active Learning and Crowd-Sourcing for Machine Translation

Ambati, Vamshi; Vogel, Stephan; Carbonell, Jaime G.

doi:10.1184/R1/6620900.v1

journal contribution

posted on 2010-05-01, 00:00 authored by Vamshi Ambati, Stephan Vogel, Jaime G. Carbonell

In recent years, corpus-based approaches to machine translation have become predominant, with Statistical Machine Translation (SMT) being the most actively progressing area. The success of these approaches depends on the availability of parallel corpora. In this paper we propose Active Crowd Translation (ACT), a new paradigm where active learning and crowd-sourcing come together to enable automatic translation for low-resource language pairs. Active learning aims at reducing the cost of label acquisition by prioritizing the most informative data for annotation, while crowd-sourcing reduces cost by using the power of the crowd to make up for the lack of expensive language experts. We compare our active learning strategies with strong baselines and see significant improvements in translation quality. Similarly, our experiments with crowd-sourcing on Mechanical Turk show that it is possible to create parallel corpora using non-experts and that, with sufficient quality assurance, a translation system trained on such a corpus approaches expert quality.
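The idea of prioritizing the most informative data for annotation can be illustrated with a minimal sketch. It is not the selection strategy from the paper, which the abstract does not specify; it assumes a simple novelty heuristic in which a source sentence is more informative the larger the share of its n-grams that do not yet appear in the already translated data. The names `ngrams` and `select_batch` and the whitespace tokenization are hypothetical choices made for illustration.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int = 2) -> Set[Tuple[str, ...]]:
    """Return the set of n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def select_batch(pool: List[str], labeled: Iterable[str],
                 batch_size: int, n: int = 2) -> List[str]:
    """Greedily pick sentences with the highest share of unseen n-grams.

    Assumption: novelty of n-grams stands in for "informativeness".
    This is an illustrative heuristic, not the paper's method.
    """
    # n-grams already covered by the translated (labeled) sentences
    seen: Set[Tuple[str, ...]] = set()
    for sentence in labeled:
        seen |= ngrams(sentence.split(), n)

    def score(sentence: str) -> float:
        grams = ngrams(sentence.split(), n)
        return len(grams - seen) / len(grams) if grams else 0.0

    remaining = list(pool)
    batch: List[str] = []
    while remaining and len(batch) < batch_size:
        best = max(remaining, key=score)
        if score(best) == 0.0:
            break  # nothing left would add new n-grams
        batch.append(best)
        seen |= ngrams(best.split(), n)  # later picks must add new material
        remaining.remove(best)
    return batch


pool = ["the cat sat", "the cat sat down", "a dog barked loudly"]
batch = select_batch(pool, labeled=["the cat sat"], batch_size=2)
```

In this sketch the selected batch would then be sent out for translation, for example to crowd workers, and the returned sentence pairs added to the training corpus before the next selection round.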

Publisher Statement

Copyright by the European Language Resources Association

Date

2010-05-01

Keywords

Software Research Computer Software not elsewhere classified

Licence

In Copyright
