Carnegie Mellon University

Active Learning with Multiple Annotations for Comparable Data Classification Task

journal contribution
posted on 2011-06-01, 00:00 authored by Vamshi Ambati, Sanjika Hewavitharana, Stephan Vogel, Jaime G. Carbonell

Supervised learning algorithms for identifying comparable sentence pairs in predominantly non-parallel corpora require resources both for computing feature functions and for training the classifier. In this paper, we propose active learning techniques for building comparable data for low-resource languages. In particular, we propose strategies to elicit two kinds of annotations from comparable sentence pairs: class label assignment and parallel segment extraction. We also propose an active learning strategy that targets both annotations jointly and performs significantly better than sampling for either annotation independently.
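The abstract does not spell out the selection criteria, so the following is only a generic illustration, not the authors' method. It applies standard uncertainty sampling to the class-label annotation: a logistic classifier scores each unlabeled sentence pair as comparable or not, and the pairs whose probability is closest to 0.5 are sent to an annotator. The two features (lexicon coverage and length ratio), the weights, the toy lexicon, and all function names are hypothetical assumptions chosen for the sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A candidate sentence pair: (source sentence, target sentence).
Pair = Tuple[str, str]


def pair_features(pair: Pair, lexicon: Dict[str, str]) -> List[float]:
    """Hypothetical features: the fraction of source tokens whose lexicon
    translation occurs in the target, and the shorter/longer length ratio."""
    src, tgt = pair[0].split(), pair[1].split()
    if not src or not tgt:
        return [0.0, 0.0]
    tgt_set = set(tgt)
    covered = sum(1 for w in src if lexicon.get(w) in tgt_set)
    coverage = covered / len(src)
    length_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt))
    return [coverage, length_ratio]


def prob_comparable(feats: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic-regression probability that the pair is comparable."""
    z = bias + sum(w * x for w, x in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


def select_for_labeling(pool: List[Pair], lexicon: Dict[str, str],
                        weights: Sequence[float], bias: float, k: int) -> List[Pair]:
    """Uncertainty sampling: return the k pairs whose predicted probability
    is closest to 0.5, i.e. where the current classifier is least certain."""
    def margin(pair: Pair) -> float:
        p = prob_comparable(pair_features(pair, lexicon), weights, bias)
        return abs(p - 0.5)
    return sorted(pool, key=margin)[:k]


if __name__ == "__main__":
    lexicon = {"el": "the", "gato": "cat", "come": "eats", "perro": "dog"}
    pool = [
        ("el gato come", "the cat eats"),           # confidently comparable
        ("el perro", "a bird sings loudly today"),  # confidently not comparable
        ("el gato", "the dog runs"),                # ambiguous
    ]
    print(select_for_labeling(pool, lexicon, weights=[4.0, 2.0], bias=-4.0, k=1))
```

With these illustrative weights, the ambiguous pair `("el gato", "the dog runs")` is the one selected. In a real loop the classifier would be retrained after each batch of labels; a second query type for parallel segment extraction would need its own selection score, which this sketch does not attempt to model.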

History

Publisher Statement

Copyright 2011 The Association for Computational Linguistics

Date

2011-06-01
