Carnegie Mellon University

Using the Amazon Mechanical Turk for Transcription of Spoken Language

journal contribution
posted on 2008-01-01, 00:00, authored by Matthew R Marge, Satanjeev Banerjee, Alexander Rudnicky

We investigate whether Amazon's Mechanical Turk (MTurk) service can be used as a reliable method for transcription of spoken language data. Utterances with varying speaker demographics (native and non-native English, male and female) were posted on the MTurk marketplace together with standard transcription guidelines. Transcriptions were compared against transcriptions carefully prepared in-house through conventional (manual) means. We found that transcriptions from MTurk workers were generally quite accurate. Further, when transcripts for the same utterance produced by multiple workers were combined using the ROVER voting scheme, the accuracy of the combined transcript rivaled that observed for conventional transcription methods. We also found that accuracy is not particularly sensitive to payment amount, implying that high quality results can be obtained at a fraction of the cost and turnaround time of conventional methods.
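The comparison and combination steps described in the abstract can be illustrated with a minimal sketch. It assumes that accuracy is measured as word error rate (WER) against the in-house reference, which the abstract does not state explicitly. It also replaces ROVER with a much simpler stand-in: each worker's transcript is aligned to the first transcript (the "backbone") and a majority vote is taken per word position. Full ROVER builds a word transition network by iterative alignment and can also weigh confidence scores; none of that is reproduced here. The function names `align`, `wer`, and `vote` and the sample sentences are hypothetical and do not come from the paper.

```python
from collections import Counter


def align(ref, hyp):
    """Levenshtein-align two word lists.

    Returns (distance, pairs), where pairs is a list of
    (ref_word, hyp_word) tuples and None marks a gap.
    """
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match / substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                pairs.append((ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    distance, _ = align(ref, hypothesis.split())
    return distance / len(ref)


def vote(transcripts):
    """Simplified stand-in for ROVER: per-position majority vote.

    The first transcript is the backbone. Words that other transcripts
    insert relative to the backbone are ignored (a simplification).
    """
    hyps = [t.split() for t in transcripts]
    backbone = hyps[0]
    slots = [Counter([word]) for word in backbone]
    for hyp in hyps[1:]:
        _, pairs = align(backbone, hyp)
        pos = 0
        for ref_word, hyp_word in pairs:
            if ref_word is None:
                continue  # insertion relative to the backbone: skipped
            slots[pos][hyp_word] += 1  # hyp_word None counts as a deletion vote
            pos += 1
    winners = (slot.most_common(1)[0][0] for slot in slots)
    return " ".join(word for word in winners if word is not None)


# Hypothetical data: one reference and three worker transcripts.
reference = "please turn on the light"
workers = [
    "please turn of the light",
    "please turn on the light",
    "please turn on the lights",
]
combined = vote(workers)
```

In this toy example each individual error is outvoted by the other two workers, which is the intuition behind combining several inexpensive transcripts of the same utterance. Because insertions relative to the backbone are dropped, the sketch is sensitive to which transcript comes first; ROVER's word transition network avoids that limitation.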

Publisher Statement

All Rights Reserved

Date

2008-01-01
