A methodology for using crowdsourced data to measure uncertainty in natural speech

Martin, Lara; Stone, Matthew; Metze, Florian; Mostow, Jack

doi:10.1184/R1/6472973.v1

journal contribution

Posted on 2014-12-01, authored by Lara Martin, Matthew Stone, Florian Metze, Jack Mostow

People sometimes express uncertainty unconsciously, adding layers of meaning on top of their speech and conveying doubts about the accuracy of the information they are trying to communicate. In this paper, we propose a methodology that uses crowdsourcing to annotate uncertainty, a process that is usually subjective and expensive. In our experiment, we used an online database of colors that more than 200,000 users have named. Based on the number of unique names that users gave each color, an entropy value was calculated to represent the uncertainty level of the color. We then built a model that, given certain prosodic cues in a participant's speech when asked to name a color verbally, predicts whether the color being described was ambiguous or borderline; the model performed better than chance. Using crowdsourced data can greatly streamline the process of annotating uncertainty, but our methods have yet to be tested in domains other than color. By using methods such as ours to measure prosodic attributes of uncertainty, it should be possible to increase the accuracy of voice search.
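
As an illustration of the entropy step described in the abstract, the following minimal sketch computes a per-color entropy from crowdsourced names. It assumes Shannon entropy over the frequency distribution of names; the abstract does not specify the exact formula, and the function name and sample responses are hypothetical, not taken from the paper or its database.

```python
import math
from collections import Counter


def name_entropy(names):
    """Shannon entropy (in bits) of the names given to a single color.

    Assumption: entropy is taken over the relative frequency of each
    unique name. The paper's exact formula may differ.
    """
    counts = Counter(names)
    total = sum(counts.values())
    # Sum p * log2(1/p) over every unique name.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical responses, invented for illustration only.
clear_color = ["red"] * 8
borderline_color = ["teal", "teal", "turquoise", "turquoise",
                    "blue", "blue", "green", "green"]

low = name_entropy(clear_color)        # everyone agrees on one name
high = name_entropy(borderline_color)  # names are split four ways
```

Under this assumption, a color that every user names the same way gets an entropy of zero, while a color whose names are split evenly across several labels gets a higher value, which is the sense in which entropy can stand in for an uncertainty level.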

History

Publisher Statement

© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Date

2014-12-01

Keywords

uncertainty speech analysis color annotation crowdsourcing

Licence

In Copyright
