A methodology for using crowdsourced data to measure uncertainty in natural speech

<p>Speakers sometimes express uncertainty, often unconsciously, adding a layer of meaning to their speech that conveys doubt about the accuracy of the information they are trying to communicate. Annotating uncertainty is usually a subjective and expensive process. In this paper, we propose a methodology for annotating it with crowdsourced data instead. In our experiment, we used an online database of colors that more than 200,000 users have named. From the number of unique names that users gave each color, we calculated an entropy value to represent that color's level of uncertainty. Participants were then asked to name colors verbally, and we built a model that uses prosodic cues in their speech to predict whether the color being described was ambiguous or borderline. The model performed better than chance. Using crowdsourced data can greatly streamline the process of annotating uncertainty, but our methods have yet to be tested in domains other than color. By using methods such as ours to measure prosodic attributes of uncertainty, it should be possible to increase the accuracy of voice search.</p>
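<p>As an illustration of the entropy step, the sketch below computes a Shannon entropy over the names given to a single color. It is a minimal example, not the paper's implementation: the abstract does not state the exact formula, so the use of Shannon entropy over the name distribution, the base-2 logarithm, the function name <code>naming_entropy</code>, and the sample names are all assumptions made for illustration.</p>

```python
import math
from collections import Counter


def naming_entropy(names):
    """Shannon entropy (in bits) of the names given to one color.

    Assumption: uncertainty is modeled as the entropy of the empirical
    distribution of names; the abstract does not specify the formula.
    """
    counts = Counter(names)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one name is required")
    # Sum p * log2(1/p) over each distinct name's relative frequency p.
    return sum(
        (c / total) * math.log2(total / c)
        for c in counts.values()
    )


# Hypothetical responses: everyone agrees on one color, opinions split on another.
agreed = ["red", "red", "red", "red"]
split = ["teal", "turquoise", "teal", "turquoise"]

print(naming_entropy(agreed))  # 0.0 (no disagreement)
print(naming_entropy(split))   # 1.0 (two equally likely names)
```

<p>Under this assumption, a color that nearly everyone names the same way scores close to zero, while a borderline color that attracts many competing names scores higher.</p>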