Detecting Off-Task Speech
Off-task speech is speech that strays from the intended task. It occurs in many dialog applications, such as intelligent tutors, virtual games, health communication systems, and human-robot cooperation. Off-task speech input presents both challenges and opportunities for such dialog systems. On the one hand, its informal conversational style and potentially unbounded scope hamper accurate speech recognition. On the other hand, an automated agent capable of detecting off-task speech could track users' attention and maintain the intended conversation by bringing a user back on task; knowing where off-task speech events are likely to occur can also aid the analysis of automatic speech recognition (ASR) errors. Related work exists on confidence measures for dialog systems and on detecting out-of-domain utterances. However, there has been no systematic study of the types of off-task speech being detected or of how well features that capture off-task speech generalize. In addition, we know of no published research on detecting off-task speech in children's interactions with an automated agent. The goal of this research is to fill these gaps by providing a systematic study of off-task speech, with an emphasis on child-machine interactions.
To characterize off-task speech quantitatively, we used acoustic features to capture its speaking style, lexical features to capture its linguistic content, and contextual features to capture its relation to nearby utterances. Using these features, we trained an off-task speech detector that achieved an 87% detection rate at a cost of 10% false positives on children's oral reading. Furthermore, we studied the generality of these feature types by detecting off-task speech in data from four tutorial tasks ranging from oral reading to prompted free-form responses. In addition, we examined how the features help detect adults' off-task speech in data from the CMU Let's Go bus information system. We show that lexical features detect more task-related off-task speech, such as complaints about the system, whereas acoustic features detect more unintelligible speech and non-speech events such as mumbling and humming. Moreover, acoustic features tend to be more robust than lexical features when switching domains. Finally, we demonstrate how off-task speech detection can improve performance on application-relevant metrics such as predicting fluency test scores in oral reading and understanding utterances in the CMU Let's Go bus information system.
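To make the feature pipeline concrete, below is a minimal sketch of how acoustic, lexical, and contextual features might be concatenated per utterance and fed to a binary classifier. This is not the dissertation's implementation: the feature extractors, the field names (mean_pitch, speaking_rate, off_task, etc.), the toy vocabulary, and the choice of logistic regression are all illustrative assumptions.

```python
# Sketch of an off-task speech detector combining the three feature types
# described above. All extractors and data fields are hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

def acoustic_features(utt):
    # Hypothetical prosodic cues capturing speaking style.
    return [utt["mean_pitch"], utt["pitch_var"], utt["speaking_rate"]]

def lexical_features(utt, task_vocab):
    # Hypothetical lexical cue: fraction of words outside the task vocabulary.
    words = utt["text"].lower().split()
    oov = sum(w not in task_vocab for w in words)
    return [oov / max(len(words), 1)]

def contextual_features(utt, prev_utt):
    # Hypothetical contextual cues: silence gap to, and label of, the
    # previous utterance.
    return [utt["start_time"] - prev_utt["end_time"],
            1.0 if prev_utt["off_task"] else 0.0]

def featurize(utts, task_vocab):
    # Concatenate all three feature types; the first utterance is skipped
    # because it has no preceding context.
    return np.array([acoustic_features(cur)
                     + lexical_features(cur, task_vocab)
                     + contextual_features(cur, prev)
                     for prev, cur in zip(utts, utts[1:])])

if __name__ == "__main__":
    task_vocab = {"the", "cat", "sat", "on", "mat"}
    utts = [
        {"text": "the cat sat", "mean_pitch": 180.0, "pitch_var": 20.0,
         "speaking_rate": 3.1, "start_time": 0.0, "end_time": 1.2,
         "off_task": False},
        {"text": "i hate this game", "mean_pitch": 240.0, "pitch_var": 55.0,
         "speaking_rate": 4.8, "start_time": 2.0, "end_time": 3.0,
         "off_task": True},
        {"text": "on the mat", "mean_pitch": 175.0, "pitch_var": 18.0,
         "speaking_rate": 3.0, "start_time": 3.5, "end_time": 4.4,
         "off_task": False},
    ]
    X = featurize(utts, task_vocab)
    y = np.array([u["off_task"] for u in utts[1:]], dtype=int)
    clf = LogisticRegression().fit(X, y)
    # A detection / false-positive trade-off (e.g., 87% detection at 10%
    # false positives) is set by thresholding these posterior scores.
    print(clf.predict_proba(X)[:, 1])
```

In a setup like this, the operating point reported above would correspond to one threshold on the classifier's posterior score, with the ROC curve giving the full detection/false-positive trade-off.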
History
Date
- 2012-05-20
Degree Type
- Dissertation
Thesis Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)