Carnegie Mellon University

Detecting Off-Task Speech

thesis
posted on 2025-05-20, 19:47 authored by Wei Chen

Off-task speech is speech that strays from an intended task. It occurs in many dialog applications, such as intelligent tutors, virtual games, health communication systems and human-robot cooperation. Off-task speech input to computers presents both challenges and opportunities for such dialog systems. On the one hand, off-task speech has an informal conversational style and a potentially unbounded scope, both of which hamper accurate speech recognition. On the other hand, an automated agent capable of detecting off-task speech could track users’ attention and thereby maintain the intended conversation by bringing a user back on task; also, knowledge of where off-task speech events are likely to occur can help the analysis of automatic speech recognition (ASR) errors. Related work has been done on confidence measures for dialog systems and on detecting out-of-domain utterances. However, there has been no systematic study of the types of off-task speech being detected or of the generality of the features that capture off-task speech. In addition, we know of no published research on detecting off-task speech in children’s interactions with an automated agent. The goal of this research is to fill these gaps by providing a systematic study of off-task speech, with an emphasis on child-machine interactions.

To characterize off-task speech quantitatively, we used acoustic features to capture its speaking style; we used lexical features to capture its linguistic content; and we used contextual features to capture the relation of off-task speech to nearby utterances. Using these features, we trained an off-task speech detector that yielded an 87% detection rate at a cost of 10% false positives on children’s oral reading. Furthermore, we studied the generality of these types of features by detecting off-task speech in data from four tutorial tasks ranging from oral reading to prompted free-form responses. In addition, we examined how the features help detect adults’ off-task speech in data from the CMU Let’s Go bus information system. We show that lexical features detect more task-related off-task speech such as complaints about the system, whereas acoustic features detect more unintelligible speech and non-speech events such as mumbling and humming. Moreover, acoustic features tend to be more robust than lexical features when switching domains. Finally, we demonstrate how off-task speech detection can improve performance on application-relevant metrics such as predicting fluency test scores in oral reading and understanding utterances in the CMU Let’s Go bus information system.
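
An operating point such as "detection rate at a cost of 10% false positives" can be read as follows: choose a score threshold so that at most 10% of on-task utterances are flagged, then measure the fraction of off-task utterances flagged at that threshold. The sketch below illustrates only that arithmetic. It is not the thesis's detector or evaluation code; the function name `detection_rate_at_fpr`, the score scale, and the toy data are assumptions made for illustration.

```python
def detection_rate_at_fpr(scores, labels, max_fpr):
    """Return (threshold, detection_rate, false_positive_rate).

    scores: hypothetical detector outputs; higher means more likely off-task.
    labels: 1 for off-task utterances, 0 for on-task utterances.
    Picks the lowest threshold whose false-positive rate stays within max_fpr.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one off-task and one on-task utterance")

    # Default: a threshold so high that nothing is flagged.
    best = (float("inf"), 0.0, 0.0)
    # Lowering the threshold can only add flagged utterances, so the
    # false-positive rate never decreases and we can stop at the first violation.
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= threshold]
        true_pos = sum(flagged)
        false_pos = len(flagged) - true_pos
        fpr = false_pos / negatives
        if fpr > max_fpr:
            break
        best = (threshold, true_pos / positives, fpr)
    return best


# Toy data (made up for illustration): 5 off-task and 10 on-task utterances.
toy_scores = [0.95, 0.90, 0.80, 0.75, 0.70, 0.60, 0.50, 0.40,
              0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

threshold, detection_rate, fpr = detection_rate_at_fpr(toy_scores, toy_labels, 0.10)
print(f"threshold={threshold:.2f} detection={detection_rate:.0%} false positives={fpr:.0%}")
```

On the toy data, one on-task utterance out of ten is flagged at the chosen threshold, and the detection rate is the share of the five off-task utterances scored at or above it. These numbers describe the toy data only and say nothing about the results reported above.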

History

Date

2012-05-20

Degree Type

  • Dissertation

Thesis Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Jack Mostow
