Collaborative Communication Interruption Management System (C-CIMS): Modeling Interruption Timings via Prosodic and Topic Modelling for Human-Machine Teams
Human-machine teaming aims to meld human cognitive strengths with the unique capabilities of smart machines to create intelligent teams that adapt to rapidly changing circumstances. One major obstacle to effective human-machine teaming is the machine’s lack of communication skills. This research focuses on a machine’s interruption timings, that is, when a machine should share and communicate information with human teammates during human-machine teaming interactions. Previous work addresses interruption timings from the perspective of single-human, multitasking interactions and multiple-human, single-task interactions. The primary aim of this dissertation is to extend this area by approaching the same problem from the perspective of a multiple-human, multitasking interaction. The proposed machine is the Collaborative Communication Interruption Management System (C-CIMS), which is tasked with leveraging speech information from a human-human task to infer when to interrupt with information related to an orthogonal human-machine task. This study and previous literature both suggest monitoring task boundaries and engagement to identify candidate moments of interruptibility within multiple-human, multitasking interactions. The goal then becomes designing an intermediate step between human teammate communication and points of interruptibility within these interactions. The proposed intermediate step is the mapping of low-level speech information, such as prosodic and lexical information, onto higher-level constructs indicative of interruptibility. C-CIMS is composed of a Task Boundary Prosody Model, a Task Boundary Topic Model, and a Task Engagement Topic Model. Each of these components is evaluated separately in terms of its performance within two different simulated human-machine teaming scenarios, its speed vs. accuracy tradeoffs, and its other limitations.
Overall, the Task Boundary Prosody Model is tractable within a real-time system because of the low latency of processing prosodic information, but it is less accurate at predicting task boundaries, even within human-machine interactions with simple dialogue. Conversely, the Task Boundary and Task Engagement Topic Models perform well at inferring task boundaries and engagement, respectively, but are intractable in a real-time system because producing the automatic speech recognition transcriptions needed to make interruption decisions is a bottleneck. The overall contribution of this work is a novel approach to predicting interruptibility within human-machine teams by modeling higher-level constructs indicative of interruptibility using low-level speech information.