Creating Multi-Modal, User-Centric Records of Meetings with the Carnegie Mellon Meeting Recorder Architecture
Our goal is to build conversational agents that combine information from speech, gesture, handwriting, text, and presentations to create an understanding of the ongoing conversation (e.g., by identifying the action items agreed upon), and that can make useful contributions to the meeting based on such an understanding (e.g., by confirming the details of those action items). To create a corpus of relevant data, we have implemented the Carnegie Mellon Meeting Recorder to capture detailed multi-modal recordings of meetings. This software differs somewhat from other meeting room architectures in that it focuses on instrumenting the individual rather than the room, and it assumes that the meeting space is not fixed in advance. Most of the sensors are therefore user-centric (close-talking microphones connected to laptop computers, instrumented notepads, instrumented presentation software, etc.), although some are "room-centric" (an instrumented whiteboard, distant cameras, table-top microphones, etc.). This paper describes our data collection environment in detail. We report on the current status of our data collection, transcription, and higher-level discourse annotation efforts. We also describe some of our initial research on conversational turn-taking based on this corpus.
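To make the user-centric design concrete, the sketch below shows one way a per-participant device might log timestamped events from its local sensors (speech segments, pen strokes, slide changes) into a stream that can later be merged across participants. This is purely illustrative: the class and function names, field layout, and modality labels are hypothetical and are not the Meeting Recorder's actual schema; the sketch also assumes participants' machines have synchronized clocks.

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict, field


@dataclass
class MeetingEvent:
    """One timestamped record from a single participant's device.

    All names here are illustrative, not the actual Meeting Recorder
    format. `modality` might be "speech", "pen", or "slide".
    """
    participant: str   # e.g., the login name on the user's laptop
    modality: str      # which user-centric sensor produced the event
    timestamp: float   # wall-clock seconds (assumes NTP-synced hosts)
    payload: dict      # modality-specific data (file refs, stroke points, ...)
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def log_event(stream, participant, modality, payload):
    """Append one event to a per-participant JSON-lines log.

    Per-user logs can later be merged and sorted by timestamp to
    reconstruct a single multi-modal record of the meeting.
    """
    event = MeetingEvent(participant, modality, time.time(), payload)
    stream.write(json.dumps(asdict(event)) + "\n")


if __name__ == "__main__":
    # Hypothetical usage on one participant's laptop.
    with open("alice.events.jsonl", "a") as f:
        log_event(f, "alice", "speech", {"wav": "alice_000.wav", "dur_s": 3.2})
        log_event(f, "alice", "pen", {"strokes": [[0, 0], [10, 4]]})
        log_event(f, "alice", "slide", {"deck": "kickoff.ppt", "slide": 7})
```

Keeping each log on the participant's own device, rather than in a central room server, reflects the assumption above that the meeting space is not fixed in advance.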