Learning the Communication of Intent Prior to Physical Collaboration
When performing physical collaboration tasks, such as packing a picnic basket together, humans communicate strongly and often subtly via multiple channels, including gaze, speech, gestures, movement, and posture. Understanding and participating in this communication enables us to predict a physical action rather than react to it, producing seamless collaboration. In this paper, we use machine learning techniques to automatically learn key discriminative features that predict the intent to hand over an object. We train and test our algorithm on multi-channel vision and pose data collected from an extensive user study in an instrumented kitchen. Our algorithm outputs a tree of possibilities, automatically encoding various types of pre-handover communication. A surprising outcome is that mutual gaze and interpersonal distance, often cited as being key for interaction, were not key discriminative features. Finally, we discuss the immediate and future impact of this work for human-robot interaction.