Natural Language Direction Following for Robots in Unstructured Unknown Environments
Robots are increasingly performing collaborative tasks with people in homes, workplaces, and outdoors, and with this increase in interaction comes a need for efficient communication between human and robot teammates. One way to achieve this communication is through natural language, which provides a flexible and intuitive way to issue commands to robots without requiring specialized interfaces or extensive user training. One task where natural language understanding could facilitate human-robot interaction is navigation through unknown environments, in which a user directs a robot toward a goal by describing (in natural language) the actions necessary to reach the destination. Most existing approaches to following natural language directions assume that the robot has access to a complete map of the environment ahead of time. This assumption severely limits the environments in which a robot can operate, since collecting a semantically labeled map of the environment is expensive and time-consuming. Following directions in unknown environments is much more challenging, as the robot must make decisions using only information about the parts of the environment it has observed so far. In other words, absent a full map, the robot must incrementally build its map from sensor measurements and rely on this partial map to follow the directions. Some approaches to following directions in unknown environments do exist, but they implicitly restrict the structure of the environment and have so far been applied only in simulated or highly structured settings. To date, no solution exists to the problem of real robots following natural language directions through unstructured, unknown environments. We address this gap by formulating direction following in unstructured unknown environments as a problem of sequential decision making under uncertainty.
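The sequential decision-making formulation can be sketched as a control loop in which the policy only ever sees the partial map built from observations so far, never the full environment. The following is a minimal illustration under assumed interfaces, not the system described in this dissertation; the action set, grid map representation, and the `follow_direction`, `PartialMap`, `sense`, and `policy` names are all hypothetical:

```python
from dataclasses import dataclass, field

# Hypothetical discrete action set; a real robot's actions are platform-specific.
ACTIONS = ("forward", "turn_left", "turn_right", "declare_goal")

@dataclass
class PartialMap:
    """The robot's incrementally built map: only observed cells are known."""
    observed: dict = field(default_factory=dict)  # (x, y) -> "free" | "wall"

    def update(self, cell, label):
        self.observed[cell] = label

def step(pose, action):
    """Toy motion model: 'forward' advances one cell; turns keep the pose."""
    x, y = pose
    return (x + 1, y) if action == "forward" else (x, y)

def follow_direction(instruction, policy, sense, max_steps=50):
    """Sequential decision making under uncertainty: at each step the policy
    conditions on the instruction and the partial map, chooses an action, and
    the map grows as new sensor measurements arrive."""
    world = PartialMap()
    pose = (0, 0)
    trajectory = []
    for _ in range(max_steps):
        for cell, label in sense(pose):       # integrate new observations
            world.update(cell, label)
        action = policy(instruction, world, pose)
        trajectory.append(action)
        if action == "declare_goal":          # explicit stopping action
            break
        pose = step(pose, action)
    return trajectory
```

The key structural point is that `policy` receives `world`, the partial map, rather than a complete environment model.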
In this setting, a policy reasons about the robot's knowledge of the world so far and predicts a sequence of actions that follows the direction, bringing the robot toward the goal. This approach provides two key benefits that enable robots to understand natural language directions. First, the formulation lets us harness demonstrations of people following directions to learn a policy that reasons about the uncertainty present in the environment. Second, we can extend this by predicting the parts of the environment the robot has not yet observed using information implicit in the given instruction. In this dissertation, we first show how robots can learn policies that reason about the uncertainty present in the environment. We describe an imitation learning approach that trains policies from demonstrations of people giving and following directions. During direction following, the policy predicts a sequence of actions that explores the environment (discovering landmarks), backtracks when necessary (if the robot took a wrong turn), and explicitly declares when the robot reaches the destination. We show that this approach enables robots to correctly follow natural language directions in unknown environments and generalizes to environments not encountered previously. Building upon this work, we propose a novel view of language as a sensor, whereby we "fill in" the unknown parts of the environment beyond the range of the robot's traditional sensors using information implicit in the instruction. We exploit this information to hypothesize maps that are consistent with the language and our knowledge of the world so far, represented as a distribution over possible maps. We then use this distribution to guide the robot, informing a belief space policy that infers a sequence of actions to follow the instruction.
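To illustrate the language-as-a-sensor idea, the sketch below samples map hypotheses that place an instruction-mentioned landmark somewhere in unobserved space, then selects the action with the highest expected progress averaged over that distribution, a crude stand-in for a belief space policy. All names here (`hypothesize_maps`, `best_action`, the linear grid, the "kitchen" landmark) are hypothetical illustrations, not the dissertation's actual representation:

```python
import random

def candidate_cells():
    """Toy world: a short linear corridor of grid cells."""
    return [(x, 0) for x in range(10)]

def hypothesize_maps(instruction, partial_map, n_samples=100, rng=None):
    """Sample maps consistent with the language and observations so far.
    Each hypothesis places the landmark named in the instruction (here,
    hard-coded as 'kitchen' for illustration) in some unobserved cell."""
    rng = rng or random.Random(0)
    unobserved = [c for c in candidate_cells() if c not in partial_map]
    hypotheses = []
    for _ in range(n_samples):
        h = dict(partial_map)                 # keep what we have observed
        h[rng.choice(unobserved)] = "kitchen" # guess where the landmark is
        hypotheses.append(h)
    return hypotheses

def best_action(actions, hypotheses, progress):
    """Belief-space action selection: choose the action with the highest
    expected progress toward the goal, averaged over the map hypotheses."""
    def expected(action):
        return sum(progress(action, h) for h in hypotheses) / len(hypotheses)
    return max(actions, key=expected)
```

In this toy version the hypotheses are uniform samples; the actual distribution over maps would be conditioned on the full instruction and the robot's observations.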
We find that this use of language as a sensor enables robots to follow navigation commands in unknown environments with performance comparable to that of operating in a fully known environment. We demonstrate our approach on three different mobile robots operating indoors and outdoors, as well as through extensive simulations. Together, learned policies and explicit reasoning about the unknown parts of the environment provide a solution to the problem of following natural language directions in unstructured unknown environments. This work is one step toward allowing untrained users to control complex robots, which could one day enable seamless coordination in human-robot teams.