Interactive Learning for Sequential Decisions and Predictions
Sequential prediction problems arise commonly in many areas of robotics and information processing: e.g., predicting a sequence of actions over time to achieve a goal in a control task, interpreting an image through a sequence of local image patch classifications, or translating speech to text through an iterative decoding procedure.
Learning predictors that can reliably perform such sequential tasks is challenging. Specifically, as predictions influence future inputs in the sequence, the data-generation process and the executed predictor are inextricably intertwined. This can often lead to a significant mismatch between the distribution of examples observed during training (induced by the predictor used to generate training instances) and the distribution encountered during test executions (induced by the learned predictor). As a result, naively applying standard supervised learning methods, which assume independently and identically distributed training and test examples, often leads to poor test performance and compounding errors: inaccurate predictions lead to situations not covered in training, where more errors are inevitable.
This thesis proposes general iterative learning procedures that leverage interactions between the learner and teacher to provably learn good predictors for sequential prediction tasks. Through repeated interactions, our approaches can efficiently learn predictors that are robust to their own errors and predict accurately during test executions. Our main approach uses existing no-regret online learning methods to provide strong generalization guarantees on test performance.
We demonstrate how to apply our main approach in various sequential prediction settings: imitation learning, model-free reinforcement learning, system identification, structured prediction and submodular list prediction. Its efficiency and wide applicability are exhibited over a large variety of challenging learning tasks, ranging from learning video-game-playing agents from human players and accurate dynamic models of a simulated helicopter for controller synthesis, to learning predictors for scene understanding in computer vision, news recommendation and document summarization. We also demonstrate the applicability of our technique on a real robot, using pilot demonstrations to train an autonomous quadrotor to avoid trees seen through its onboard camera (monocular vision) when flying at low altitude in natural forest environments.
Our results throughout show that, unlike typical supervised learning tasks, where examples of good behavior are sufficient to learn good predictors, interaction is a fundamental part of learning in sequential tasks. We show formally that some level of interaction is necessary: without interaction, no learning algorithm can guarantee good performance in general.