Continual Robot Learning: Benchmarks and Modular Methods
Humans adapt continuously to the world around us, acquiring new skills and exploring diverse environments seamlessly. Current AI methods cannot yet attain this versatility. They are typically trained on vast datasets and learn all tasks simultaneously, and the resulting models have limited ability to adapt to changing contexts and are constrained by the data available. This challenge is particularly pronounced in robotics, where real-world interaction data is scarce.
We envision instead a robot that learns continuously from both its environment and human interactions, quickly acquires new information without overwriting past knowledge, and adapts to a user’s specific needs.
In this thesis, we apply continual learning to robotics with the goal of enabling crucial capabilities, including applying prior information to new settings, retaining old information, sustaining capacity for new skills, and understanding context. We explore these across two learning modes: continual reinforcement learning (CRL), where the agent learns from experience, and continual imitation learning (CIL), where it learns from demonstrations.
Progress is hindered by substantial barriers, including limited open-source resources, resource-intensive benchmarks, and metrics that are impractical for robotics. To address these challenges, we present CORA (COntinual Reinforcement Learning Agents), an open-source toolkit with benchmarks, baselines, and metrics that make CRL more accessible. CORA goes beyond measuring catastrophic forgetting and also evaluates models for forward transfer and generalization.
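To make the distinction between forgetting and forward transfer concrete, the following is a minimal sketch of how such quantities are commonly computed from a table of evaluation scores. It is a generic illustration and not CORA's own metric definitions: the function names, the `results` layout (row i holds scores on every task after training through task i), the `baseline` vector of scores before any training, and all numbers are assumptions made up for this example.

```python
from statistics import mean


def forgetting(results):
    """Mean drop in score on a task after training on later tasks.

    results[i][j] is the (hypothetical) score on task j measured after
    training on tasks 0..i in sequence. Positive values mean forgetting.
    """
    n = len(results)
    drops = [
        results[j][j] - results[i][j]  # score right after learning j, minus score later
        for j in range(n)
        for i in range(j + 1, n)
    ]
    return mean(drops) if drops else 0.0


def forward_transfer(results, baseline):
    """Mean zero-shot gain on a task just before it is trained.

    baseline[j] is the (hypothetical) score on task j before any training.
    Positive values mean earlier tasks helped on the unseen task.
    """
    n = len(results)
    gains = [results[j - 1][j] - baseline[j] for j in range(1, n)]
    return mean(gains) if gains else 0.0


# Illustrative numbers only, not results from the thesis.
results = [
    [0.9, 0.2, 0.1],  # after training task 0
    [0.6, 0.8, 0.3],  # after training tasks 0 and 1
    [0.5, 0.7, 0.9],  # after training tasks 0, 1 and 2
]
baseline = [0.1, 0.1, 0.1]

print("forgetting:", round(forgetting(results), 3))
print("forward transfer:", round(forward_transfer(results, baseline), 3))
```

In this toy table the agent loses some performance on earlier tasks as it trains on later ones, while each new task starts slightly above its untrained baseline. Reporting both quantities separates an agent that merely retains old skills from one that also reuses them.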
With this foundation, we introduce SANE (Self-Activating Neural Ensembles) to create a dynamic library of adaptable skills. SANE’s ensemble of independent modules learns and applies skills as needed, reducing forgetting. We demonstrate this method on several Procgen reinforcement learning task sets.
We then adapt SANE to a physical robot, the Stretch, with SANER (SANE for Robotics), which uses CIL. Leveraging our novel Attention-Based Interaction Policies (ABIP), SANER excels in few-shot learning and generalizes effectively across various tasks.
SANERv2 further advances this capability, integrating natural language and achieving strong performance over a diverse set of 15 manipulation tasks in the RLBench simulated environment. SANERv2 also demonstrates the potential of independent modules: a node can be moved between agents without loss of performance, which points toward composable ensembles in future work.
- Robotics Institute
- Doctor of Philosophy (PhD)