Carnegie Mellon University

Offline Learning for Stochastic Multi-Agent Planning in Autonomous Driving

posted on 2024-05-10, 19:44 authored by Adam Villaflor

Fully autonomous vehicles have the potential to greatly reduce vehicular accidents and revolutionize how people travel and how we transport goods. Many of the major challenges for autonomous driving systems emerge from the numerous traffic situations that require complex interactions with other agents. For the foreseeable future, autonomous vehicles will have to share the road with human drivers and pedestrians, and thus cannot rely on centralized communication to address these interactive scenarios. Therefore, autonomous driving systems need to be able to negotiate with and respond to unknown agents that exhibit uncertain behavior. To tackle these problems, most commercial autonomous driving stacks use a modular approach that splits perception, agent forecasting, and planning into separately engineered modules. However, fully separating prediction and planning makes it difficult to reason about how other vehicles will respond to the planned trajectory of the controlled ego-vehicle. To maintain safety, many modular approaches therefore have to be overly conservative when interacting with other agents. Ideally, we want autonomous vehicles to drive in a natural and confident manner while still maintaining safety.

Thus, in this thesis, we explore how deep learning and offline reinforcement learning can be used to perform joint prediction and planning in highly interactive and stochastic multi-agent scenarios in autonomous driving. First, we discuss our work on using deep learning for joint prediction and closed-loop planning in an offline reinforcement learning (RL) framework (Chapter 2). Second, we discuss our work that directly tackles the difficulties of planning with learned models in stochastic multimodal settings (Chapter 3). Third, we discuss how we can scale to more complicated multi-agent driving scenarios, such as merging in dense traffic, by using a Transformer-based traffic forecasting model as our world model (Chapter 4). Finally, we discuss how we can draw from offline model-based RL to learn a high-level policy that selects from a discrete set of pre-trained driving skills to perform effective control without additional online planning (Chapter 5).




Degree Type

  • Dissertation


  • Robotics Institute

Degree Name

  • Doctor of Philosophy (PhD)


Jeff Schneider, John Dolan
