Reinforcement Learning for Behavior Planning of Autonomous Vehicles in Urban Scenarios
How autonomous vehicles and human drivers share public transportation systems is an important problem, as fully automated transportation environments are still a long way off. Behavioral decision making serves as a key link in autonomous driving technology. In conventional self-driving systems, heuristic-based rules-enumeration methods handle most behavioral decision-making tasks. However, for a behavior as complex as driving, developing a suitable set of rules is a laborious engineering task that does not guarantee an optimal policy. Reinforcement learning (RL) is a decision-making method with strong recent successes that is capable of solving for an optimal policy and can map diverse observations to actions in a variety of complex situations. However, RL has its own problems, such as exceptionally long training times, unstable training results, and difficult reward tuning.
In this thesis, we present a series of behavior planning structures and algorithms that combine the advantages of both reinforcement learning and heuristic-based rules-enumeration. The resulting contributions include:
• Creation of an Automatically Generated Curriculum in order to increase the learning speed for RL.
• Improvement of the policy network of RL with an LSTM module in order to get better performance on a given task.
• Creation of a hierarchical RL structure with a hybrid reward mechanism, which can accomplish the behavior decision procedure with the help of heuristic-based methods.
• Application of the hierarchical RL structure to a comprehensive range of urban intersection scenarios, including approaching, observation, and traversing.
Compared to traditional heuristic-based rules-enumeration methods, which require substantial effort to design rules that cover as many scenarios as possible, reinforcement learning can help to learn an optimal policy automatically. At the same time, our algorithm helps RL to be more sample-efficient and to converge to an optimal policy faster than competing algorithms.
History
Date
2021-08-20
Degree Type
- Dissertation
Department
- Electrical and Computer Engineering
Degree Name
- Doctor of Philosophy (PhD)