Enhancing Policy Transfer in Action Advising for Reinforcement Learning
As human students benefit from teachers’ advice to accelerate their learning, so could agents benefit from advice. Agents can not only learn from humans, e.g. via human expert demonstrations, but also from other agents. This thesis focuses on action advising, a knowledge transfer technique built upon the teacher-student paradigm of reinforcement learning. In this approach, the teacher agent provides action advice calculated from its policy given the student’s observations.
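The teacher-student interaction described above can be sketched in a few lines. This is only an illustrative sketch, not the method developed in the thesis: the names `Student`, `act`, and `teacher_policy` are hypothetical, the advice budget is an assumption borrowed from common action-advising setups, and the student's own policy is stood in for by a random choice.

```python
import random
from typing import Callable, Optional

# Hypothetical types for illustration: observations and actions are plain ints.
Policy = Callable[[int], int]


class Student:
    """Toy student that follows teacher advice while an advice budget remains."""

    def __init__(self, n_actions: int, budget: int, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.budget = budget  # assumed: the teacher may advise only a limited number of times
        self.rng = random.Random(seed)

    def act(self, observation: int, teacher_policy: Optional[Policy] = None) -> int:
        # The teacher computes advice from ITS policy, given the STUDENT's observation.
        if teacher_policy is not None and self.budget > 0:
            self.budget -= 1
            return teacher_policy(observation)
        # Otherwise fall back to the student's own (here: random) behaviour.
        return self.rng.randrange(self.n_actions)


# Usage: a fixed, pre-trained teacher is stood in for by a simple rule.
teacher = lambda obs: obs % 2
student = Student(n_actions=2, budget=2)
advised_actions = [student.act(3, teacher), student.act(4, teacher)]
unadvised_action = student.act(5, teacher)  # budget exhausted, student acts alone
```

In this sketch the teacher advises on every step until the budget runs out; the question of when to advise, which the prior work has concentrated on, would replace that simple condition.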
Although action advising has been studied over the past decade, related work has focused primarily on when to advise. We extend the current state of the art by studying the following additional challenges: (1) In existing work, advice is given without explaining the rationale behind it. The student can therefore hardly understand the teacher’s decisions or internalize the knowledge needed to generalize the teacher’s advice; (2) In many situations, the teacher might be suboptimal in a new environment, but no current approach enables the student to discern when particular pieces of advice might not be applicable; (3) No present technique enables the teacher to evaluate the quality of its advice before giving it to the student; (4) If the student interacts in a new environment, the teacher has limited knowledge of that environment because it does not collect the student’s data; (5) A teacher with a fixed pre-trained policy might not be able to provide flexible advice; (6) Action advice has rarely been applied to human students.
In this thesis, we present our solutions to the aforementioned challenges and demonstrate the empirical effectiveness of our proposed methods. We also propose potential pathways for subsequent research. Our ultimate goal is to explore the intricacies and potential of action advising in reinforcement learning, thereby guiding future advances in the field.
Additionally, this thesis broadens its scope by examining further applications of transfer learning, showcasing its utility and adaptability in varied contexts beyond action advising. These explorations contribute to a deeper understanding of transfer learning’s potential within the broader field of reinforcement learning.
Funding
S&AS: FND: A Stochastic Ethical Decision-Making Framework for Long-Term Autonomy
Directorate for Computer & Information Science & Engineering
History
Date
2024-05-03
Degree Type
- Dissertation
Department
- Computer Science
Degree Name
- Doctor of Philosophy (PhD)