Unifying State and Policy-Level Explanations for Reinforcement Learning
Reinforcement learning (RL) can solve tasks in a domain without learning a model of the domain's dynamics. When coupled with a neural network as a function approximator, RL systems can solve complex problems. However, these same properties make verifying and predicting an RL agent's behavior difficult: a learned policy conveys "what" to do, but not "why."
This thesis focuses on producing explanations for deep RL: summaries of behavior and its causes that can be used for downstream analysis. Specifically, we focus on the setting where the final policy is obtained from a limited, known set of interactions with the environment. We categorize existing explanation methods along two axes:
- Whether a method explains single-action behavior or policy-level behavior
- Whether a method provides explanations in terms of state features or past experiences
Under this classification, there are four types of explanation methods, and they enable answering different questions about an agent. We introduce methods for creating explanations of these types. Furthermore, we introduce a unified explanation structure that is a combination of all four types. This structure enables obtaining further information about what an agent has learned and why it behaves as it does.
First, we introduce CUSTARD, our method for explaining single-action behavior in terms of state features. CUSTARD’s explanation is a decision tree representation of the policy. Unlike existing methods for producing such a decision tree, CUSTARD directly learns the tree without approximating a policy after training and is compatible with existing RL techniques.
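To make the explanation structure concrete, the following is a minimal sketch of a decision-tree policy of the kind CUSTARD produces as its explanation. The features, thresholds, and actions are hypothetical, and this shows only how such a tree is read, not how CUSTARD learns it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeNode:
    feature: Optional[int] = None   # index of the state feature tested
    threshold: float = 0.0          # split value for that feature
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None
    action: Optional[int] = None    # set only at leaves

def act(node: TreeNode, state: list[float]) -> int:
    """Follow splits until reaching a leaf; return the leaf's action."""
    while node.action is None:
        node = node.left if state[node.feature] <= node.threshold else node.right
    return node.action

# Hypothetical two-feature policy: take action 0 when feature 0 is small,
# otherwise decide based on feature 1.
policy = TreeNode(feature=0, threshold=0.5,
                  left=TreeNode(action=0),
                  right=TreeNode(feature=1, threshold=0.2,
                                 left=TreeNode(action=1),
                                 right=TreeNode(action=2)))

print(act(policy, [0.3, 0.9]))  # feature 0 <= 0.5, so action 0
```

Each root-to-leaf path is a human-readable rule over state features explaining which action is taken and why.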
We then introduce APG-Gen, our approach for creating a policy-level behavior explanation in terms of state features. APG-Gen produces a Markov chain over abstract states that enables predicting future actions and aspects of future states. APG-Gen only queries an agent’s Q-values, making no assumptions about an agent’s decision-making process.
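The prediction enabled by such a Markov chain can be sketched as follows. The abstract states, their annotated actions, and the transition probabilities below are made up for illustration; APG-Gen's actual construction of the chain is not shown.

```python
import numpy as np

# P[i, j] = Pr(next abstract state is j | current abstract state is i)
P = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
])
# Hypothetical action the agent takes in each abstract state.
action_of = {0: "left", 1: "forward", 2: "right"}

def future_distribution(start: int, steps: int) -> np.ndarray:
    """Distribution over abstract states after `steps` transitions."""
    dist = np.zeros(len(P))
    dist[start] = 1.0
    for _ in range(steps):
        dist = dist @ P
    return dist

# From abstract state 0: one step reaches state 1 with certainty; two steps
# reach states 1 and 2 with probability 0.5 each.
dist = future_distribution(start=0, steps=2)
```

Because each abstract state is annotated with the agent's action, a distribution over future abstract states also predicts the agent's future actions.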
We integrate these two methods to produce a Unified Explanation Tree (UET). A UET is a tree that maps from a state directly to both an action and an abstract state, thus unifying single-action and policy-level behavior explanations in terms of state features.
We extend existing work on finding important training points in deep neural networks. Our method, MRPS, produces explanations of single-action behavior in terms of past experiences. MRPS finds importance values for sets of points and accounts for feature magnitudes to produce more meaningful importance values.
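The underlying notion of set-level importance can be illustrated with a generic leave-group-out computation: score a set of training points by how much the model's output on a query changes when that set is removed. This is a toy sketch of the general idea, not MRPS itself; the data, the least-squares model, and the chosen group are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                          # toy training inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=20)

def fit_predict(X: np.ndarray, y: np.ndarray, query: np.ndarray) -> float:
    """Fit a least-squares model and predict on the query point."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(query @ w)

query = np.array([1.0, 1.0, 1.0])
full = fit_predict(X, y, query)

group = [0, 1, 2]                      # hypothetical set of training points
mask = np.ones(len(X), dtype=bool)
mask[group] = False

# Importance of the group: change in the prediction when the group is removed.
importance = abs(full - fit_predict(X[mask], y[mask], query))
```

Methods in this family avoid literal retraining by approximating this quantity, and weighting by feature magnitudes keeps large-scale features from dominating the scores.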
Finally, we find the importance values of sets of past experiences for any node within a UET. Additionally, we introduce methods for computing approximate and exact influence for UET nodes. Since a UET conveys both single-action and policy-level behavior, these importance and influence values explain both levels of behavior in terms of past experiences. Our overall solution enables identifying the portion of the UET that would change if specific experiences were removed from or added to the set used by the agent.
- Doctor of Philosophy (PhD)