Stable Models and Temporal Difference Learning
In this thesis, we investigate two aspects of stability: the stability of neural network dynamics models and the stability of reinforcement learning algorithms. In the first chapter, we propose a new method for learning dynamics models that are Lyapunov-stable by construction, even when randomly initialized. We demonstrate the effectiveness of this method on damped multi-link pendulums and show how it can be used to generate high-fidelity video textures.
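As a concrete illustration of the "stable by construction" idea, the following minimal PyTorch sketch corrects a nominal dynamics network so that a Lyapunov function always decreases along trajectories; the simple quadratic Lyapunov candidate, layer sizes, and decay rate alpha are illustrative assumptions rather than the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class StableDynamics(nn.Module):
    """Sketch of a dynamics model x_dot = f(x) made Lyapunov-stable by construction.

    A nominal network f_hat is corrected so that the Lyapunov candidate V
    satisfies dV/dt = grad V(x) . f(x) <= -alpha * V(x) at every state x.
    """

    def __init__(self, dim, hidden=64, alpha=0.1):
        super().__init__()
        self.f_hat = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim)
        )
        self.alpha = alpha

    def V(self, x):
        # Positive-definite Lyapunov candidate (a plain quadratic, for illustration).
        return 0.5 * (x ** 2).sum(dim=-1)

    def forward(self, x):
        x = x.requires_grad_(True)
        v = self.V(x)
        (grad_v,) = torch.autograd.grad(v.sum(), x, create_graph=True)
        f = self.f_hat(x)
        # How much f_hat violates the decrease condition (zero when already stable).
        violation = torch.relu((grad_v * f).sum(-1) + self.alpha * v)
        # Subtract the smallest correction along grad V that restores the condition.
        denom = (grad_v ** 2).sum(-1).clamp_min(1e-8)
        return f - grad_v * (violation / denom).unsqueeze(-1)
```

Because the correction is applied to whatever f_hat outputs, the decrease condition holds even at random initialization, which is what "stable by construction" refers to here.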
In the second and third chapters, we focus on the stability of reinforcement learning (RL). In the second chapter, we show that regularization, a common remedy for instability, behaves counterintuitively in RL: not only is it sometimes ineffective, it can itself cause instability. We demonstrate this phenomenon in both linear and neural network settings, and show that standard importance sampling methods are vulnerable to the same effect.
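As a schematic of where regularization enters in the linear setting (standard notation, not the chapter's exact construction), consider expected off-policy linear TD(0) with feature matrix \(\Phi\), diagonal sampling distribution \(D\), target-policy transition matrix \(P\), expected rewards \(r\), discount \(\gamma\), step size \(\beta\), and an \(\ell_2\) penalty of strength \(\eta\):

\[
\theta_{k+1} = \theta_k + \beta\bigl(b - (A + \eta I)\,\theta_k\bigr),
\qquad
A = \Phi^\top D\,(I - \gamma P)\,\Phi, \quad b = \Phi^\top D\, r,
\]

so the iterates track the regularized fixed point \(\theta_\eta = (A + \eta I)^{-1} b\). Off-policy, \(D\) is the behavior distribution rather than the on-policy one, and the effect of adding \(\eta I\) depends on the interaction between \(\eta\) and \(D\); the second chapter studies when that interaction helps and when it does not.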
In the third chapter, we propose a mechanism, Projected Off-Policy TD (POP-TD), that stabilizes off-policy RL through resampling. Rather than resampling to the on-policy distribution as other resampling methods do, POP-TD resamples TD updates so that they come from a convex subset of “safe” distributions. We show how this approach can mitigate the distribution-shift problem in offline RL on a task designed to maximize that shift.
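To make the resampling idea concrete, here is a minimal, hypothetical sketch in the same spirit: per-sample weights are projected onto a convex set of distributions (a box-constrained simplex, purely as a placeholder for POP-TD's actual “safe” set) and then used to reweight a linear TD(0) semi-gradient. The function names and constraint set are illustrative assumptions, not the method as defined in the thesis.

```python
import numpy as np

def project_weights(w, lo, hi, iters=100):
    """Project raw weights onto the convex set {w : sum(w) = 1, lo <= w <= hi}
    by bisecting on a scalar shift. The box [lo, hi] stands in for whatever
    "safe" distribution set is actually used; it is only a placeholder."""
    lam_lo, lam_hi = w.min() - hi.max(), w.max() - lo.min()
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        if np.clip(w - lam, lo, hi).sum() > 1.0:
            lam_lo = lam   # total mass too large: shift weights down
        else:
            lam_hi = lam
    return np.clip(w - lam, lo, hi)

def reweighted_td_step(theta, phi_s, phi_next, r, gamma, lr, w_safe):
    """One linear TD(0) step where each sample's contribution is weighted by
    w_safe, i.e. the update is taken as if the data came from the projected
    distribution rather than the raw off-policy one."""
    td_err = r + gamma * phi_next @ theta - phi_s @ theta   # per-sample TD errors
    return theta + lr * (w_safe * td_err) @ phi_s           # weighted semi-gradient
```

A typical use would compute w_safe = project_weights(raw_weights, lo, hi) for a batch and then pass it to reweighted_td_step; again, both helpers are sketches under the stated assumptions.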
Overall, this thesis advances novel methods for stabilizing dynamics models and reinforcement learning training, questions existing assumptions in the field, and points to promising directions for stability in both model learning and reinforcement learning.
History
Date
- 2023-05-12
Degree Type
- Dissertation
Department
- Computer Science
Degree Name
- Doctor of Philosophy (PhD)