Advancing Model-Based Reinforcement Learning with Applications in Nuclear Fusion
Reinforcement learning (RL) may be the key to overcoming previously insurmountable obstacles, leading to technological and scientific innovations. One example where RL could have a sizable impact is tokamak control. Tokamaks are among the most promising devices for making nuclear fusion a viable energy source. They operate by magnetically confining a plasma; however, sustaining the plasma for long periods and at high pressures remains a challenge for the tokamak control community. RL may be able to learn how to sustain the plasma, but, as with many exciting applications of RL, it is infeasible to collect data on the real device to learn a policy. In this thesis, we explore learning policies using surrogate models of the environment, especially surrogate models learned from an offline data source. In Part I, we investigate the scenario in which one has access to a simulator that can generate data, but the simulator is too computationally expensive for data-hungry deep RL algorithms. We instead propose a Bayesian optimization algorithm to learn such a policy. Following this, we pivot to the setting in which surrogate models of the environment can be learned from offline data. While these models are much cheaper to evaluate, their predictions inevitably contain errors. As such, both robust policy learning procedures and good uncertainty quantification of model errors are crucial for success. To address the former, in Part II we propose a trajectory stitching algorithm that accounts for these modeling errors and a policy network architecture that is adaptive yet robust. Part III shifts focus to uncertainty quantification, where we propose a more intelligent uncertainty sampling procedure and a neural process architecture for learning uncertainties efficiently.
In the final part, we detail how we learned models to predict plasma evolution, how we used these models to train a neutral beam controller, and the results of deploying this controller on the DIII-D tokamak.
Date: 2024-04-11
Degree Type: Dissertation
Department: Machine Learning
Degree Name: Doctor of Philosophy (PhD)