Carnegie Mellon University

Towards Deployable Reinforcement Learning: Safety, Robustness, Adaptivity, and Scalability

Thesis, posted on 2024-04-19, authored by Zuxin Liu


The increasing demand to apply reinforcement learning (RL) in safety-critical domains underscores the need for safe, robust, and versatile RL algorithms. This thesis addresses that need by introducing a suite of policy optimization algorithms designed to overcome the key challenges of safe RL, paving the way toward more reliable and practical deployments.

The first part of the thesis focuses on enhancing sample efficiency and training stability — crucial aspects of deployable safe RL. We propose the Constrained Variational Policy Optimization (CVPO) method, which reformulates the safe RL problem into a two-stage optimization process. This approach not only ensures efficient and stable learning but also provides strong performance guarantees, making it a superior choice for practical safe RL applications in terms of both safety and sample efficiency.
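For concreteness, the constrained Markov decision process (CMDP) objective that safe RL methods such as CVPO target can be sketched as follows; the notation (reward r, cost c, discount factor \gamma, cost threshold \epsilon) follows standard CMDP conventions and is illustrative rather than the thesis's exact formulation:

\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} \gamma^{t}\, c(s_t, a_t)\Big] \le \epsilon

Broadly, a two-stage scheme of this kind alternates between solving a constrained optimization for a non-parametric target distribution over actions and projecting that distribution back onto the parametric policy with a supervised update.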

The second part of the thesis examines robustness to observational perturbations, a critical component of deployable RL. We show that learned safe policies are vulnerable to stealthy attacks that induce unsafe behavior. These findings emphasize the need for adversarial training to improve safety under adverse conditions. Building on this, we first introduce an on-policy adversarial training pipeline and then present SAFER, an off-policy method derived from CVPO, which effectively enhances policy robustness and safety in adversarial settings.
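As an illustration of the threat model, a standard state-adversarial formulation lets an adversary \nu perturb each observation within a small set B_{\epsilon}(s) before the policy acts; the sketch below is generic and not necessarily the exact formulation used in the thesis:

\text{attack:}\quad \max_{\nu:\,\nu(s)\in B_{\epsilon}(s)}\; \mathbb{E}\Big[\sum_{t} \gamma^{t}\, c\big(s_t, \pi(\nu(s_t))\big)\Big]

\text{defense:}\quad \max_{\pi}\; \mathbb{E}\Big[\sum_{t} \gamma^{t}\, r\big(s_t, \pi(\nu(s_t))\big)\Big]
\quad \text{s.t.} \quad
\mathbb{E}\Big[\sum_{t} \gamma^{t}\, c\big(s_t, \pi(\nu(s_t))\big)\Big] \le \kappa

The attack drives up the cumulative cost of the perturbed policy while leaving reward largely intact (hence "stealthy"), and adversarial training alternates between fitting such an adversary and updating the policy against it.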

Lastly, the thesis addresses the adaptivity and scalability of deployable RL by learning from static offline datasets. It introduces the Constrained Decision Transformer (CDT), a novel approach that uses sequence modeling to allow the trade-off between safety and task performance to be adjusted dynamically during deployment. Alongside CDT, the thesis proposes TAIL, a scalable training paradigm for continual learning that efficiently adapts pretrained models to new tasks while mitigating catastrophic forgetting and overfitting.
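As a rough illustration of the sequence-modeling idea behind CDT, the Python sketch below (hypothetical names, not the thesis code) re-labels a trajectory with a target return-to-go and a cost-to-go, so that tightening the cost budget at deployment requests more conservative behavior without retraining:

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Step:
    state: List[float]
    action: List[float]
    reward: float
    cost: float

def build_conditioning(traj: List[Step], target_return: float, cost_budget: float
                       ) -> List[Tuple[float, float, List[float], List[float]]]:
    # Produce (return-to-go, cost-to-go, state, action) tuples that a decision-
    # transformer-style model is conditioned on at each timestep.
    tokens = []
    rtg, ctg = target_return, cost_budget
    for step in traj:
        tokens.append((rtg, ctg, step.state, step.action))
        rtg -= step.reward   # remaining reward target shrinks as reward is collected
        ctg -= step.cost     # remaining cost budget shrinks as cost is incurred
    return tokens

At deployment time, the same model can be queried with a different (target_return, cost_budget) pair to trade task performance against safety.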

In summary, this thesis is dedicated to pushing the boundaries of safe, robust, and scalable policy optimization, marking strides toward deployable RL in safety-critical areas. The proposed methods offer robust, efficient, and adaptable solutions, crucial for the real-world deployment of RL systems.

History

Date

2024-01-10

Degree Type

  • Dissertation

Department

  • Mechanical Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Ding Zhao
