Provable Algorithms for Reinforcement Learning: Efficiency, Scalability, and Robustness

Shi, Laixi

doi:10.1184/R1/24025908.v1

Shi_cmu_0041E_11042.pdf (3.16 MB)

Provable Algorithms for Reinforcement Learning: Efficiency, Scalability, and Robustness

thesis

posted on 2023-08-29, 21:13 authored by Laixi ShiLaixi Shi

Reinforcement learning (RL), which strives to learn desirable sequential decisions based on trial-and-error interactions with an unknown environment, has achieved remarkable success recently in a variety of domains including games and large language model alignment. In the face of unknown environments with unprecedentedly large dimensionality, making the best use of available samples inevitably lies at the core of RL, especially in ubiquitous sample-starved scenarios such as clinical trials and autonomous driving. To understand and tackle the challenges of sample efficiency, substantial progress has been made recently by developing a finite-sample theoretical framework to analyze the algorithms of interest and design provably optimal algorithms in terms of sample efficiency. Nevertheless, existing results still fall short with regards to the statistical understanding and algorithmic optimality in a wide range of RL settings. Moreover, motivated by countless scenarios with large dimensionality or sim-to-real gaps, sample efficiency needs to be considered along with scalability and robustness — two equally important principles in RL.

This thesis breaks down the sample barriers of various RL formulations, taking additional scalability and robustness into account. Specifically, for online RL that allows for adaptive interactions with the environment, this thesis provides the first provable regret-optimal model-free RL algorithm with a small burn-in cost — an initial sampling burden needed for the algorithm to exhibit the desired performance — while maintaining its memory efficiency for scalability. For offline RL that only has access to historical datasets, this thesis proposes the first provable near-optimal model-free offline RL algorithm without the need of performing model estimation, and settles the sample complexity by establishing the minimax optimality of model-based offline RL algorithms without burn-in cost. Finally, for a robust variant of standard RL — distributionally robust RL, this thesis uncovers a surprising fact that promoting additional distributional robustness in the learned policy is neither necessarily harder nor easier than standard RL in terms of sample requirements, which depends heavily on the prescribed uncertainty set. This thesis closes by providing the first provable near-optimal algorithm for offline robust RL that can learn under simultaneous model uncertainty and limited historical datasets.

History

Date

2023-07-28

Degree Type

Dissertation

Department

Electrical and Computer Engineering

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Yuejie Chi

Usage metrics

Keywords

Distributionally robustness Markov decision processes Reinforcement learning Sample efficiency Value-based RL Variance reduction

Licence

In Copyright

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

Provable Algorithms for Reinforcement Learning: Efficiency, Scalability, and Robustness

History

Date

Degree Type

Department

Degree Name

Advisor(s)

Usage metrics

Categories

Keywords

Licence

Exports