State-wise Safe Learning and Control
Ensuring safety by persistently satisfying hard state constraints is a critical capability in the fields of reinforcement learning (RL) and control. While RL and control have achieved impressive feats in performing challenging tasks, the lack of safety assurance remains a significant obstacle for real-world applications. Consequently, the research focus has shifted towards developing methods that meet stringent safety specifications in uncertain environments, driving the field of safe learning and control.
In the realm of safe control, energy-function-based methods assign lower energy values to safe states and design control laws that dissipate energy, thereby steering the system back to, and keeping it within, the safe set. However, prevailing safe control methods require an explicit analytical model of the system dynamics, which is often unavailable in real-world scenarios. Moreover, they typically assume an unbounded control space, whereas real control spaces are bounded; under bounded controls, the set of safe controls can become empty, jeopardizing the guarantee of state-wise safety.
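As a rough sketch of this idea (the symbols $\phi$, $f$, $g$, and $\eta$, as well as the control-affine form, are notational assumptions made here rather than taken from the thesis), an energy function $\phi$ assigns non-positive values to safe states, and the controller enforces energy dissipation whenever the state is outside the safe set:

```latex
% Sketch with assumed notation: \phi is an energy (safety-index) function,
% the safe set is its zero-sublevel set, and the dynamics are control-affine.
\mathcal{S} = \{\, x : \phi(x) \le 0 \,\}, \qquad
\dot{\phi}(x) = \nabla \phi(x)^{\top}\!\left( f(x) + g(x)\,u \right) \le -\eta
\quad \text{whenever } \phi(x) \ge 0 .
```

If such a control exists at every state, the safe set is forward invariant and unsafe states are driven back to it in finite time; under bounded controls, however, no admissible $u$ may satisfy this condition, which is precisely the empty-safe-control-set problem noted above.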
In the sphere of safe RL, extensive effort has gone into addressing safety within the framework of Constrained Markov Decision Processes (CMDP), which bounds only expected cumulative costs and therefore cannot handle state-wise safety constraints. Furthermore, existing safe RL algorithms predominantly learn policies through trial and error, a process that inevitably involves unsafe exploration and renders them unsuitable for training in real-world, safety-critical applications.
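For concreteness, the distinction can be sketched as follows (the cost $c_i$, threshold $d$, and bounds $w_i$ are assumed notation for illustration): a CMDP bounds an expected cumulative cost over trajectories, whereas a state-wise constraint must hold at every step of every trajectory:

```latex
% Sketch of the distinction; c, d, and w_i are assumed notation.
\text{CMDP:} \quad \mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t) \right] \le d,
\qquad
\text{state-wise:} \quad c_i(s_t, a_t) \le w_i \;\; \forall\, t \ge 0,\; i = 1, \dots, m .
```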
In this thesis, we present advancements in RL and control that ensure state-wise safety by addressing these challenges: (i) For safe control, we design energy functions that guarantee a nonempty set of safe controls under control limits and different levels of knowledge of the system dynamics, achieving forward invariance of the safe set and finite-time convergence to it. (ii) For safe learning, we propose a set of novel policy search algorithms for state-wise constrained RL. Specifically, (a) State-wise Constrained Policy Optimization (SCPO) guarantees state-wise constraint satisfaction in expectation per iteration, (b) Absolute Policy Optimization (APO) guarantees monotonic improvement of worst-case performance per iteration, and (c) Absolute State-wise Constrained Policy Optimization (ASCPO) guarantees worst-case state-wise constraint satisfaction per iteration. The proposed approaches accommodate high-dimensional neural network policies. Furthermore, we combine the benefits of safe control and safe learning to pioneer an algorithm that generates state-wise safe optimal policies with zero training violations, a learning-without-mistakes paradigm. (iii) Lastly, we introduce a comprehensive and adaptable benchmark, the first of its kind, for safe RL and control. The benchmark supports diverse agents, tasks, and safety constraints, and provides unified implementations of cutting-edge safe learning and control algorithms within a controlled environment.
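To illustrate the learning-without-mistakes idea, below is a minimal sketch, not the thesis implementation, of an energy-function-based safety filter that projects a learned policy's nominal action onto the half-space of energy-dissipating controls before it is executed; the control-affine dynamics, the safety index `phi`, its gradient, and all parameter names are assumptions made for this example.

```python
import numpy as np

def safety_filter(u_nominal, phi_x, grad_phi, f_x, g_x, eta=0.1, u_min=None, u_max=None):
    """Project a nominal action onto the set of energy-dissipating controls.

    Sketch only: assumes control-affine dynamics x_dot = f(x) + g(x) u and a
    differentiable safety index phi(x) whose zero-sublevel set is the safe set.
    """
    if phi_x < 0:
        # State is already inside the safe set: keep the learned action.
        u = u_nominal
    else:
        # Dissipation condition grad_phi . (f + g u) <= -eta, i.e. a^T u <= b.
        a = g_x.T @ grad_phi
        b = -eta - grad_phi @ f_x
        if a @ u_nominal <= b:
            u = u_nominal
        else:
            # Closed-form Euclidean projection onto the half-space {u : a^T u <= b}.
            u = u_nominal - ((a @ u_nominal - b) / (a @ a)) * a
    # Respect actuator limits; under bounded controls the energy function must be
    # designed so that a nonempty set of safe controls always exists, which is one
    # of the issues addressed in the thesis.
    if u_min is not None or u_max is not None:
        u = np.clip(u, u_min, u_max)
    return u


# Hypothetical usage: 2-D single integrator near a unit-radius circular hazard.
x = np.array([0.9, 0.0])
phi = 1.0 - np.linalg.norm(x)          # phi >= 0 inside or on the hazard boundary
grad = -x / np.linalg.norm(x)
# The nominal action points into the hazard; the filter redirects it outward.
u_safe = safety_filter(u_nominal=np.array([-1.0, 0.0]),
                       phi_x=phi, grad_phi=grad,
                       f_x=np.zeros(2), g_x=np.eye(2),
                       u_min=-1.0, u_max=1.0)
```

The closed-form half-space projection stands in for the quadratic program typically solved in energy-function-based safe control; keeping the set of safe controls nonempty under control bounds is exactly where the energy-function design of contribution (i) comes in.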
History
Date
- 2024-08-15
Degree Type
- Dissertation
Department
- Electrical and Computer Engineering
Degree Name
- Doctor of Philosophy (PhD)