
Towards Safe and Sample-efficient Learning for Autonomous Energy Systems

Posted on 26.08.2022, 20:09, authored by Bingqing Chen

Given the dire consequences of climate change, there are growing incentives to curb carbon emissions by reducing energy consumption and increasing the penetration of renewable energy generation, among other measures to jointly combat this global challenge. In this thesis, we focus on learning-based control to 1) reduce energy consumption in buildings and 2) facilitate the integration of distributed energy resources (DERs).

Recently, there has been increasing interest in applying reinforcement learning (RL) to energy system operation, given that 1) high-fidelity models of these systems are resource-intensive to develop and not commonly available, 2) energy systems are heterogeneous, so a solution for one system may not transfer to others, and 3) some of these systems are undergoing transitions, so their control should be adaptive and future-proof.

While RL is a promising solution, real-world applications of RL agents remain few because 1) RL agents generally take a long time to learn and reach acceptable performance, and 2) the actions of RL agents may not satisfy the safety constraints imposed by the underlying physical systems or by functional requirements. Thus, to be practical for real-world energy systems, RL agents should learn safely and sample-efficiently.

Firstly, we address the challenge of sample complexity in Research Question 1, grounded in the application of improving energy efficiency in building operations. We expedite the learning process by warm-starting the RL agent with expert demonstrations and by incorporating domain knowledge of building thermodynamics into its policy. We validate that our proposed agent, Gnu-RL, can be deployed on real-world testbeds with satisfactory initial performance and that it improves energy efficiency over time. In a notable experiment, Gnu-RL operated a real-world testbed for three weeks, during which it saved 16.7% of cooling demand compared to the existing controller while maintaining better thermal comfort. Compared to existing methods, Gnu-RL is both practical and scalable, as it requires only historical data and minimal engineering to be applied to other buildings.
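One ingredient named above is putting knowledge of building thermodynamics into the policy and building it from historical data alone. As a minimal sketch of that ingredient only (not Gnu-RL itself, and without the expert-demonstration warm start or any RL fine-tuning), the code below assumes a hypothetical first-order thermal model T[k+1] = a*T[k] + b*u[k] + c*T_out[k], identifies (a, b, c) from logged data by least squares, and inverts the model to choose a one-step HVAC input clipped to [0, u_max]. The model form and the names `fit_thermal_model`, `one_step_action`, and `u_max` are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical parameters (a, b, c) of T[k+1] = a*T[k] + b*u[k] + c*T_out[k].
Model = Tuple[float, float, float]


def solve_linear(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_thermal_model(t_in: Sequence[float], u: Sequence[float],
                      t_out: Sequence[float]) -> Model:
    """Least-squares fit of (a, b, c) from logged indoor temp, HVAC input, outdoor temp."""
    rows = [(t_in[k], u[k], t_out[k]) for k in range(len(t_in) - 1)]
    targets = t_in[1:]
    # Normal equations: (X^T X) theta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    a, b, c = solve_linear(xtx, xty)
    return a, b, c


def one_step_action(model: Model, t_now: float, t_out_now: float,
                    setpoint: float, u_max: float) -> float:
    """Invert the model to reach `setpoint` at the next step, clipped to actuator limits."""
    a, b, c = model
    u = (setpoint - a * t_now - c * t_out_now) / b
    return min(max(u, 0.0), u_max)
```

Because the model is identified from logged data only, the same code could be pointed at another building's history, which echoes the scalability argument above; a learning agent would additionally keep improving from its own experience, which this sketch omits.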

Secondly, we focus on facilitating the integration of DERs on both the demand side and the supply side. In Research Question 2, we utilize the inherent flexibility of thermostatically controlled loads (TCLs), a class of building loads that accounts for 20% of the electricity consumption in the United States, to provide grid services. By characterizing the set of admissible action sequences (i.e., sequences that are feasible for the TCLs and satisfy the end-use requirements), we propose a distributed control solution, COHORT, that coordinates a population of heterogeneous TCLs to jointly provide grid services. We demonstrate that COHORT is applicable to use cases including, but not limited to, generation following, ramping minimization, and peak load curtailment. Beyond simulation studies, we validated COHORT in a hardware-in-the-loop simulation that included a real-world testbed and simulated TCL instances. During the 15-day experimental period, COHORT reduced daily peak loads by an average of 12.5% while maintaining comfortable temperatures. COHORT is computationally scalable to both large population sizes and long planning horizons, which unlocks the potential to shift TCL consumption over extended periods of time, e.g., shifting wind and solar power from times when it might otherwise be curtailed to times when it is needed over the course of a day.
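To make "admissible actions" and jointly tracking an aggregate target concrete, the sketch below assumes a hypothetical first-order air-conditioner model with continuous cooling power, T[k+1] = a*T[k] + (1 - a)*(T_out - r*p[k]). For a single step, it computes each TCL's power interval that keeps the next temperature inside its comfort band, then splits an aggregate target across the fleet by moving every TCL the same fraction of the way from its lower to its upper bound. This one-step, centralized rule is only a stand-in for illustration; per the text, COHORT itself is distributed and plans over long horizons. The class `TCL`, the function `allocate`, and all parameter names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TCL:
    """Hypothetical first-order air conditioner with continuous cooling power."""
    a: float        # per-step thermal decay factor, 0 < a < 1
    r: float        # steady-state temperature drop per kW of cooling, degC/kW
    p_rated: float  # maximum cooling power, kW
    t_min: float    # lower comfort bound, degC
    t_max: float    # upper comfort bound, degC

    def next_temp(self, t_in: float, t_out: float, p: float) -> float:
        return self.a * t_in + (1 - self.a) * (t_out - self.r * p)

    def admissible_power(self, t_in: float, t_out: float) -> Tuple[float, float]:
        """Interval of p in [0, p_rated] that keeps next_temp within [t_min, t_max]."""
        drift = self.a * t_in + (1 - self.a) * t_out  # next temperature if p = 0
        gain = (1 - self.a) * self.r                  # degC removed per kW
        lo = max((drift - self.t_max) / gain, 0.0)
        hi = min((drift - self.t_min) / gain, self.p_rated)
        if lo > hi:
            raise ValueError("comfort band unreachable within power limits")
        return lo, hi


def allocate(fleet: Sequence[TCL], temps: Sequence[float], t_out: float,
             target: float) -> List[float]:
    """Split an aggregate power target so every TCL stays in its admissible interval."""
    bounds = [tcl.admissible_power(t, t_out) for tcl, t in zip(fleet, temps)]
    total_lo = sum(lo for lo, _ in bounds)
    total_hi = sum(hi for _, hi in bounds)
    achievable = min(max(target, total_lo), total_hi)  # clip to the fleet's range
    span = total_hi - total_lo
    frac = 0.0 if span == 0 else (achievable - total_lo) / span
    return [lo + frac * (hi - lo) for lo, hi in bounds]
```

Because each allocation stays inside its TCL's interval, comfort is preserved by construction, and a target outside the fleet's aggregate range is clipped to the nearest achievable value.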

In Research Question 3, we extend Research Question 2 to incorporate network constraints on top of device-level constraints. Specifically, we focus on controlling inverters, through which DERs are connected to distribution networks, to ensure that voltage constraints are not violated, as over-voltage has already become a common occurrence in areas with high renewable penetration. On the IEEE 37-bus feeder system, our proposed approach, PROF, satisfies the voltage constraints 100% of the time, compared to 22% over-voltage violations incurred by a Volt/Var control strategy. Voltage support from inverters increases the hosting capacity of existing networks and reduces the curtailment of renewable generation. Furthermore, as renewable energy resources gradually replace fossil-fuel generation over the coming decades, a learning-based control strategy can adapt to the transitioning power grid.
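To illustrate enforcing a voltage constraint on top of a learned controller's output, here is a minimal sketch that projects a proposed inverter setpoint onto a voltage-safe set. It assumes a hypothetical single-bus linearized model v = v0 + r*p + x*q in per unit (q > 0 injects, q < 0 absorbs reactive power), a fixed reactive limit q_max, and over-voltage only; it is not PROF's formulation, which the text applies to a full feeder. Reactive power is handled first by clipping to its feasible interval (the Euclidean projection onto an interval), and active power is curtailed only if full absorption is insufficient. The function name and parameters are assumptions.

```python
from typing import Tuple


def project_inverter_setpoint(p_avail: float, q_proposed: float, q_max: float,
                              v0: float, r: float, x: float,
                              v_max: float) -> Tuple[float, float]:
    """Return (p, q) that keeps v = v0 + r*p + x*q <= v_max, assuming r, x > 0."""
    # Largest reactive injection that keeps v <= v_max at full active power.
    q_volt = (v_max - v0 - r * p_avail) / x
    if q_volt >= -q_max:
        # Feasible q lies in [-q_max, min(q_max, q_volt)]; project by clipping.
        q = min(max(q_proposed, -q_max), min(q_max, q_volt))
        return p_avail, q
    # Full absorption is not enough: absorb q_max and curtail p just enough.
    # (If v0 alone exceeds v_max + x*q_max, p = 0 and v still exceeds v_max.)
    p = max(0.0, (v_max - v0 + x * q_max) / r)
    return min(p, p_avail), -q_max
```

The priority order mirrors the point made above that voltage support from inverters reduces renewable curtailment: in this sketch, active power is cut only after reactive support is exhausted.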

Finally, we consider the problem of power system operation in the abstracted form of high-dimensional, non-linear systems. To enforce safety constraints in performance-driven learning for the general case of high-dimensional, non-linear systems, we propose SAGE, which incorporates Hamilton-Jacobi (HJ) reachability theory, a safety verification method for non-linear systems, into the constrained Markov decision process (CMDP) framework. Although HJ reachability has traditionally not scaled to high-dimensional systems, we demonstrate that, with neural approximation, the HJ safety value can be learned directly from visual context, the highest-dimensional problem studied via this method to date. We evaluate our method on several benchmark tasks, including Safety Gym and Learn-to-Race (L2R), a recently released high-fidelity autonomous racing environment. Our approach incurs significantly fewer constraint violations than other constrained RL baselines in Safety Gym and achieves new state-of-the-art results on the L2R benchmark task.
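As a toy illustration of what the HJ safety value is (tabular rather than neural, and not SAGE's training procedure), the sketch below computes it by value iteration for a hypothetical vehicle approaching a wall on an integer grid, using the undiscounted safety Bellman backup V(x) = min(l(x), max_u V(f(x, u))), where l(x) is the signed margin to the failure set. States with V(x) >= 0 are exactly those from which some control sequence avoids failure forever. The dynamics, grid sizes, and names (`step`, `margin`, `safety_value`, `WALL`) are assumptions.

```python
from itertools import product
from typing import Dict, Tuple

State = Tuple[int, int]  # (position, velocity) on an integer grid

WALL = 20              # failure set: position > WALL (hypothetical)
V_MAX = 5              # speed limit
P_MAX = WALL + V_MAX   # positions past the wall are clipped; they are failures anyway
ACTIONS = (-1, 0, 1)   # decelerate, coast, accelerate


def step(state: State, accel: int) -> State:
    """Deterministic toy dynamics: update speed first, then move by the new speed."""
    p, v = state
    v_next = min(max(v + accel, 0), V_MAX)
    return min(p + v_next, P_MAX), v_next


def margin(state: State) -> int:
    """l(x): signed distance to the failure set; l(x) >= 0 means currently safe."""
    return WALL - state[0]


def safety_value() -> Dict[State, int]:
    """Iterate V <- min(l, max_u V(f(x, u))) from V = l until it stops changing."""
    states = list(product(range(P_MAX + 1), range(V_MAX + 1)))
    value = {s: margin(s) for s in states}
    while True:
        new_value = {
            s: min(margin(s), max(value[step(s, a)] for a in ACTIONS))
            for s in states
        }
        if new_value == value:  # values are integers and non-increasing, so this halts
            return value
        value = new_value
```

For this system, the states with V(x) >= 0 are those whose full-braking stopping position p + v(v - 1)/2 does not pass the wall, which the iteration recovers. A table like this grows exponentially with state dimension, which is why neural approximation is needed for inputs as large as camera images.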




Degree Type



Civil and Environmental Engineering

Degree Name

  • Doctor of Philosophy (PhD)


Mario Berges,