 

Publication:

A Comparison of Model Predictive Control and Reinforcement Learning Methods for Building Energy Storage Management

Files

written_final_report.pdf (1.2 MB)

Date

2025-04-10

Abstract

The residential building sector is a major contributor to energy consumption and greenhouse gas emissions, making electrification and intelligent energy management essential for decarbonization. However, increased electricity demand can strain the power grid, leading to higher costs and emissions. Demand-side flexibility, enabled by on-site power generation, energy storage, and optimized control algorithms, can mitigate this problem by shifting electricity consumption to times when electricity is cheaper and cleaner.

This study evaluates three methods for centralized building energy storage management using CityLearn, an open-source environment for simulating and benchmarking building energy control. The evaluation compares Model Predictive Control (MPC) with two Reinforcement Learning (RL) methods: Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO). The methods are assessed across three dimensions: (1) energy performance, including cost, carbon emissions, electricity consumption, and stability of electricity use over time; (2) computational efficiency, including training time, memory usage, and inference speed; and (3) scalability, measured across different district sizes of two, four, and eight buildings.
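As a rough illustration of how such an experiment might be set up (a sketch under assumptions, not the thesis's code), the snippet below trains a centralized SAC agent in CityLearn via stable-baselines3. It assumes the CityLearn 2.x wrapper API; the dataset name and training budget are placeholders.

    # Sketch: centralized SAC control of district energy storage in CityLearn.
    # Assumes CityLearn 2.x and stable-baselines3; the dataset name and timestep
    # budget are illustrative placeholders, not the thesis's configuration.
    from citylearn.citylearn import CityLearnEnv
    from citylearn.wrappers import NormalizedObservationWrapper, StableBaselines3Wrapper
    from stable_baselines3 import SAC
    from stable_baselines3.common.evaluation import evaluate_policy

    # One central agent controls the storage of every building in the district.
    env = CityLearnEnv('citylearn_challenge_2022_phase_1', central_agent=True)
    env = StableBaselines3Wrapper(NormalizedObservationWrapper(env))

    model = SAC('MlpPolicy', env, verbose=0)
    model.learn(total_timesteps=50_000)

    # Roll out the learned policy, then read back CityLearn's district-level
    # KPIs (cost, emissions, consumption, ramping).
    evaluate_policy(model, env, n_eval_episodes=1)
    print(env.unwrapped.evaluate())

Swapping SAC for PPO in the same pipeline (from stable_baselines3 import PPO) would give the second RL controller described above.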

Overall, SAC achieved the strongest cost and energy performance, slightly ahead of PPO in those areas. PPO, however, produced smoother control behavior with more stable electricity use over time, while requiring significantly less memory than SAC and less computation than MPC. Both RL methods outperformed MPC on most metrics, and MPC in particular struggled to scale. Nonetheless, MPC remained more interpretable and required no training data, though it involved substantial engineering effort to develop an accurate system model.
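For contrast, the sketch below shows the kind of receding-horizon battery dispatch problem an MPC controller solves at each step, written as a small linear program in cvxpy. The horizon, forecasts, and battery parameters are illustrative assumptions and do not represent the system model developed in this work.

    # Sketch of a single receding-horizon MPC step for battery dispatch,
    # assuming a linear battery model and known price and net-load forecasts.
    # All parameter values are illustrative placeholders.
    import cvxpy as cp
    import numpy as np

    H = 24                                            # planning horizon in hours (assumed)
    price = np.full(H, 0.15); price[17:21] = 0.40     # price forecast ($/kWh), placeholder
    net_load = np.full(H, 2.0)                        # load minus PV forecast (kWh), placeholder
    capacity, p_max, eta, soc0 = 6.4, 5.0, 0.9, 0.5   # battery parameters (assumed)

    charge = cp.Variable(H, nonneg=True)       # energy charged each hour (kWh)
    discharge = cp.Variable(H, nonneg=True)    # energy discharged each hour (kWh)
    grid = cp.Variable(H, nonneg=True)         # energy drawn from the grid (kWh)
    soc = cp.Variable(H + 1)                   # battery state of charge (fraction)

    constraints = [soc[0] == soc0, soc >= 0, soc <= 1,
                   charge <= p_max, discharge <= p_max]
    for t in range(H):
        constraints += [
            soc[t + 1] == soc[t] + (eta * charge[t] - discharge[t] / eta) / capacity,
            grid[t] >= net_load[t] + charge[t] - discharge[t],
        ]

    cp.Problem(cp.Minimize(price @ grid), constraints).solve()
    # Only the first planned action is applied; the horizon then shifts forward
    # one step and the problem is re-solved with updated forecasts.
    print(float(charge.value[0] - discharge.value[0]))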

These findings highlight trade-offs between performance, stability, and deployability. PPO emerged as the most balanced controller, offering strong performance with scalability and computational efficiency, making it well-suited for real-world use.
