Datacenter Cooling Optimization using Deep Reinforcement Learning
Washington University in St. Louis — Fall 2024
This project, built for Washington University's CSE 510A (Deep Reinforcement Learning) course, tackles cooling optimization for small- and mid-sized datacenters, a segment that makes up most of the market but is far less studied than hyperscale facilities. Using the Sinergym wrapper around EnergyPlus, the project simulates a two-zone datacenter under stochastic summer weather and trains three deep RL agents to control HVAC cooling: Dueling Double DQN (DDQN), PPO with generalized advantage estimation, and SAC with automatic entropy tuning. The trained agents are then compared against random and rules-based baselines.
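For readers unfamiliar with the Dueling Double DQN variant named above, the following is a minimal, framework-free sketch of its two defining pieces: the dueling aggregation of a state value and per-action advantages into Q-values, and the Double DQN target in which the online network selects the next action and the target network evaluates it. The function names, the plain-list representation, and the default discount factor are assumptions made for illustration; they are not taken from the project's code.

```python
def dueling_q(value, advantages):
    """Combine a scalar state value V(s) and per-action advantages A(s, a)
    into Q-values, subtracting the mean advantage for identifiability."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target for a single transition.

    The online network's Q-values pick the greedy next action; the target
    network's Q-values supply the estimate for that action.
    gamma=0.99 is an illustrative default, not the project's setting.
    """
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    bootstrap = 0.0 if done else gamma * q_target_next[best_action]
    return reward + bootstrap


if __name__ == "__main__":
    print(dueling_q(1.0, [1.0, 3.0]))
    print(double_dqn_target(1.0, False, [0.2, 0.9], [5.0, 2.0]))
```

Decoupling action selection from action evaluation is what distinguishes the Double DQN target from the standard one, which takes the maximum over the target network's own values.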
DDQN with a discretized action space was the strongest performer, improving energy efficiency by 35.8% over a rules-based incremental controller, compared to 10.9% for PPO and a negligible gain for SAC. Adding weather-forecast inputs generally hurt performance. The results show that lightweight, model-free RL can meaningfully reduce cooling energy use even on the limited compute budgets typical of smaller facilities.
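A discretized action space, as used for DDQN above, can be sketched as a fixed grid of cooling setpoints that the agent indexes into. The setpoint range, the number of bins, and the function names below are hypothetical values chosen for illustration; the source does not state the actual discretization.

```python
def make_setpoint_grid(low, high, n_bins):
    """Return n_bins evenly spaced cooling setpoints from low to high (inclusive)."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


def action_to_setpoint(action_index, grid):
    """Map the discrete action chosen by the agent to a continuous setpoint."""
    return grid[action_index]


if __name__ == "__main__":
    # Hypothetical range of 22 to 30 degrees Celsius split into 5 actions.
    grid = make_setpoint_grid(22.0, 30.0, 5)
    print(grid)
    print(action_to_setpoint(2, grid))
```

A coarse grid keeps the Q-network's output layer small, which suits limited compute budgets, at the cost of finer control over the setpoint.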
Highlights
- Three DRL agents (DDQN, PPO, SAC) trained against a realistic EnergyPlus simulation
- Custom Sinergym environment modeling a two-zone datacenter under stochastic weather
- DDQN achieved a 35.8% energy-efficiency improvement over a rules-based controller
- Benchmarked against random, rules-based, and rules-based-incremental baselines
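As a rough illustration of what a rules-based incremental baseline might look like, the sketch below nudges the cooling setpoint by a fixed step depending on the measured zone temperature. The source does not describe the baseline's actual rules, so the thresholds, step size, setpoint limits, and function name here are all assumptions.

```python
def incremental_rule_step(setpoint, zone_temp, target_low=18.0, target_high=27.0,
                          step=0.5, sp_min=20.0, sp_max=30.0):
    """One step of a hypothetical incremental rule-based cooling controller.

    If the zone is warmer than target_high, lower the cooling setpoint
    (more cooling); if it is cooler than target_low, raise the setpoint
    (less cooling); otherwise leave it unchanged. All numeric defaults
    are illustrative, not the project's values.
    """
    if zone_temp > target_high:
        setpoint -= step
    elif zone_temp < target_low:
        setpoint += step
    # Keep the setpoint within the allowed range.
    return min(sp_max, max(sp_min, setpoint))


if __name__ == "__main__":
    sp = 25.0
    for temp in (28.0, 28.5, 26.0, 17.0):
        sp = incremental_rule_step(sp, temp)
        print(temp, sp)
```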