This paper provides a technical overview of our recent work on reinforcement learning for controlling commercial cooling systems. Building on previous work on cooling data centers more efficiently, we recently conducted two live experiments in partnership with a building management system provider. These live experiments posed a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe how we adapted our RL system to address these challenges, resulting in energy savings of approximately 9% and 13%, respectively, at the two live experiment sites.