To integrate large shares of renewable energy resources, power grids must be able to cope with high-amplitude, fast-timescale variations in power generation. Frequency regulation through demand response has the potential to coordinate temporally flexible loads, such as air conditioners, to counteract these variations. Existing approaches to discrete control with dynamic constraints struggle to deliver satisfactory performance when actions must be selected on fast timescales for hundreds of agents. We propose a decentralized agent trained with multi-agent proximal policy optimization (MAPPO) and localized communication. We show that the resulting policy delivers good and robust frequency-regulation performance and scales seamlessly to arbitrary numbers of houses at constant processing time, a setting where classical methods fail.