Communication is a key bottleneck in distributed optimization, and, in particular, bandwidth and latency can be limiting factors when devices are connected over commodity networks, such as in Federated Learning. State-of-the-art techniques tackle these challenges by advanced compression techniques or delaying communication rounds according to predefined schedules. We present a new scheme that adaptively skips communication (broadcast and client uploads) by detecting slow-varying updates. The scheme automatically adjusts the communication frequency independently for each worker and the server. By utilizing an error-feedback mechanism~-- borrowed from the compression literature~--~we prove that the convergence rate is the same as for batch gradient descent %strongly-convex, in the convex and nonconvex smooth cases. We show that the total number of communication rounds between server and clients needed to achieve a targeted accuracy is reduced, even in the case when the data distribution is highly non-IID.