Abstract: Gradient methods form the foundation of current machine learning. A vast literature covers the use of stochastic gradients being simple unbiased estimators of the full gradient of our objective. In this talk, we discuss four applications motivated from practical machine learning, where this key assumption is violated, and show new ways to cope with gradients which are only loosely related to the original objective. We demonstrate that algorithms with rigorous convergence guarantees can still be obtained in such settings, for
federated learning on heterogeneous data,
personalized collaborative learning,
masked training of neural networks with partial gradients,
learning with malicious participants, in the sense of Byzantine robust training.