Federated learning has emerged as the predominant framework for distributed machine learning over decentralized data, e.g. in mobile phones. The usual approaches suffer from a distribution shift: the model is trained to fit the average population distribution but is deployed on individual clients, whose data distributions can be quite different. We present a distributionally robust approach to federated learning based on a risk measure known as the superquantile and show how to optimize it by interleaving federated averaging steps with quantile computation. We demonstrate experimentally that our approach is competitive with usual ones in terms of average error and outperforms them in terms of tail statistics of the error.