
Workshop: Data Centric AI

FedHist: A Federated-First Dataset for Learning in Healthcare


Recently, federated learning has emerged as a leading approach to applying modern deep learning techniques in healthcare (FL4H). Existing research in FL4H suffers from a lack of suitable data: it either relies on datasets from outside the problem domain or applies federated learning techniques ad hoc to existing healthcare datasets that were designed for centralized methods. In this paper, we introduce the first healthcare dataset specifically designed to enable and accelerate federated learning research. We release a dataset of over 10,000 whole slide images collected for cell nuclei segmentation and processed for distributed learning. We also provide guidelines on how to split these images across simulated devices for federated learning research. Additionally, we automatically segment the data into categories reflecting its underlying modalities, so that the potential for transfer learning can be evaluated. Using this dataset, we conduct extensive benchmarks of distributed learning methods and compare them to centralized algorithms from both a performance and a privacy standpoint.
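The abstract does not spell out the FedHist splitting guidelines themselves. As a generic illustration of what "splitting images across simulated devices" can mean, the sketch below shows two common partitioning schemes: an IID split, where images are shuffled and dealt out evenly, and a non-IID split, where whole categories (for example, the automatically derived modality categories) are assigned to clients. The function names, the image IDs, and the choice of category-based assignment are assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def split_iid(image_ids, num_clients, seed=0):
    """Hypothetical IID split: shuffle the image IDs, then deal them
    round-robin so client sizes differ by at most one."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for reproducible simulations
    return [ids[i::num_clients] for i in range(num_clients)]


def split_by_category(image_ids, categories, num_clients):
    """Hypothetical non-IID split: group images by category, then assign
    whole categories to clients round-robin, so each category lives on
    exactly one client. Clients receive nothing if there are more clients
    than categories."""
    ids = list(image_ids)
    cats = list(categories)
    if len(ids) != len(cats):
        raise ValueError("image_ids and categories must have the same length")
    groups = defaultdict(list)
    for img, cat in zip(ids, cats):
        groups[cat].append(img)
    clients = [[] for _ in range(num_clients)]
    # Sort category names so the assignment is deterministic.
    for i, cat in enumerate(sorted(groups)):
        clients[i % num_clients].extend(groups[cat])
    return clients


if __name__ == "__main__":
    ids = [f"slide_{i}" for i in range(10)]
    cats = ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"]
    print(split_iid(ids, 3))
    print(split_by_category(ids, cats, 2))
```

The IID split is the usual sanity-check baseline, while the category-based split produces the kind of client heterogeneity that tends to make federated optimization harder; comparing the two is one simple way to probe how sensitive a method is to non-IID data.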