FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings
Jean Ogier du Terrail · Samy-Safwan Ayed · Edwige Cyffers · Felix Grimberg · Chaoyang He · Regis Loeb · Paul Mangold · Tanguy Marchand · Othmane Marfoq · Erum Mushtaq · Boris Muzellec · Constantin Philippenko · Santiago Silva · Maria Teleńczuk · Shadi Albarqouni · Salman Avestimehr · Aurélien Bellet · Aymeric Dieuleveut · Martin Jaggi · Sai Praneeth Karimireddy · Marco Lorenzi · Giovanni Neglia · Marc Tommasi · Mathieu Andreux

Wed Nov 30 02:00 PM -- 04:00 PM (PST) @ Hall J #1031
Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models without centralizing data. The cross-silo FL setting corresponds to the case of a few (2 to 50) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works have proposed representative datasets for cross-device FL, few realistic healthcare cross-silo FL datasets exist, which slows algorithmic research in this critical application. In this work, we propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets with natural splits, covering multiple tasks, modalities, and data volumes, each accompanied by baseline training code. As an illustration, we additionally benchmark standard FL algorithms on all datasets. Our flexible and modular suite allows researchers to easily download datasets, reproduce results, and re-use the different components for their research. FLamby is available at www.github.com/owkin/flamby.

Author Information

Jean Ogier du Terrail (Owkin)
Samy-Safwan Ayed (Université Côte d'Azur)
Edwige Cyffers (Inria)
Felix Grimberg (Swiss Federal Institute of Technology Lausanne)
Chaoyang He (FedML, Inc.)
Regis Loeb (Owkin)

I am currently employed by Owkin as a data scientist in the field of biotechnology research: we are looking to combine cutting-edge machine learning and biology to discover novel cancer drug candidates and train more accurate models on patient data in a privacy-preserving way. I have worked as a Machine Learning researcher at the Katholieke Universiteit Leuven on projects in the sectors of pharmaceutical and medical research. Beforehand I had a successful career in the financial markets industry in London: passionate about Science, Technology and their potential impact on our world, I went back to Mathematics and Artificial Intelligence to fulfill my insatiable intellectual curiosity and love for learning. This diversity of experience has made me a persevering, egoless, flexible and highly skilled professional.

Paul Mangold (Inria Lille)
Tanguy Marchand (Owkin)
Othmane Marfoq (Inria / Accenture)
Erum Mushtaq (University of Southern California)
Boris Muzellec (Owkin)
Constantin Philippenko (Ecole Polytechnique, IPParis)
Santiago Silva (INRIA)
Maria Teleńczuk (Owkin)
Shadi Albarqouni (HelmholtzAI)

Shadi Albarqouni is a Senior Research Scientist at the Chair for Computer Aided Medical Procedures (CAMP) at the Technical University of Munich (TUM), Germany. He received his Ph.D. in Computer Science with summa cum laude in 2017. Since then, he has been working as a postdoctoral researcher at CAMP, leading the Medical Image Analysis group with an emphasis on developing deep learning methods for medical applications. Albarqouni has more than 40 publications in both Medical Imaging Computing and Computer-Assisted Interventions, published in IEEE TMI, MICCAI, IPCAI, IJCARS, BMVC, and ICRA. He serves as a reviewer for many journals, including IEEE TMI, IEEE JBHI, IJCARS, and Pattern Recognition. Since 2015, he has been serving as a PC member for several MICCAI workshops. He recently served as an Area Chair at MICCAI 2019. His current research interests include Interpretable ML, Robustness, Uncertainty, Geometric Deep Models, and, more recently, Federated Learning. He is also interested in Entrepreneurship and Startups for Innovative Medical Solutions. His goal is to help everyone in the world get better healthcare services with the assistance of informatics and computer science.

Salman Avestimehr (University of Southern California)
Aurélien Bellet (INRIA)
Aymeric Dieuleveut (Ecole Polytechnique, IPParis)
Martin Jaggi (EPFL)
Sai Praneeth Karimireddy (UC Berkeley)
Marco Lorenzi
Giovanni Neglia (Inria)
Marc Tommasi (INRIA)
Mathieu Andreux (Owkin)

More from the Same Authors