Cross-device federated learning is an emerging machine learning (ML) paradigm where a large population of devices collectively train an ML model while the data remains on the devices. This research field has a unique set of practical challenges, and to systematically make advances, new datasets curated to be compatible with this paradigm are needed. Existing federated learning benchmarks in the image domain do not accurately capture the scale and heterogeneity of many real-world use cases. We introduce FLAIR, a challenging large-scale annotated image dataset for multi-label classification suitable for federated learning. FLAIR has 429,078 images from 51,414 Flickr users and captures many of the intricacies typically encountered in federated learning, such as heterogeneous user data and a long-tailed label distribution. We implement multiple baselines in different learning setups for different tasks on this dataset. We believe FLAIR can serve as a challenging benchmark for advancing the state of the art in federated learning. Dataset access and the code for the benchmark are available at https://github.com/apple/ml-flair.
Author Information
Congzheng Song (Cornell Tech)
Filip Granqvist (Apple)
Applied research in private Machine Learning
Kunal Talwar (Apple)
More from the Same Authors
- 2021: Enforcing fairness in private federated learning via the modified method of differential multipliers
  Borja Rodríguez Gálvez · Filip Granqvist · Rogier van Dalen · Matthew Seigel
- 2023 Poster: Fast Optimal Locally Private Mean Estimation via Random Projections
  Hilal Asi · Vitaly Feldman · Jelani Nelson · Huy Nguyen · Kunal Talwar
- 2022 Panel: Panel 1C-5: Privacy of Noisy… & Near-Optimal Private and…
  Shyam Narayanan · Kunal Talwar
- 2022 Poster: Mean Estimation with User-level Privacy under Data Heterogeneity
  Rachel Cummings · Vitaly Feldman · Audra McMillan · Kunal Talwar
- 2022 Poster: Subspace Recovery from Heterogeneous Data with Non-isotropic Noise
  John Duchi · Vitaly Feldman · Lunjia Hu · Kunal Talwar
- 2022 Poster: Privacy of Noisy Stochastic Gradient Descent: More Iterations without More Privacy Loss
  Jason Altschuler · Kunal Talwar
- 2020 Poster: Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
  Raef Bassily · Vitaly Feldman · Cristóbal Guzmán · Kunal Talwar
- 2020 Spotlight: Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
  Raef Bassily · Vitaly Feldman · Cristóbal Guzmán · Kunal Talwar
- 2020 Poster: Stochastic Optimization with Laggard Data Pipelines
  Naman Agarwal · Rohan Anil · Tomer Koren · Kunal Talwar · Cyril Zhang
- 2020 Poster: Faster Differentially Private Samplers via Rényi Divergence Analysis of Discretized Langevin MCMC
  Arun Ganesh · Kunal Talwar
- 2020 Poster: On the Error Resistance of Hinge-Loss Minimization
  Kunal Talwar
- 2019: Private Stochastic Convex Optimization: Optimal Rates in Linear Time
  Vitaly Feldman · Tomer Koren · Kunal Talwar
- 2019 Poster: Private Stochastic Convex Optimization with Optimal Rates
  Raef Bassily · Vitaly Feldman · Kunal Talwar · Abhradeep Guha Thakurta
- 2019 Spotlight: Private Stochastic Convex Optimization with Optimal Rates
  Raef Bassily · Vitaly Feldman · Kunal Talwar · Abhradeep Guha Thakurta
- 2019 Poster: Computational Separations between Sampling and Optimization
  Kunal Talwar