Workshop
New Frontiers in Federated Learning: Privacy, Fairness, Robustness, Personalization and Data Ownership
Nghia Hoang · Lam Nguyen · Pin-Yu Chen · Tsui-Wei Weng · Sara Magliacane · Bryan Kian Hsiang Low · Anoop Deoras

Mon Dec 13 05:30 AM -- 04:00 PM (PST)

Federated Learning (FL) has recently emerged as the de facto framework for distributed machine learning (ML) that preserves the privacy of data, especially given the proliferation of mobile and edge devices with their increasing capacity for storage and computation. To fully utilize the vast amount of geographically distributed, diverse and privately owned data that is stored across these devices, FL provides a platform on which local devices can build their own local models whose training processes can be synchronized via sharing differential parameter updates. This is done without exposing their private training data, which helps mitigate the risk of privacy violation, in light of recent policies such as the General Data Protection Regulation (GDPR). This potential has since attracted explosive attention from the ML community, resulting in a vast, growing body of both theoretical and empirical literature that pushes FL close to being the new standard of ML as a democratized data analytic service.

Interestingly, as FL comes closer to being deployable in real-world scenarios, it also surfaces a growing set of challenges on trustworthiness, fairness, auditability, scalability, robustness, security, privacy preservation, decentralizability, data ownership and personalizability that are all becoming increasingly important in many interrelated aspects of our digitized society. Such challenges are particularly important in economic landscapes that lack big tech corporations with big data and are instead driven by government agencies and institutions with valuable data locked up, or by small-to-medium enterprises and start-ups with limited data and little funding. With this forethought, the workshop envisions the establishment of an AI ecosystem that facilitates data and model sharing between data curators as well as interested parties in the data and models while protecting personal data ownership.

Poster Session: https://eventhosts.gather.town/app/8bJUNHsVwXWh0K2O/nffl

Mon 5:30 a.m. - 6:00 a.m. Pre-workshop networking (Networking Session)

Mon 6:00 a.m. - 6:10 a.m. Opening Remark

Mon 6:15 a.m. - 7:00 a.m. Keynote Talk: Building a New Economy: Federated Learning and Beyond (Alex Pentland) (Keynote Talk)
Federated learning is hot, driven by concerns over privacy, security, and difficulty in securing data. This talk will cover how those concerns are shaping the technology, and discuss the tech and regulatory trends that are motivating nations and companies to deploy an internet layer for transactions that makes federated learning a core technology.
Alex 'Sandy' Pentland

Mon 6:55 a.m. - 7:00 a.m. Q&A with Professor Alex Pentland (Q/A Live Session)
Live Q&A for the Keynote Talk
Alex 'Sandy' Pentland

Mon 7:00 a.m. - 7:12 a.m. Contributed Talk 1: Personalized Neural Architecture Search for Federated Learning (Contributed Talk)
Minh Hoang · Carl Kingsford

Mon 7:12 a.m. - 7:15 a.m. Contributed Talk 1 - Q/A Live session (Q/A Live session)

Mon 7:15 a.m. - 7:27 a.m. Contributed Talk 2: A Unified Framework to Understand Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective (Contributed Talk)
Xinwei Zhang · Mingyi Hong · Nicola Elia

Mon 7:27 a.m. - 7:30 a.m. Contributed Talk 2 - Q/A Live session (Q/A Live session)

Mon 7:30 a.m. - 7:42 a.m. Contributed Talk 3: Architecture Personalization in Resource-constrained Federated Learning (Contributed Talk)
Mi Luo · Fei Chen · Zhenguo Li · Jiashi Feng

Mon 7:42 a.m. - 7:45 a.m. Contributed Talk 3 - Q/A Live Session (Q/A Live Session)

Mon 7:45 a.m. - 8:30 a.m. Keynote Talk: Permutation Compressors for Provably Faster Distributed Nonconvex Optimization (Peter Richtarik) (Keynote Talk)
We study the MARINA method of Gorbunov et al. (ICML 2021), the current state-of-the-art distributed non-convex optimization method in terms of theoretical communication complexity. Theoretical superiority of this method can be largely attributed to two sources: the use of a carefully engineered biased stochastic gradient estimator, which leads to a reduction in the number of communication rounds, and the reliance on independent stochastic communication compression operators, which leads to a reduction in the number of transmitted bits within each communication round. In this paper we i) extend the theory of MARINA to support a much wider class of potentially correlated compressors, extending the reach of the method beyond the classical independent compressors setting, ii) show that a new quantity, for which we coin the name Hessian variance, allows us to significantly refine the original analysis of MARINA without any additional assumptions, and iii) identify a special class of correlated compressors based on the idea of random permutations, for which we coin the term PermK, the use of which leads to $O(\sqrt{n})$ (resp. $O(1 + d/\sqrt{n})$) improvement in the theoretical communication complexity of MARINA in the low Hessian variance regime when $d\geq n$ (resp. $d \leq n$), where $n$ is the number of workers and $d$ is the number of parameters describing the model we are learning. We corroborate our theoretical results with carefully engineered synthetic experiments with minimizing the average of nonconvex quadratics, and on autoencoder training with the MNIST dataset.
Peter Richtarik

Mon 8:25 a.m. - 8:30 a.m. Q&A with Professor Peter Richtarik (Q/A Live Session)
Peter Richtarik

Mon 8:30 a.m. - 9:15 a.m. Keynote Talk: Bringing Differential Private SGD to Practice: On the Independence of Gaussian Noise and the Number of Training Rounds (Marten van Dijk) (Keynote Talk)
In DP-SGD each round communicates a local SGD update which leaks some new information about the underlying local data set to the outside world. In order to provide privacy, Gaussian noise with standard deviation $\sigma$ is added to local SGD updates after performing a clipping operation. We show that for attaining $(\epsilon,\delta)$-differential privacy $\sigma$ can be chosen equal to $\sqrt{2(\epsilon +\ln(1/\delta))/\epsilon}$ for $\epsilon=\Omega(T/N^2)$. In many existing machine learning problems, $N$ is always large and $T=O(N)$. Hence, $\sigma$ becomes "independent" of any $T=O(N)$ choice with $\epsilon=\Omega(1/N)$. This means that our $\sigma$ only depends on $N$ rather than $T$. This differential privacy characterization allows one to select the parameters of DP-SGD a priori, based on a fixed privacy budget (in terms of $\epsilon$ and $\delta$), so as to best optimize the anticipated utility (test accuracy). This ability of planning ahead together with $\sigma$'s independence of $T$ (which allows local gradient computations to be split among as many rounds as needed, even for large $T$ as usually happens in practice) leads to an adaptive DP-SGD algorithm that allows a client to balance its privacy budget with the accuracy of the learned global model based on local test data. We notice that the current state-of-the-art differential privacy accountant method based on $f$-DP has a closed form for computing the privacy loss for DP-SGD. However, due to its interpretation complexity, it cannot be used in a simple way to plan ahead. Instead, accountant methods are only used for keeping track of how privacy budget has been spent (after the fact). This is joint work with Nhuong V. Nguyen, Toan N. Nguyen, Lam M. Nguyen, and Phuong Ha Nguyen.
Marten van Dijk

Mon 9:10 a.m. - 9:15 a.m. Q&A with Dr. Marten van Dijk (Q/A Live Session)
Marten van Dijk

Mon 9:15 a.m. - 10:00 a.m. Lunch Break

Mon 10:00 a.m. - 10:45 a.m. Keynote Talk: Fair or Robust: Addressing Competing Constraints in Federated Learning (Virginia Smith) (Keynote Talk)
A defining trait of federated learning is the presence of heterogeneity, i.e., that data may differ significantly across the network. In this talk I discuss how heterogeneity affects issues of fairness and robustness in federated settings. Our work demonstrates that robustness to data/model poisoning attacks and fairness, measured as the uniformity of performance across devices, are constraints that can directly compete when training in heterogeneous networks. I then explore to what extent methods for personalized federated learning can mitigate the tension between these constraints. I end with promising directions of future work in personalization, fairness, and robustness for FL.
Virginia Smith

Mon 10:40 a.m. - 10:45 a.m. Q&A with A/Professor Virginia Smith (Q/A Live Session)
Virginia Smith

Mon 10:45 a.m. - 10:57 a.m. Contributed Talk 4: Sharp Bounds for FedAvg (Local SGD) (Contributed Talk)
Margalit Glasgow · Honglin Yuan · Tengyu Ma

Mon 10:57 a.m. - 11:00 a.m. Contributed Talk 4 - Q/A Live Session (Q/A Live Session)

Mon 11:00 a.m. - 11:12 a.m. Contributed Talk 5: Efficient and Private Federated Learning with Partially Trainable Networks (Contributed Talk)
Hakim Sidahmed · Zheng Xu · Yuan Cao

Mon 11:12 a.m. - 11:15 a.m. Contributed Talk 5 - Q/A Live Session (Q/A Live Session)

Mon 11:15 a.m. - 11:27 a.m. Contributed Talk 6: FLoRA: Single-shot Hyper-parameter Optimization for Federated Learning (Contributed Talk)
Yi Zhou · Parikshit Ram · Theodoros Salonidis · Nathalie Baracaldo Angel · Horst Samulowitz · Heiko Ludwig

Mon 11:27 a.m. - 11:30 a.m. Contributed Talk 6 - Q/A Live Session (Q/A Live Session)

Mon 11:30 a.m. - 12:30 p.m. Poster Session

Mon 12:30 p.m. - 1:15 p.m. Keynote Talk: Towards Building a Responsible Data Economy (Dawn Song) (Keynote Talk)
Dawn Song

Mon 1:10 p.m. - 1:15 p.m. Q&A with Professor Dawn Song (Q/A Live Session)
Dawn Song

Mon 1:15 p.m. - 2:00 p.m. Keynote Talk: Personalization in Federated Learning: Adaptation and Clustering (Asu Ozdaglar) (Keynote Talk)
In many machine learning applications, data are collected by a large number of devices, calling for a distributed architecture for learning models. Federated learning (FL) aims to address this challenge by providing a decentralized mechanism for leveraging the individual data and computational power of users. Classical FL relies on a single shared model for users but tends to perform poorly in the presence of data and task heterogeneity across users. This talk presents various approaches for developing multiple “personalized” models for heterogeneous users. We first consider a meta-learning approach, where the goal is to generate an initial shared model that users adapt to their tasks using a small number of additional local computations. Second, we consider a cluster-based approach which is more appropriate when there is substantial heterogeneity in user data distributions. We propose an algorithm that simultaneously learns cluster identities, while fully operating in a decentralized manner.
Asuman Ozdaglar

Mon 1:55 p.m. - 2:00 p.m. Q&A with Professor Asu Ozdaglar (Q/A Live Session)
Asuman Ozdaglar

Mon 2:00 p.m. - 4:00 p.m. Post-workshop Networking (Networking Session)

Mon 2:00 p.m. - 2:10 p.m. Closing Remark

Advanced Free-rider Attacks in Federated Learning (Poster)
Federated learning is a new machine learning technology in which multiple clients collaboratively train a global model without sharing their local data.
Because clients have direct control over their local models and training data, federated learning is inherently vulnerable to free-rider attacks, in which a malicious client forges local model parameters to obtain rewards without contributing sufficient local data and computation resources. Recently, many different free-rider attacks have been proposed. However, existing attacks lack a good stealth property. The convergence property represents the convergence speed and final global model accuracy. The stealth property indicates the attacker’s ability to hide its local update. In this work, we first utilize the Ornstein-Uhlenbeck (OU) process to formalize the evolution of local and global training processes, and analyze the geometrical relationship of all clients’ local model updates. Then, we propose a scaled delta attack and an advanced free-rider attack. We also prove that the advanced free-rider attack can not only ensure the convergence of the aggregated model, but also hold the stealth property. Experimental results demonstrate that our advanced free-rider attack is feasible and can escape from state-of-the-art defense mechanisms. Our results show that even a highly constrained adversary can carry out the advanced free-rider attack while simultaneously maintaining stealth under the defense strategies, which highlights the vulnerability of the federated learning setting and the need to develop effective defense strategies.
Zhenqian Zhu · Jiangang Shu · Xiaohua Jia

CosSGD: Communication-Efficient Federated Learning with a Simple Cosine-Based Quantization (Poster)
Federated learning is a promising framework to mitigate data privacy and computation concerns. However, the communication cost between the server and clients has become the major bottleneck for successful deployment.
Despite notable progress in gradient compression, existing quantization methods require further improvement when low-bit compression is applied; in particular, overall system performance often degrades substantially when quantization is applied in both directions to compress model weights and gradients. In this work, we propose a simple cosine-based nonlinear quantization and achieve impressive results in compressing round-trip communication costs. We are not only able to compress model weights and gradients at higher ratios than previous methods, but also achieve competitive model performance at the same time. Further, our approach is highly suitable for federated learning problems since it has low computational complexity and requires only a little additional data to recover the compressed information. Extensive experiments have been conducted on image classification and brain tumor semantic segmentation using the CIFAR-10 and BraTS datasets, where we show state-of-the-art effectiveness and impressive communication efficiency.
Yang He · Hui-Po Wang · Maximilian Zenk · Mario Fritz

Iterated Vector Fields and Conservatism, with Applications to Federated Learning (Poster)
We study when iterated vector fields (vector fields composed with themselves) are conservative. We give explicit examples of vector fields for which this self-composition preserves conservatism. Notably, this includes gradient vector fields of loss functions associated to some generalized linear models (including non-convex functions). As we show, characterizing the set of smooth vector fields satisfying this condition yields non-trivial geometric questions. In the context of federated learning, we show that when clients have loss functions whose gradient satisfies this condition, federated averaging is equivalent to gradient descent on a surrogate loss function. We leverage this to derive novel convergence results for federated learning.
By contrast, we demonstrate that when the client losses violate this property, federated averaging can yield behavior which is fundamentally distinct from centralized optimization. Finally, we discuss theoretical and practical questions our analytical framework raises for federated learning.
Zachary Charles · Keith Rush

Scalable Average Consensus with Compressed Communications (Poster)
We propose a new decentralized average consensus algorithm with compressed communication that scales linearly with the network size $n$. We prove that the proposed method converges to the average of the initial values held locally by the agents of a network when agents are allowed to communicate with compressed messages. The proposed algorithm works for a broad class of compression operators (possibly biased), where agents interact over arbitrary static, undirected, and connected networks. We further present numerical experiments that confirm our theoretical results and illustrate the scalability and communication efficiency of our algorithm.
M. Taha Toghani · Cesar Uribe

FedJAX: Federated learning simulation with JAX (Poster)
Federated learning is a machine learning technique that enables training across decentralized data. Recently, federated learning has become an active area of research due to an increased focus on privacy and security. In light of this, a variety of open source federated learning libraries have been developed and released. We introduce FedJAX, a JAX-based open source library for federated learning simulations that emphasizes ease-of-use in research. With its simple primitives for implementing federated learning algorithms, prepackaged datasets, models and algorithms, and fast simulation speed, FedJAX aims to make developing and evaluating federated algorithms faster and easier for researchers.
Our benchmark results show that FedJAX can be used to train models with federated averaging on the EMNIST dataset in a few minutes and the Stack Overflow dataset in roughly an hour with standard hyperparameters using TPUs.
Jae Hun Ro · Ananda Theertha Suresh · Ke Wu

Decentralized Personalized Federated Min-Max Problems (Poster)
Personalized Federated Learning (PFL) has recently seen tremendous progress, allowing the design of novel machine learning applications preserving privacy of the data used for training. Existing theoretical results in this field mainly focus on distributed optimization under minimization problems. This paper is the first to study PFL for saddle point problems, which cover a broader class of optimization tasks and are thus of more relevance for applications than minimization. In this work, we consider a recently proposed PFL setting with the mixing objective function, an approach combining the learning of a global model together with local distributed learners. Unlike most of the previous papers, which considered only the centralized setting, we work in a more general and decentralized setup. This allows us to design and analyze more practical and federated ways to connect devices to the network. Our contribution is establishing the first lower bounds for this formulation and designing two new optimal algorithms matching these lower bounds. A theoretical analysis of these methods is presented for smooth (strongly-)convex-(strongly-)concave saddle point problems. We also demonstrate the effectiveness of our problem formulation and the proposed algorithms on experiments with neural networks with adversarial noise.
Ekaterina Borodich · Aleksandr Beznosikov · Abdurakhmon Sadiev · Vadim Sushko · Alexander Gasnikov

Minimal Model Structure Analysis for Input Reconstruction in Federated Learning (Poster)
Federated learning (FL) proposed a distributed machine learning (ML) framework where every distributed worker owns a complete copy of the global model and its own data.
The training occurs locally, which ensures no direct transmission of training data. However, recent work (Zhu et al., 2019) demonstrated that input data from a neural network may be reconstructed using only knowledge of the gradients of that network, which breaks the promise of FL and compromises user privacy. In this work, we aim to further explore the theoretical limits of reconstruction, and to speed up and stabilize the reconstruction procedure. We show that a single input may be reconstructed in analytical form, regardless of network depth, using a fully-connected neural network with one hidden node. Then we generalize this result to a gradient averaged over batches of size $B$. In this case, the full batch can be reconstructed if the number of hidden units exceeds $B$. For a convolutional neural network (CNN), the number of required kernels in convolutional layers is decided by multiple factors, e.g., padding, kernel and stride size. We require the number of kernels $h\geq (\frac{d}{d^{\prime}})^2C$, where we define $d$ as the input width, $d^{\prime}$ as the output width after the convolutional layer, and $C$ as the number of input channels. We validate our observation and demonstrate the improvements using bio-medical (fMRI, WBC) and benchmark data (MNIST, Kuzushiji-MNIST, CIFAR100, ImageNet and face images).
Jia Qian · Hiba Nassar · Lars Kai Hansen

Certified Federated Adversarial Training (Poster)
In federated learning (FL), robust aggregation schemes have been developed to protect against malicious clients. Many robust aggregation schemes rely on certain numbers of benign clients being present in a quorum of workers. This can be hard to guarantee when clients can join at will, or join based on factors such as idle system status and being connected to power and WiFi. We tackle the scenario of securing FL systems conducting adversarial training when a quorum of workers could be completely malicious.
We model an attacker who poisons the model to insert a weakness into the adversarial training such that the model displays apparent adversarial robustness, while the attacker can exploit the inserted weakness to bypass the adversarial training and force the model to misclassify adversarial examples. We use abstract interpretation techniques to detect such stealthy attacks and block the corrupted model updates. We show that this defence can preserve adversarial robustness even against an adaptive attacker.
Giulio Zizzo · Ambrish Rawat

Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses (Poster)
This paper studies the problem of federated learning (FL) in the absence of a trustworthy server/clients. In this setting, each client needs to ensure the privacy of its own data without relying on the server or other clients. We study local differential privacy (LDP) and provide tight upper and lower bounds that establish the minimax optimal rates (up to logarithms) for LDP convex/strongly convex federated stochastic optimization. Our rates match the optimal statistical rates in certain practical parameter regimes ("privacy for free"). Second, we develop a novel time-varying noisy SGD algorithm, leading to the first non-trivial LDP risk bounds for FL with non-i.i.d. clients. Third, we consider the special case where each client's loss function is empirical and develop an accelerated LDP FL algorithm to improve communication complexity compared to existing works. We also provide matching lower bounds, establishing the optimality of our algorithm for convex/strongly convex settings. Fourth, with a secure shuffler to anonymize client reports (but without a trusted server), our algorithm attains the optimal central DP rates for stochastic convex/strongly convex optimization, thereby achieving optimality in the local and central models simultaneously. Our upper bounds quantify the role of network communication reliability in performance.
Finally, we validate our theoretical results and illustrate the practical utility of our algorithm with numerical experiments.
Andrew Lowy · Meisam Razaviyayn

Certified Robustness for Free in Differentially Private Federated Learning (Poster)
Federated learning (FL) provides an efficient training paradigm to jointly train a global model leveraging data from distributed users. As the local training data comes from different users who may not be trustworthy, several studies have shown that FL is vulnerable to poisoning attacks where adversaries add malicious data during training. On the other hand, to protect the privacy of users, FL is usually trained in a differentially private way (DPFL). Given these properties of FL, in this paper, we aim to ask: Can we leverage the innate privacy property of DPFL to provide robustness certification against poisoning attacks? Can we further improve the privacy of FL to improve such certification? To this end, we first investigate both the user-level and instance-level privacy of FL, and propose novel randomization mechanisms and analysis to achieve improved differential privacy. We then provide two robustness certification criteria: certified prediction and certified attack cost for DPFL on both levels. Theoretically, given different privacy properties of DPFL, we prove their certified robustness under a bounded number of adversarial users or instances. Empirically, we conduct extensive experiments to verify our theories under different attacks on a range of datasets. We show that the global model with a tighter privacy guarantee always provides stronger robustness certification in terms of the certified attack cost, while it may exhibit tradeoffs regarding the certified prediction. We believe our work will inspire future research on developing certifiably robust DPFL based on its inherent properties.
Chulin Xie · Yunhui Long · Pin-Yu Chen · Krishnaram Kenthapadi · Bo Li

FedBABU: Towards Enhanced Representation for Federated Image Classification (Poster)
Federated learning has evolved to improve a single global model under data heterogeneity (as a curse) or to develop multiple personalized models using data heterogeneity (as a blessing). However, there has been little research considering both directions simultaneously. In this paper, we first investigate the relationship between them by analyzing Federated Averaging at the client level and determine that a better federated global model performance does not always improve personalization. To elucidate the cause of this personalization performance degradation problem, we decompose the entire network into the body (i.e., extractor), related to universality, and the head (i.e., classifier), related to personalization. We then point out that this problem stems from training the head. Based on this observation, we propose a novel federated learning algorithm, coined FedBABU, which updates only the body of the model during federated training (i.e., the head is randomly initialized and never updated), and the head is fine-tuned for personalization during the evaluation process. Extensive experiments show consistent performance improvements and an efficient personalization of FedBABU.
Jaehoon Oh · SangMook Kim · Se-Young Yun

FedMix: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning (Poster)
Federated Learning (FL) is an increasingly popular machine learning paradigm in which multiple nodes try to collaboratively learn under privacy, communication and multiple heterogeneity constraints.
A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate in handling several major constraints specific to federated learning, such as communication adaptivity and personalization control. We identify several key desiderata in frameworks for federated learning and introduce a new framework, FedMix, that takes into account the unique challenges brought by federated learning. FedMix has a standard finite-sum form, which enables practitioners to tap into the immense wealth of existing (potentially non-local) methods for distributed optimization. Through a smart initialization that does not require any communication, FedMix does not require the use of local steps but is still provably capable of performing dissimilarity regularization on par with local methods. We give several algorithms for solving the FedMix formulation efficiently under communication constraints. Finally, we corroborate our theoretical results with extensive experimentation.
Elnur Gasanov · Ahmed Khaled Ragab Bayoumi · Samuel Horváth · Peter Richtarik

Bayesian SignSGD Optimizer for Federated Learning (Poster)
Federated Learning is a distributed Machine Learning framework aimed at training a global model by sharing edge nodes' locally trained models instead of their datasets. This presents three major challenges: communication between edge nodes and the central node; heterogeneity of edge nodes (e.g. availability, computing, datasets); and security. In this paper we focus on the communication challenge, which is two-fold: decreasing the number of communication rounds; and compressing the information sent back and forth between edge nodes and the central node. Particularly, we are interested in cases where strict constraints over the allowed network traffic of gradients may apply, e.g. frequent training of predictive models for globally distributed devices.
The recent success of 1-bit compressors (e.g. majority voting SignSGD) is promising; however, such high-compression methods are known to have slow (or problematic) convergence. We propose a Bayesian framework, named BB-SignSGD, encompassing 1-bit compressors for a principled and flexible choice of how much information to carry from previous communication rounds during central aggregation. We prove that majority voting SignSGD is a special case of our framework when particular choices are taken within it. We present results from extensive experiments on five different datasets. We show that, compared to majority voting SignSGD, other choices within BB-SignSGD support higher learning rates to achieve faster convergence, competitive even with uncompressed communication.
Paulo Ferreira · Pablo Silva · Vinicius Gottin

Learning Federated Representations and Recommendations with Limited Negatives (Poster)
Deep retrieval models are widely used for learning entity representations and recommendations. Federated learning provides a privacy-preserving way to train these models without requiring centralization of user data. However, federated deep retrieval models usually perform much worse than their centralized counterparts due to non-IID (independent and identically distributed) training data on clients, an intrinsic property of federated learning that limits negatives available for training. We demonstrate that this issue is distinct from the commonly studied client drift problem. This work proposes batch-insensitive losses as a way to alleviate the non-IID negatives issue for federated movie recommendations. We explore a variety of techniques and identify that batch-insensitive losses can effectively improve the performance of federated deep retrieval models, increasing the relative recall of the federated model by up to 93.15% and reducing the relative gap in recall between it and a centralized model from 27.22% - 43.14% to 0.53% - 2.42%.
We also open-source our code framework to accelerate further research and applications of federated deep retrieval models.
Lin Ning · Sushant Prakash

Secure Aggregation for Buffered Asynchronous Federated Learning (Poster)
Federated learning (FL) typically relies on synchronous training, which is slow due to stragglers. While asynchronous training handles stragglers efficiently, it does not ensure privacy due to its incompatibility with secure aggregation protocols. A buffered asynchronous training protocol known as FedBuff has been proposed recently, which bridges the gap between synchronous and asynchronous training to mitigate stragglers while also ensuring privacy. FedBuff allows the users to send their updates asynchronously while ensuring privacy by storing the updates in a trusted execution environment (TEE) enabled private buffer. TEEs, however, have limited memory, which limits the buffer size. Motivated by this limitation, we develop a buffered asynchronous secure aggregation (BASecAgg) protocol that does not rely on TEEs. The conventional secure aggregation protocols cannot be applied in the buffered asynchronous setting since the buffer may have local models corresponding to different rounds and hence the masks that the users use to protect their models may not cancel out. BASecAgg addresses this challenge by carefully designing the masks such that they cancel out even if they correspond to different rounds. Our convergence analysis and experiments show that BASecAgg has almost the same convergence guarantees as FedBuff without relying on TEEs.
Jinhyun So · Ramy Ali · Basak Guler · Salman Avestimehr

What Do We Mean by Generalization in Federated Learning? (Poster)
Federated learning data is drawn from a distribution of distributions: clients are drawn from a meta-distribution, and their data are drawn from personal data distributions.
Thus generalization studies in federated learning should separate performance gaps from unseen client data (out-of-sample gap) from performance gaps from unseen client distributions (participation gap). In this work, we propose a framework for disentangling these performance gaps. Using this framework we observe and explain differences in behavior across natural and synthetic federated datasets, indicating that dataset synthesis strategy can be important for realistic simulations of generalization in federated learning. We propose a semantic synthesis strategy that enables realistic simulation without naturally-partitioned data. Honglin Yuan · Warren Morningstar · Lin Ning 🔗 - FLoRA: Single-shot Hyper-parameter Optimization for Federated Learning (Poster) We address the relatively unexplored problem of hyper-parameter optimization (HPO) for federated learning (FL-HPO). We introduce Federated Loss SuRface Aggregation (FLoRA), the first FL-HPO solution framework that can address use cases of tabular data and gradient boosting training algorithms in addition to stochastic gradient descent/neural networks commonly addressed in the FL literature. The framework enables single-shot FL-HPO, by first identifying a good set of hyper-parameters that are used in a single FL training. Thus, it enables FL-HPO solutions with minimal additional communication overhead compared to FL training without HPO. Our empirical evaluation of FLoRA for Gradient Boosted Decision Trees on seven OpenML data sets demonstrates significant model accuracy improvements over the considered baseline, and robustness to an increasing number of parties involved in FL-HPO training. 
Yi Zhou · Parikshit Ram · Theodoros Salonidis · Nathalie Baracaldo Angel · Horst Samulowitz · Heiko Ludwig 🔗 - Robust and Personalized Federated Learning with Spurious Features: an Adversarial Approach (Poster) The most common approach for personalized federated learning is fine-tuning the global machine learning model to each client. While this addresses some issues of statistical diversity, we find that such personalization methods are vulnerable to spurious features, leading to bias and sacrificing generalization. Nevertheless, debiasing the personalized models is difficult. To this end, we propose a strategy to mitigate the effect of spurious features based on an observation that the global model in the federated learning step has a low bias degree due to statistical diversity. Then, we estimate and mitigate the bias degree difference between the personalized and global models using adversarial transferability in the personalization step. We theoretically establish the connection between the adversarial transferability and the bias degree difference between the global and personalized models. Empirical results on MNIST, CelebA, and Coil20 datasets show that our method improves the accuracy of the personalized model on the bias-conflicting data samples by up to 14.3%, compared to existing personalization approaches, while preserving the benefit of enhanced average accuracy from fine-tuning. Xiaoyang Wang · Han Zhao · Klara Nahrstedt · Sanmi Koyejo 🔗 - Detecting Poisoning Nodes in Federated Learning by Ranking Gradients (Poster) We propose a simple, yet effective defense against poisoning attacks in Federated Learning. Our approach transforms the update gradients from local nodes into a matrix containing the rankings of local nodes across all model parameter dimensions. We then distinguish the malicious nodes from the benign nodes with key characteristics of the rank domain, specifically, the mean and standard deviation of a node's parameter rankings. 
Under mild conditions, we prove that our approach is guaranteed to detect all malicious nodes under typical Byzantine poisoning attack settings with no prior knowledge or history about the participating nodes. The effectiveness of our proposed approach is further confirmed by experiments on two classic datasets. Compared to the state-of-the-art methods in the literature for defending against Byzantine attacks, our approach is unique in its way of identifying the malicious nodes by ranking and in its robustness in defending against a wide range of attacks. Wanchuang Zhu · Benjamin Zhao · Simon Luo · Ke Deng 🔗 - Federating for Learning Group Fair Models (Poster) Federated learning is an increasingly popular paradigm that enables a large number of entities to collaboratively learn better models. In this work, we study minmax group fairness in paradigms where different participating entities may only have access to a subset of the population groups during the training phase. We formally analyze how this fairness objective differs from existing federated learning fairness criteria that impose similar performance across participants instead of demographic groups. We provide an optimization algorithm -- FedMinMax -- for solving the proposed problem that provably enjoys the performance guarantees of centralized learning algorithms. We experimentally compare the proposed approach against other methods in terms of group fairness in various federated learning setups. Afroditi Papadaki · Natalia Martinez · Martin Bertran · Guillermo Sapiro · Miguel Rodrigues 🔗 - Cronus: Robust and Heterogeneous Collaborative Learning with Black-Box Knowledge Transfer (Poster) The majority of existing collaborative learning algorithms propose to exchange local model parameters between collaborating clients via a central server. 
Unfortunately, this approach has many known security and privacy weaknesses, primarily because of the high dimensionality of the updates involved; furthermore, it is limited to models with homogeneous architectures. The high dimensionality of the updates makes these approaches more susceptible to poisoning attacks and exposes the local models to inference attacks. Based on this intuition, we propose Cronus, which uses knowledge transfer via model outputs to exchange information between clients. We show that this significantly reduces the dimensions of the clients' updates, and therefore, improves the robustness of the server's aggregation algorithm. Our extensive evaluations demonstrate that Cronus outperforms state-of-the-art robust federated learning algorithms. Furthermore, we show that treating local models as black boxes significantly reduces the information leakage. Finally, Cronus also allows collaboration between models with heterogeneous architectures. CHANG hongyan · Virat Shejwalkar · Reza Shokri · Amir Houmansadr 🔗 - FeO2: Federated Learning with Opt-Out Differential Privacy (Poster) The trained model in federated learning (FL) might still leak private client information through model updates, even if clients' data is kept local. Differential privacy (DP) can be employed to provide privacy guarantees in FL, typically at the cost of degraded model performance. One fundamental feature of FL is heterogeneity. While data and system heterogeneity have been studied, heterogeneity in privacy requirements has not been addressed in FL. In this work, we consider a heterogeneous privacy setup where clients are considered private by default, but some of them choose to opt out of privacy. We propose a new algorithm for personalized federated learning with opt-out DP, referred to as FeO2, along with a discussion on its advantages compared to the baselines of private and personalized FL algorithms. 
We show the success of FeO2 in a simplified federated point estimation problem. Finally, we conduct extensive experiments on federated datasets to show the gain in performance for FeO2 compared to the baseline private and personalized federated learning algorithms. We observe that FeO2 provides significant gains for the global model as well as the personalized models compared to the baseline private federated learning. Additionally, we show that clients who opt out can gain up to 3.5% in performance compared to private clients for the considered datasets, illustrating an incentive for clients to opt out. Nasser Aldaghri · Hessam Mahdavifar · Ahmad Beirami 🔗 - Architecture Personalization in Resource-constrained Federated Learning (Poster) Federated learning aims to collaboratively train a global model across a set of clients without data sharing among them. In most earlier studies, a global model architecture, either predefined by experts or searched automatically, is applied to all the clients. However, this convention is impractical for two reasons: 1) The clients may have heterogeneous resource constraints and only be able to handle models with particular configurations, imposing high requirements on the model’s versatility; 2) Data in the real-world federated system are highly non-IID, which means a model architecture optimized for all clients may not achieve optimal performance on personalized data on individual clients. In this work, we address the above two issues by proposing a novel framework that automatically discovers personalized model architectures tailored for clients’ specific resource constraints and data, called Architecture Personalization Federated Learning (APFL). APFL first trains a sizable global architecture and slims it adaptively to meet computational budgets on edge devices. 
Then, APFL offers a communication-efficient federated partial aggregation (FedPA) algorithm to allow mutual learning among clients with diverse local architectures, which largely boosts the overall performance. Extensive empirical evaluations on three federated datasets clearly demonstrate that APFL provides affordable and personalized architectures for individual clients, costing fewer communication bytes and achieving higher accuracy compared with manually defined architectures under the same resource budgets. Mi Luo · Fei Chen · Zhenguo Li · Jiashi Feng 🔗 - RVFR: Robust Vertical Federated Learning via Feature Subspace Recovery (Poster) Vertical Federated Learning (VFL) is a distributed learning paradigm that allows multiple agents to jointly train a global model when each agent holds a different subset of features for the same sample(s). VFL is known to be vulnerable to backdoor attacks. However, unlike the standard horizontal federated learning, improving the robustness of VFL remains challenging. To this end, we propose RVFR, a novel robust VFL training and inference framework. The key to our approach is to ensure that with a low-rank feature subspace, a small number of attacked samples, and other mild assumptions, RVFR recovers the underlying uncorrupted features with guarantees, thus sanitizing the model against a vast range of backdoor attacks. Further, RVFR also defends against inference-time adversarial feature attacks. Our empirical studies further corroborate the robustness of the proposed framework. Jing Liu · Chulin Xie · Krishnaram Kenthapadi · Sanmi Koyejo · Bo Li 🔗 - Personalized Neural Architecture Search for Federated Learning (Poster) Federated Learning (FL) is a recently proposed learning paradigm for decentralized devices to collaboratively train a predictive model without exchanging private data. 
Existing FL frameworks, however, assume a one-size-fits-all model architecture to be collectively trained by local devices, which is determined prior to observing their data. Even with good engineering acumen, this often falls apart when local tasks are different and require diverging choices of architecture modelling to learn effectively. This motivates us to develop a novel personalized neural architecture search (NAS) algorithm for FL, which learns a base architecture that can be structurally personalized for quick adaptation to each local task. On several real-world datasets, our algorithm, FedPNAS, is able to achieve superior performance compared to other benchmarks on heterogeneous multitask scenarios. Minh Hoang · Carl Kingsford 🔗 - FedHist: A Federated-First Dataset for Learning in Healthcare (Poster) Recently federated learning has emerged as a leading approach to applying modern deep learning techniques in healthcare (FL4H). Existing research in FL4H suffers from a lack of data: either making use of datasets from outside of the problem domain or ad-hoc applying federated learning techniques to existing healthcare datasets that were designed for centralized methods. In this paper we introduce the first healthcare dataset specifically designed to enable and accelerate federated learning approaches. We release a dataset comprised of over 10,000 whole slide images collected for cell nuclei segmentation and processed for distributed learning. We also provide guidelines on how to split these images across simulated devices for federated learning research. Additionally, we automatically segment the data into categories reflecting its underlying modalities to evaluate potential for transfer learning. Using this dataset we conduct extensive benchmarks of distributed learning methods and compare them to centralized algorithms, both from a performance and privacy standpoint. 
Usmann Khan 🔗 - Efficient and Private Federated Learning with Partially Trainable Networks (Poster) Federated learning is used for decentralized training of machine learning models on a large number (millions) of edge mobile devices. It is challenging because mobile devices usually have limited communication bandwidth and local computation resources. Therefore, how to improve the efficiency of federated learning is critical for scalability and usability. In this paper, we propose to leverage partially trainable neural networks, which freeze a portion of the model parameters during the entire training process, to reduce the communication cost with little impact on model performance. Through extensive experiments, we empirically show that Federated learning of Partially Trainable neural networks (FedPT) can result in good communication-accuracy trade-offs, with up to 46x reduction in communication cost, at a small accuracy cost. Our approach also enables faster training, with a smaller memory footprint, and higher resilience to strong privacy guarantees. The proposed FedPT can be particularly interesting for pushing the limitations of overparameterization in on-device learning. Hakim Sidahmed · Zheng Xu · Yuan Cao 🔗 - Sharp Bounds for FedAvg (Local SGD) (Poster) Federated Averaging (FedAvg), also known as Local SGD, is one of the most popular algorithms, and the de facto standard, in Federated Learning (FL). This distributed optimization algorithm involves running stochastic gradient descent (SGD) simultaneously on many machines, and infrequently averaging the iterates across the machines. Despite its simplicity and popularity, the convergence rate of FedAvg has thus far been undetermined. In this work, we provide a lower bound for FedAvg that matches the existing upper bound in the convex homogeneous setting. 
As an extension, we also establish a lower bound in the heterogeneous (non-IID) setting that matches the existing upper bound up to the definition of the heterogeneity measure. Our analysis is based on a sharp characterization of the drift of the expectation of an SGD iterate. Margalit Glasgow · Honglin Yuan · Tengyu Ma 🔗 - FairFed: Enabling Group Fairness in Federated Learning (Poster) As machine learning becomes increasingly incorporated in crucial decision-making scenarios such as healthcare, recruitment, and loan assessment, there have been increasing concerns about the privacy and fairness of such systems. Federated learning has been viewed as a promising solution for collaboratively learning machine learning models among multiple parties while maintaining the privacy of their local data. However, federated learning also poses new challenges in mitigating the potential bias against certain populations (e.g., demographic groups), which typically requires centralized access to the sensitive information (e.g., race, gender) of each data point. Motivated by the importance and challenges of group fairness in federated learning, in this work, we propose FairFed, a novel algorithm to enhance group fairness via a fairness-aware aggregation method, aiming to provide fair model performance across different sensitive groups (e.g., racial, gender groups) while maintaining high utility. The formulation can potentially provide more flexibility in the customized local debiasing strategies for each client. When running federated training on two widely investigated fairness datasets, Adult and COMPAS, our proposed method outperforms the state-of-the-art fair federated learning frameworks under a highly heterogeneous sensitive attribute distribution. 
Yahya Ezzeldin · Shen Yan · Chaoyang He · Emilio Ferrara · Salman Avestimehr 🔗 - Secure Byzantine-Robust Distributed Learning via Clustering (Poster) Federated learning systems that jointly preserve Byzantine robustness and privacy have remained an open problem. Robust aggregation, the standard defense for Byzantine attacks, generally requires server access to individual updates or nonlinear computation -- and is thus incompatible with privacy-preserving methods such as secure aggregation via multiparty computation. To this end, we propose SHARE (Secure Hierarchical Robust Aggregation), a distributed learning framework designed to cryptographically preserve client update privacy and robustness to Byzantine adversaries simultaneously. The key idea is to incorporate secure averaging among randomly clustered clients before filtering malicious updates through robust aggregation. Experiments show that SHARE has robustness guarantees similar to those of existing techniques while enhancing privacy. Raj Kiriti Velicheti · Sanmi Koyejo 🔗 - WAFFLE: Weighted Averaging for Personalized Federated Learning (Poster) In federated learning, model personalization can be a very effective strategy to deal with statistical heterogeneity across clients. We introduce WAFFLE (Weighted Averaging For Federated LEarning): a personalized collaborative machine learning algorithm based on SCAFFOLD. SCAFFOLD uses stochastic control variates to converge towards a model close to the globally optimal model even in classification tasks where the marginal distribution of labels across clients is highly skewed. In contrast, WAFFLE uses the Euclidean distance between clients’ updates to weigh their contributions and thus minimize the trained model’s loss on one specific agent. Through a series of experiments, we compare our proposed new method to two recent personalized federated learning methods, Weight Erosion and APFL, as well as two global methods, federated averaging and SCAFFOLD. 
We evaluate our method using two categories of non-identical client distributions (concept shift and label skew) on two benchmarked image data sets, MNIST and CIFAR10. Our experiments demonstrate the effectiveness of WAFFLE compared with other methods, as it achieves or improves accuracy with faster convergence. Martin Beaussart · Annie Hartley · Martin Jaggi 🔗 - Contribution Evaluation in Federated Learning: Examining Current Approaches (Poster) Federated Learning (FL) has seen explosive interest in cases where entities want to collaboratively train models while maintaining their privacy and governance over their data. In FL, clients have their own private and potentially heterogeneous data and compute resources, and come together to train a common model without raw data ever leaving their locale. Instead, the participants, which are either end-users or institutions, contribute by sharing local model updates, which, naturally, differ in quality. Quantitatively evaluating the worth of these contributions is termed the Contribution Evaluation (CE) problem. We review current CE approaches, from the underlying mathematical framework to efficiently calculating a fair value for each client. Furthermore, we benchmark some of the most promising state-of-the-art approaches, along with a new one we introduce, on MNIST and CIFAR-10, to showcase their differences. While a small part of the overall FL system design, designing a fair and efficient CE method, and an overall incentive mechanism for participants, is paramount to the mainstream adoption of FL. Jonathan Passerat-Palmbach · Vasilis Siomos 🔗 - Bayesian Framework for Gradient Leakage (Poster) Federated learning is an established method for training machine learning models without sharing training data. However, recent work has shown that it cannot guarantee data privacy as shared gradients can still leak sensitive information. 
To formalize the problem of gradient leakage, we propose a theoretical framework that enables, for the first time, analysis of the Bayes optimal adversary phrased as an optimization problem. We demonstrate that existing leakage attacks can be seen as approximations of this optimal adversary with different assumptions on the probability distributions of the data and its respective gradients. Our experiments confirm the effectiveness of the Bayes optimal adversary when it has knowledge of the underlying distribution. Further, our experimental evaluation shows that several existing heuristic defenses are not effective against stronger attacks, especially early in the training process. Thus, our findings indicate that the construction of more effective defenses and their evaluation remains an open problem. Mislav Balunovic · Dimitar Dimitrov · Martin Vechev 🔗 - FedRAD: Federated Robust Adaptive Distillation (Poster) The robustness of federated learning (FL) is vital for the distributed training of an accurate global model that is shared among a large number of clients. The collaborative learning framework, which typically aggregates model updates, is vulnerable to model poisoning attacks from adversarial clients. Since the information shared between the global server and participants is limited to model parameters, it is challenging to detect bad model updates. Moreover, real-world datasets are usually heterogeneous and not independent and identically distributed (Non-IID) among participants, which makes the design of such a robust FL pipeline more difficult. In this work, we propose a novel robust aggregation method, Federated Robust Adaptive Distillation (FedRAD), to detect adversaries and robustly aggregate local models based on properties of the median statistic, and then perform an adapted version of ensemble Knowledge Distillation. We run extensive experiments to evaluate the proposed method against recently published works. 
The results show that FedRAD outperforms all other aggregators in the presence of adversaries, as well as in heterogeneous data distributions. Stefán Sturluson · Luis Muñoz-González · Matei George Nicolae Grama · Jonathan Passerat-Palmbach · Daniel Rueckert · Amir Alansary 🔗 - FedGMA: Federated Learning with Gradient Masked Averaging (Poster) In cross-device federated optimization algorithms, the two important constraints are the non-IIDness in the data distributed across clients and the communication bottleneck. In this work, connections are drawn between the environments in an out-of-distribution (OOD) generalization setting and non-IID clients in a federated setting. To the federated setting, we adopt the OOD generalization hypothesis which states that learning only the invariant mechanisms while ignoring the spurious mechanisms in the train environment improves generalization performance on OOD test data. This paper proposes a gradient masked averaging approach that can be easily applied as an alternative to naive averaging of updates in all federated algorithms like FedAVG, FedProx, SCAFFOLD, and adaptive federated optimizers like FedADAM and FedYogi. This masking improves the convergence of each algorithm in both IID and Non-IID data distributions across clients while reducing the number of communication rounds taken to converge. We introduce OOD generalization testing in federated learning and show that the proposed masking improves the OOD generalization performance of the corresponding federated algorithms. Irene Tenison · Sai Aravind Sreeramadas · Vaikkunth Mugunthan · Irina Rish 🔗 - Federated Reconnaissance: Efficient, Distributed, Class-Incremental Learning (Poster) We describe federated reconnaissance, a class of learning problems in which distributed clients learn new concepts independently and communicate that knowledge efficiently. 
In particular, we propose an evaluation framework and methodological baseline for a system in which each client is expected to learn a growing set of classes and communicate knowledge of those classes efficiently with other clients, such that, after knowledge merging, the clients should be able to accurately discriminate between classes in the superset of classes observed by the set of clients. We compare a range of learning algorithms for this problem and find that prototypical networks are a strong approach in that they are robust to catastrophic forgetting while incorporating new information efficiently. Furthermore, we show that the online averaging of prototype vectors is effective for client model merging and requires only a small amount of communication overhead, memory, and update time per class with no gradient-based learning or hyperparameter tuning. Additionally, to put our results in context, we find that a simple prototypical network with four convolutional layers significantly outperforms complex, state-of-the-art continual learning algorithms, increasing the accuracy by over 22% after learning 600 Omniglot classes and over 33% after learning 20 mini-ImageNet classes incrementally. These results have important implications for federated reconnaissance and continual learning more generally by demonstrating that communicating feature vectors is an efficient, robust, and effective means for distributed, continual learning. Sean Hendryx · Dharma R KC · Bradley Walls · Clayton Morrison 🔗 - A Unified Framework to Understand Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective (Poster) We propose a unified framework to analyze and design distributed optimization algorithms. 
Through the lens of multi-rate feedback control, we show that a wide class of distributed algorithms, including popular decentralized/federated schemes such as decentralized gradient descent, gradient tracking, and federated averaging, among others, can be viewed as discretizing a continuous-time feedback control system, but with different discretization patterns and/or multiple sampling rates. This key observation not only allows us to develop a generic framework to analyze the convergence of the entire algorithm class but, more importantly, also leads to a new way of designing distributed algorithms. We develop the theory behind our framework, and provide an example to highlight how the framework can be used to analyze and extend the well-known gradient tracking algorithm. xinwei zhang · Mingyi Hong · Nicola Elia 🔗