Workshop
Federated Learning: Recent Advances and New Challenges
Shiqiang Wang · Nathalie Baracaldo · Olivia Choudhury · Gauri Joshi · Peter Richtarik · Praneeth Vepakomma · Han Yu

Fri Dec 02 06:30 AM -- 03:00 PM (PST) @ Room 298 - 299

Training machine learning models in a centralized fashion often faces significant challenges due to regulatory and privacy concerns in real-world use cases. These include distributed training data, computational resources to create and maintain a central data repository, and regulatory guidelines (GDPR, HIPAA) that restrict sharing sensitive data. Federated learning (FL) is a new paradigm in machine learning that can mitigate these challenges by training a global model using distributed data, without the need for data sharing. The extensive application of machine learning to analyze and draw insight from real-world, distributed, and sensitive data necessitates familiarization with and adoption of this relevant and timely topic among the scientific community.
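As a rough illustration of training a global model on distributed data without sharing that data, the following minimal sketch runs a federated-averaging-style loop in plain Python. Every name in it (`local_update`, `federated_average`, `train`), the toy linear model, the learning rate, and the two clients' data are assumptions made for illustration and are not taken from any work presented at the workshop. Only model weights leave each client; the raw samples stay local.

```python
from typing import List, Tuple

Vector = List[float]
Dataset = List[Tuple[float, float]]


def local_update(weights: Vector, data: Dataset,
                 lr: float = 0.1, steps: int = 5) -> Vector:
    """Run a few gradient steps on one client's private (x, y) pairs.

    Toy model: y ~ w * x + b with mean squared error; weights = [w, b].
    """
    w, b = weights
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: List[Vector], sizes: List[int]) -> Vector:
    """Average client models, weighting each by its local dataset size."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(dim)]


def train(clients: List[Dataset], rounds: int = 100) -> Vector:
    """Alternate local training and server-side averaging."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        # The server sees only the returned weights, never the data.
        global_weights = federated_average(updates, sizes)
    return global_weights


# Two hypothetical clients, each holding private samples of y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (-1.0, -1.0)],
]
model = train(clients)
```

In this toy setting both clients' data are consistent with the same line, so the averaged model recovers it; with heterogeneous client data, plain averaging of this kind may behave quite differently.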

Despite the advantages of FL, and its successful application in certain industry-based cases, this field is still in its infancy due to new challenges that are imposed by limited visibility of the training data, potential lack of trust among participants training a single model, potential privacy inferences, and in some cases, limited or unreliable connectivity.

The goal of this workshop is to bring together researchers and practitioners interested in FL. This day-long event will facilitate interaction among students, scholars, and industry professionals from around the world to understand the topic, identify technical challenges, and discuss potential solutions. This will lead to an overall advancement of FL and its impact in the community, while noting that FL has become an increasingly popular topic in the machine learning community in recent years.

Fri 6:30 a.m. - 6:35 a.m.
Opening Remarks
Shiqiang Wang

Fri 6:35 a.m. - 6:53 a.m.
Trustworthy Federated Learning (Invited Talk)
Advances in machine learning have led to rapid and widespread deployment of learning-based inference and decision-making for safety-critical applications, such as autonomous driving and security diagnostics. Current machine learning systems, however, assume that training and test data follow the same, or similar, distributions, and do not consider active adversaries manipulating either distribution. Recent work has demonstrated that motivated adversaries can circumvent anomaly detection or other machine learning models at test time through evasion attacks, or can inject well-crafted malicious instances into training data to induce errors at inference time through poisoning attacks, especially in the distributed learning setting. In this talk, I will describe my recent research on security and privacy problems in federated learning, with a focus on potential certifiable defense approaches, differentially private federated learning, and fairness in FL. We will also discuss other defense principles towards developing practical robust learning systems with trustworthiness guarantees.
Bo Li

Fri 6:53 a.m. - 6:57 a.m.
Trustworthy Federated Learning - Q&A (Q&A)

Fri 6:57 a.m. - 7:15 a.m.
Asynchronous Optimization: Delays, Stability, and the Impact of Data Heterogeneity (Invited Talk)
In this talk, I will cover the recent advances in the study of asynchronous stochastic gradient descent (SGD). Previously, it was repeatedly stated in theoretical papers that the performance of Asynchronous SGD degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees for the same Asynchronous SGD algorithm regardless of the delays in the gradients, depending instead just on the number of parallel devices used to implement the algorithm.
Our guarantees are strictly better than the existing analyses, and we also argue that asynchronous SGD outperforms synchronous minibatch SGD in the settings we consider. For our analysis, we introduce a novel recursion based on "virtual iterates" and delay-adaptive stepsizes, which allow us to derive state-of-the-art guarantees for both convex and non-convex objectives.
Konstantin Mishchenko

Fri 7:15 a.m. - 7:19 a.m.
Asynchronous Optimization: Delays, Stability, and the Impact of Data Heterogeneity - Q&A (Q&A)

Fri 7:20 a.m. - 7:27 a.m.
Conditional Moment Alignment for Improved Generalization in Federated Learning (Oral)
In this work, we study model-heterogeneous Federated Learning (FL) for classification, where different clients have different model architectures. Unlike existing works on model heterogeneity, we neither require access to a public dataset nor impose constraints on the model architecture of clients, and we ensure that the clients' models and data are private. We prove a generalization result that provides fundamental insights into the role of the representations in FL, and propose a theoretically grounded algorithm, Federated Conditional Moment Alignment (FedCMA), that aligns class-conditional distributions of each client in the feature space. We prove convergence and show empirically that FedCMA outperforms other baselines on CIFAR-10, MNIST, EMNIST, and FEMNIST in the considered setting.
Jayanth Reddy Regatti · Songtao Lu · Abhishek Gupta · Ness Shroff

Fri 7:30 a.m. - 7:37 a.m.
Mechanisms that Incentivize Data Sharing in Federated Learning (Oral)
Federated learning is typically considered a beneficial technology which allows multiple agents to collaborate with each other, improve the accuracy of their models, and solve problems which are otherwise too data-intensive or expensive to be solved individually.
However, under the expectation that other agents will share their data, rational agents may be tempted to engage in detrimental behavior such as free-riding, where they contribute no data but still enjoy an improved model. In this work, we propose a framework to analyze the behavior of such rational data generators. We first show how a naive scheme leads to catastrophic levels of free-riding where the benefits of data sharing are completely eroded. Then, using ideas from contract theory, we introduce accuracy-shaping-based mechanisms to maximize the amount of data generated by each agent. These provably prevent free-riding without needing any payment mechanism.
Sai Praneeth Karimireddy · Wenshuo Guo · Michael Jordan

Fri 7:40 a.m. - 7:47 a.m.
Federated Learning with Online Adaptive Heterogeneous Local Models (Oral)
In Federated Learning, one of the biggest challenges is that client devices often have drastically different computation and communication resources for local updates. To this end, recent research efforts have focused on training heterogeneous local models that are obtained by adaptively pruning a shared global model. Despite the empirical success, theoretical analysis of the convergence of these heterogeneous FL algorithms remains an open question. In this paper, we establish sufficient conditions for any FL algorithm with heterogeneous local models to converge to a neighborhood of a stationary point of standard FL at a rate of $O(\frac{1}{\sqrt{Q}})$. For general smooth cost functions and under standard assumptions, our analysis illuminates two key factors impacting the optimality gap between heterogeneous and standard FL: pruning-induced noise and minimum coverage index, advocating a joint design strategy of local models' pruning masks in heterogeneous FL algorithms. The results are numerically validated on MNIST and CIFAR-10 datasets.
Hanhan Zhou · Tian Lan · Guru Prasadh Venkataramani · Wenbo Ding

Fri 7:50 a.m. - 7:57 a.m.
LightVeriFL: Lightweight and Verifiable Secure Federated Learning (Oral)
Secure aggregation protocols are implemented in federated learning to protect the local models of the participating users so that the server does not obtain any information beyond the aggregate model at each iteration. However, existing secure aggregation schemes fail to protect the integrity, i.e., correctness, of the aggregate model in the possible presence of a malicious server forging the aggregation result, which motivates the need for verifiable aggregation in federated learning. Existing verifiable aggregation schemes either have a complexity that grows linearly with the model size or require time-consuming reconstruction at the server that is quadratic in the number of users in case of likely user dropouts. To overcome these limitations, we propose LightVeriFL, a lightweight and communication-efficient secure verifiable aggregation protocol that provides the same guarantees for verifiability against a malicious server, data privacy, and dropout-resilience as the state-of-the-art protocols without incurring substantial communication and computation overheads. The proposed LightVeriFL protocol utilizes homomorphic hash and commitment functions of constant length, independent of the model size, to enable verification at the users. In case of dropouts, LightVeriFL uses a one-shot aggregate hash recovery of the dropped users, instead of a one-by-one recovery based on secret sharing, making the verification process significantly faster than the existing approaches. We evaluate LightVeriFL through experiments and show that it significantly lowers the total verification time in practical settings.
Baturalp Buyukates · Jinhyun So · Hessam Mahdavifar · Salman Avestimehr

Fri 8:00 a.m. - 8:30 a.m.
Break

Fri 8:30 a.m. - 8:37 a.m.
Efficient Federated Random Subnetwork Training (Oral)
One main challenge in federated learning is the large communication cost of exchanging weight updates from clients to the server at each round. While prior work has made great progress in compressing the weight updates through gradient compression methods, we propose a radically different approach that does not update the weights. Instead, our method freezes the weights at their initial random values and learns how to sparsify the random network for the best performance. To this end, the clients collaborate in training a stochastic binary mask to find the optimal random sparse network within the original one. At the end of the training, the final model is a randomly weighted sparse network -- or a subnetwork inside the random dense network. We show improvements in accuracy, communication bitrate (less than 1 bit per parameter (bpp)), convergence speed, and final model size (less than 1 bpp) over relevant baselines on MNIST, EMNIST, CIFAR-10, and CIFAR-100 datasets, in the low bitrate regime under various system configurations.
Francesco Pase · Berivan Isik · Deniz Gunduz · Tsachy Weissman · Michele Zorzi

Fri 8:40 a.m. - 8:47 a.m.
Group privacy for personalized federated learning (Oral)
Federated learning exposes the participating clients to issues of leakage of private information from the client-server communication and the lack of personalization of the global model. To address both problems, we investigate the use of metric-based local privacy mechanisms and model personalization. These are based on operations performed directly in the parameter space, i.e., sanitization of the model parameters by the clients and clustering of model parameters by the server.
Filippo Galli · Sayan Biswas · Gangsoo Zeong · Tommaso Cucinotta · Catuscia Palamidessi

Fri 8:50 a.m. - 8:57 a.m.
To Federate or Not To Federate: Incentivizing Client Participation in Federated Learning (Oral)
Federated learning (FL) facilitates collaboration between a group of clients who seek to train a common machine learning model without directly sharing their local data. Although there is an abundance of research on improving the speed, efficiency, and accuracy of federated training, most works implicitly assume that all clients are willing to participate in the FL framework. Due to data heterogeneity, however, the global model may not work well for some clients, and they may instead choose to use their own local model. Such disincentivization of clients can be problematic from the server's perspective because having more participating clients yields a better global model, and offers better privacy guarantees to the participating clients. In this paper, we propose an algorithm called IncFL that explicitly maximizes the fraction of clients who are incentivized to use the global model by dynamically adjusting the aggregation weights assigned to their updates. Our experiments show that IncFL increases the number of incentivized clients by 30-55% compared to standard federated training algorithms, and can also improve the generalization performance of the global model on unseen clients.
Yae Jee Cho · Divyansh Jhunjhunwala · Tian Li · Virginia Smith · Gauri Joshi

Fri 9:00 a.m. - 9:07 a.m.
SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication (Oral)
The decentralized Federated Learning (FL) setting avoids the role of a potentially unreliable or untrustworthy central host by utilizing groups of clients to collaboratively train a model via localized training and model/gradient sharing. Most existing decentralized FL algorithms require synchronization of client models where the speed of synchronization depends upon the slowest client.
In this work, we propose SWIFT: a wait-free decentralized FL algorithm that allows clients to conduct training at their own speed. Theoretically, we prove that SWIFT matches the gold-standard convergence rate $\mathcal{O}(1/\sqrt{T})$ of parallel stochastic gradient descent for convex and non-convex smooth optimization (total iterations $T$). Furthermore, this is done in the IID and non-IID settings without any bounded-delay assumption for slow clients, which is required by other asynchronous decentralized FL algorithms. Although SWIFT achieves the same convergence rate with respect to $T$ as other state-of-the-art (SOTA) parallel stochastic algorithms, it converges faster with respect to time due to its wait-free structure. Our experimental results demonstrate that communication costs between clients in SWIFT fall by an order of magnitude compared to synchronous counterparts. Furthermore, SWIFT produces loss levels for image classification, over IID and non-IID data settings, upwards of 50% faster than existing SOTA algorithms.
Marco Bornstein · Tahseen Rabbani · Evan Wang · Amrit Bedi · Furong Huang

Fri 9:10 a.m. - 9:15 a.m.
Best Paper Announcement (Other)

Fri 9:15 a.m. - 10:00 a.m.
Poster Session 1 (Poster Session)

Fri 10:00 a.m. - 11:30 a.m.
Lunch Break (Break)

Fri 11:30 a.m. - 11:37 a.m.
FL Games: A Federated Learning Framework for Distribution Shifts (Oral)
Federated learning aims to train predictive models for data that is distributed across clients, under the orchestration of a server. However, participating clients typically each hold data from a different distribution, which can lead to catastrophic generalization on data from a different client, which represents a new domain. In this work, we argue that in order to generalize better across non-i.i.d. clients, it is imperative to learn only correlations that are stable and invariant across domains.
We propose FL Games, a game-theoretic framework for federated learning that learns causal features that are invariant across clients. While training to achieve the Nash equilibrium, the traditional best response strategy suffers from high-frequency oscillations. We demonstrate that FL Games effectively resolves this challenge and exhibits smooth performance curves. Further, FL Games scales well in the number of clients, requires significantly fewer communication rounds, and is agnostic to device heterogeneity. Through empirical evaluation, we demonstrate that FL Games achieves high out-of-distribution performance on various benchmarks.
Sharut Gupta · Kartik Ahuja · Mohammad Havaei · Niladri Chatterjee · Yoshua Bengio

Fri 11:40 a.m. - 11:47 a.m.
Verifiable Federated Machine Learning (Oral)
In Federated Learning (FL), a significant body of research has focused on defending against malicious clients. However, clients are not the only party that can behave maliciously. The aggregator itself may tamper with the model to bias it towards certain outputs, or adapt the weights to aid in reconstructing a client's private data. In this work, we tackle the open problem of efficient verification of the computations performed by the aggregator in FL. We develop a novel protocol which, through the use of binding commitments, prevents an aggregator from modifying the resulting model, and only permits the aggregator to sum the supplied weights. We provide a proof of correctness for our protocol, demonstrating that any tampering by an aggregator will be detected. Additionally, we evaluate our protocol's overheads on three datasets, and show that even for large neural networks with millions of parameters the commitments can be computed in under 20 seconds.
Simone Bottoni · Giulio Zizzo · Stefano Braghin · Alberto Trombetta

Fri 11:50 a.m. - 11:57 a.m.
Accelerated Federated Optimization with Quantization (Oral)
Federated optimization is a new form of distributed training on very large datasets that leverages many devices each containing local data. While decentralized computation can lead to significant speed-ups due to parallelization, some centralization is still required: devices must aggregate their parameter updates through synchronization across the network. The potential for communication bottleneck is significant. The two main methods to tackle this issue are (a) smarter optimization that decreases the frequency of communication rounds and (b) using compression techniques such as quantization and sparsification to reduce the number of bits machines need to transmit. In this paper, we provide a novel algorithm, Federated optimization algorithm with Acceleration and Quantization (FedAQ), with improved theoretical guarantees by combining an accelerated method of federated averaging, reducing the number of training and synchronization steps, with an efficient quantization scheme that significantly reduces communication complexity. We show that in a homogeneous strongly convex setting, FedAQ achieves a linear speedup in the number of workers $M$ with only $\tilde{\mathcal{O}}(M^{\frac{1}{3}})$ communication rounds, significantly smaller than what is required by other quantization-based federated optimization algorithms. Moreover, we empirically verify that our algorithm performs better than current methods.
Yeojoon Youn · Bhuvesh Kumar · Jacob Abernethy

Fri 12:00 p.m. - 12:07 p.m.
Tackling Personalized Federated Learning with Label Concept Drift via Hierarchical Bayesian Modeling (Oral)
Federated Learning (FL) is a distributed learning scheme to train a shared model across clients. One fundamental challenge in FL is that the sets of data across clients could be non-identically distributed, which is common in practice.
Personalized Federated Learning (PFL) attempts to solve this challenge. Most methods in the PFL literature focus on data heterogeneity in which clients differ in their label distributions. In this work, we focus on label concept drift, which is a large but unexplored area. Firstly, we present a general framework for PFL based on hierarchical Bayesian inference. A global variable is introduced to capture the common trends of different clients and is used to augment the joint distribution of clients' parameters. Then we describe two concrete inference algorithms based on this framework. The first one finds a maximum a posteriori (MAP) solution for this augmented posterior distribution and adds little overhead compared with existing approaches. The second one further considers uncertainties of clients' parameters and different drift patterns across clients. We demonstrate our methods through extensive empirical studies on CIFAR100 and SUN397. Experimental results show our approach significantly outperforms state-of-the-art PFL when tackling the label concept drift across clients.
Xingchen Ma · Junyi Zhu · Matthew Blaschko

Fri 12:10 p.m. - 1:00 p.m.
Panel

Fri 1:00 p.m. - 1:30 p.m.
Break

Fri 1:30 p.m. - 1:48 p.m.
On the Unreasonable Effectiveness of Federated Averaging with Heterogenous Data (Invited Talk)
Existing theory predicts that data heterogeneity will degrade the performance of the Federated Averaging (FedAvg) algorithm. However, in practice, the simple FedAvg algorithm converges very well. In this talk, we explain the seemingly unreasonable effectiveness of FedAvg that contradicts the previous theoretical predictions. We find that the key assumption of bounded gradient dissimilarity in previous theoretical analyses is too pessimistic to characterize data heterogeneity in practical applications.
For a simple quadratic problem, we demonstrate there exist regimes where large gradient dissimilarity does not have any negative impact on the convergence of FedAvg. Motivated by this observation, we propose a new quantity, average drift at optimum, to measure the effects of data heterogeneity and explicitly use it to present a new theoretical analysis of FedAvg. We show that the average drift at optimum is nearly zero across many real-world federated training tasks, whereas the gradient dissimilarity can be large. Our new analysis suggests FedAvg can have identical convergence rates in homogeneous and heterogeneous data settings, and hence leads to a better understanding of its empirical success.
Jianyu Wang

Fri 1:48 p.m. - 1:52 p.m.
On the Unreasonable Effectiveness of Federated Averaging with Heterogenous Data - Q&A (Q&A)

Fri 1:52 p.m. - 2:10 p.m.
Scalable and Communication-Efficient Vertical Federated Learning (Invited Talk)
Vertical Federated Learning (VFL) algorithms are an important class of federated learning algorithms in which parties' local datasets share a common sample ID space but have different feature sets. This is in contrast to Horizontal Federated Learning (HFL), where parties share the same feature sets but for different sample IDs. While much work has been done to advance the efficiency and flexibility of HFL, these techniques do not directly extend to VFL due to differences in the model architecture and training paradigm. In this talk, I will present two methods for efficient and robust VFL. The first, Compressed VFL, reduces communication cost through message compression while achieving the same asymptotic convergence rate as standard VFL with no compression. The second, Flex-VFL, extends VFL to support heterogeneous parties that may use different local optimizers and may operate at different rates.
I will highlight some interesting theoretical and experimental results for each method, and finally, I will present some directions and open questions for future work in VFL.
Stacy Patterson

Fri 2:10 p.m. - 2:14 p.m.
Scalable and Communication-Efficient Vertical Federated Learning - Q&A (Q&A)

Fri 2:15 p.m. - 3:00 p.m.
Poster Session 2 (Poster Session)

- FedTH: Tree-based Hierarchical Image Classification in Federated Learning (Poster)
In recent years, privacy threats have been rising in a flood of data. Federated learning was introduced to protect the privacy of data in machine learning. However, Internet of Things (IoT) devices, which account for a large portion of data collection, still have weak computational and communication power. Moreover, cutting-edge image classification architectures have more extensive and complex models to reach high performance. In this paper, we introduce FedTH, a tree-based hierarchical image classification architecture in federated learning, to handle these problems. The FedTH architecture is built on a tree structure to help decrease computational and communication costs, to have a flexible prediction procedure, and to be robust in heterogeneous environments.
Jaeheon Kim · Bong Jun Choi

- Unbounded Gradients in Federated Learning with Buffered Asynchronous Aggregation (Poster)
Synchronous updates may compromise the efficiency of cross-silo federated learning once the number of active clients increases. The FedBuff algorithm (Nguyen et al.) alleviates this problem by allowing asynchronous updates (staleness), which enhances the scalability of training while preserving privacy via secure aggregation. We revisit the FedBuff algorithm for asynchronous federated learning and extend the existing analysis by removing the boundedness assumptions from the gradient norm.
This paper presents a theoretical analysis of the convergence rate of this algorithm when heterogeneity in data, batch size, and delay are considered.
M. Taha Toghani · Cesar Uribe

- Early Detection of Sexual Predators with Federated Learning (Poster)
The rise in screen time and the isolation brought by the different containment measures implemented during the COVID-19 pandemic have led to an alarming increase in cases of online grooming. Online grooming is defined as all the strategies used by predators to lure children into sexual exploitation. Previous attempts made in industry and academia on the detection of grooming rely on accessing and monitoring users' private conversations through the training of a model centrally or by sending personal conversations to a global server. We introduce a first, privacy-preserving, cross-device, federated learning framework for the early detection of sexual predators, which aims to ensure a safe online environment for children while respecting their privacy.
Khaoula Chehbouni · Gilles Caporossi · Reihaneh Rabbany · Martine De Cock · Golnoosh Farnadi

- Self-Supervised Vertical Federated Learning (Poster)
We consider a system where parties store vertically-partitioned data with a partially overlapping sample space, and a server stores labels on a subset of data samples. Supervised Vertical Federated Learning (VFL) algorithms are limited to training models using only overlapping labeled data, which can lead to poor model performance or bias. Self-supervised learning has been shown to be effective for training on unlabeled data, but the current methods do not generalize to the vertically-partitioned setting. We propose a novel extension of self-supervised learning to VFL (SS-VFL), where unlabeled data is used to train representation networks and labeled data is used to train a downstream prediction network.
We present two SS-VFL algorithms: SS-VFL-I is a two-phase algorithm which requires only one round of communication, while SS-VFL-C adds communication rounds to improve model generalization. We show that both SS-VFL algorithms can achieve up to $2\times$ higher accuracy than supervised VFL when labeled data is scarce, at a significantly reduced communication cost.
Timothy Castiglia · Shiqiang Wang · Stacy Patterson

- On the Vulnerability of Backdoor Defenses for Federated Learning (Poster)
Federated learning (FL) is a popular distributed machine learning paradigm which enables jointly training a global model without sharing clients' data. However, its repetitive server-client communication gives room for possible backdoor attacks, which aim to mislead the global model into a targeted misprediction when a specific trigger pattern is presented. In response to such backdoor threats on federated learning, various defense measures have been proposed. In this paper, we study whether the current defense mechanisms truly neutralize the backdoor threats from federated learning in a practical setting by proposing a new federated backdoor attack framework for possible countermeasures. Different from traditional training (on triggered data) and rescaling (the malicious client model) based backdoor injection, the proposed backdoor attack framework (1) directly modifies (a small proportion of) local model weights to inject the backdoor trigger via sign flips; (2) jointly optimizes the trigger pattern with the client model, and is thus more persistent and stealthy for circumventing existing defenses. In a case study, we examine the strengths and weaknesses of several recent federated backdoor defenses from three major categories and provide suggestions to practitioners when training federated models in practice.
Pei Fang · Jinghui Chen

- Towards Provably Personalized Federated Learning via Threshold-Clustering of Similar Clients (Poster)
Clustering clients with similar objectives together and learning a model per cluster is an intuitive and interpretable approach to personalization in federated learning (PFL). However, doing so with provable and optimal guarantees has remained an open challenge. In this work, we formalize personalized federated learning as a stochastic optimization problem where the stochastic gradients on a client may correspond to one of $K$ distributions. In such a setting, we show that using i) a simple thresholding-based clustering algorithm, and ii) local client momentum obtains optimal convergence guarantees. In fact, our rates asymptotically match those obtained if we knew the true underlying clustering of the clients. Further, we extend our algorithm to the decentralized setting where each node performs clustering using itself as the center.
Mariel A Werner · Lie He · Sai Praneeth Karimireddy · Michael Jordan · Martin Jaggi

- Building Large Machine Learning Models from Small Distributed Models: A Layer Matching Approach (Poster)
Cross-device federated learning (FL) enables a massive number of clients to collaborate to train a machine learning model with local data. However, the computational resources of the client devices restrict FL from utilizing large modern machine learning models that require sufficient computation. In this paper, we propose a federated layer matching algorithm that enables the server to build a deep server machine learning model from relatively shallow client models. The federated layer matching (FLM) algorithm dynamically averages similar layers in the client models to the server model, and inserts dissimilar layers as new layers to the server model.
With the proposed algorithm, the clients are able to train small models based on device capacity, while the server can still obtain a larger and more powerful server model from the clients with decentralized data. Our numerical experiments show that the proposed FLM algorithm is able to build a server model 40% larger than the client models, and such a model performs much better than the model obtained by the classical FedAvg when using the same amount of communication resources.
xinwei zhang · Bingqing Song · Mehrdad Honarkhah · Jie Ding · Mingyi Hong

- Voting-Based Approaches for Differentially Private Federated Learning (Poster)
Differentially Private Federated Learning (DPFL) is an emerging field with many applications. Gradient-averaging-based DPFL methods require costly communication rounds and hardly work with large-capacity models, due to the explicit dimension dependence in their added noise. In this paper, inspired by the non-federated knowledge transfer privacy learning methods, we design two DPFL algorithms (AE-DPFL and kNN-DPFL) that provide provable DP guarantees for both instance-level and agent-level privacy regimes. By voting among the data labels returned from each local model, instead of averaging the gradients, our algorithms avoid the dimension dependence and significantly reduce the communication cost. Theoretically, by applying secure multi-party computation, we could exponentially amplify the (data-dependent) privacy guarantees when the margin of the voting scores is distinctive. Empirical evaluation on both instance-level and agent-level DP is conducted across five datasets, showing 2% to 12% higher accuracy when privacy cost is the same compared to DP-FedAvg, or less than 65% privacy cost when accuracy is the same.
Link » Yuqing Zhu · Xiang Yu · Yi-Hsuan Tsai · Francesco Pittaluga · Masoud Faraki · Manmohan Chandraker · Yu-Xiang Wang 🔗 - DASH: Decentralized CASH for Federated Learning (Poster)  link » We present DASH, a decentralized framework that addresses for the first time the Combined Algorithm Selection and HyperParameter Optimization (CASH) problem in Federated Learning (FL) settings. DASH generates a set of algorithm-hyper-parameter (Alg-HP) pairs using existing centralized HPO algorithms which are then evaluated by clients individually on their local datasets. The clients transmit to the server the loss functions and the server aggregates them in order to generate a loss signal that will aid the next Alg-HP pair selection. This approach avoids the communication complexity of performing client evaluations using communication-intensive FL training. FL training is only performed when the final Alg-HP pair is selected. Thus, DASH allows the use of sophisticated HPO algorithms at the FL server, while requiring clients to perform simpler model training and evaluation on their individual datasets than communication-intensive FL training. We provide a theoretical analysis of the loss rate attained by DASH as compared to a fully centralized solution (with access to all client datasets), and show that regret depends on the dissimilarity between the datasets of the clients, resulting from the FL restriction that client datasets remain private. Experimental studies on several datasets show that DASH performs favorably against several baselines and closely approximates centralized CASH performance. 
Link » Md Ibrahim Ibne Alam · Koushik Kar · Theodoros Salonidis · Horst Samulowitz 🔗 - Accelerating Adaptive Federated Optimization with Local Gossip Communications (Poster)  link » Recently, adaptive federated optimization methods, such as FedAdam and FedAMSGrad, have gained increasing attention for their fast convergence and stable performance, especially in training models with heavy-tail stochastic gradient distributions. However, the implementation of such methods still faces several bottlenecks, such as the large client-to-server communication overhead and the intense sensitivity to heterogeneous data. More importantly, the two objectives may conflict with each other, i.e., the convergence rate gets worse as the number of local steps increases in the partial participation setting, making it challenging to further improve the efficiency of adaptive federated optimization. We refer to this problem as the \textit{dilemma of local steps}. In this paper, we propose a novel hybrid adaptive federated optimization method (HA-Fed) where the clients are partitioned into disjoint clusters inside which they can communicate by fast client-to-client links. We show that HA-Fed resolves the \textit{dilemma of local steps} in prior adaptive federated optimization methods, i.e., achieves a faster convergence rate as the number of local steps increases, while reducing the client-to-server communication overhead under non-i.i.d. settings. Specifically, HA-Fed improves the convergence rate from $\mathcal{O}(\sqrt{\tau}/\sqrt{TM})$ in FedAMSGrad to $\mathcal{O}(1/\sqrt{T\tau M})$ in partial participation scenarios in the nonconvex stochastic setting. Extensive experiments and ablation studies demonstrate the effectiveness and broad applicability of our proposed method. 
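As a quick reading aid for the two rates quoted in the abstract above, their ratio can be worked out directly. This ignores the constants hidden in the $\mathcal{O}(\cdot)$ notation and uses only the symbols $\tau$, $T$, and $M$ as they appear in the abstract:

$$\frac{\sqrt{\tau}/\sqrt{TM}}{1/\sqrt{T\tau M}} \;=\; \sqrt{\tau}\cdot\frac{\sqrt{T\tau M}}{\sqrt{TM}} \;=\; \sqrt{\tau}\cdot\sqrt{\tau} \;=\; \tau .$$

So, up to constants, the stated HA-Fed bound is smaller than the stated FedAMSGrad bound by a factor of $\tau$: the first bound grows like $\sqrt{\tau}$ as $\tau$ increases, while the second shrinks like $1/\sqrt{\tau}$.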
Link » Yujia Wang · Pei Fang · Jinghui Chen 🔗 - Federated Progressive Sparsification (Purge-Merge-Tune)+ (Poster)  link » We present FedSparsify, a sparsification strategy for federated training based on progressive weight magnitude pruning, which provides several benefits. First, since the size of the network becomes increasingly smaller, computation and communication costs during training are reduced. Second, the models are incrementally constrained to a smaller set of parameters, which facilitates alignment/merging of the local models, and results in improved learning performance at high sparsity. Third, the final sparsified model is significantly smaller, which improves inference efficiency. We analyze FedSparsify's convergence and empirically demonstrate that FedSparsify can learn a subnetwork smaller than a tenth of the size of the original model with the same or better accuracy compared to existing pruning and no-pruning baselines across several challenging federated learning environments. Our approach leads to an average 4-fold inference efficiency speedup and a 15-fold model size reduction over different domains and neural network architectures. Link » Dimitris Stripelis · Umang Gupta · Greg Ver Steeg · Jose-Luis Ambite 🔗 - A Multi-Token Coordinate Descent Method for Vertical Federated Learning (Poster)  link » Communication efficiency is a major challenge in federated learning. In client-server schemes, the server constitutes a bottleneck, and while decentralized setups spread communications, they do not reduce them. We propose a communication efficient semi-decentralized federated learning algorithm for feature-distributed data. Our multi-token method can be seen as a parallel Markov chain (block) coordinate descent algorithm. In this work, we formalize the multi-token semi-decentralized scheme, which subsumes the client-server and decentralized setups, and design a feature-distributed learning algorithm for this setup. 
Numerical results show the improved communication efficiency of our algorithm. Link » Pedro Valdeira · Yuejie Chi · Claudia Soares · Joao Xavier 🔗 - ColRel: Collaborative Relaying for Federated Learning over Intermittently Connected Networks (Poster)  link » Intermittent connectivity of clients to the parameter server (PS) is a major bottleneck in federated edge learning. It induces a large generalization gap, especially when the local data distribution amongst clients exhibits heterogeneity. To overcome communication blockages between clients and the central PS, we introduce the concept of collaborative relaying (ColRel) wherein the participating clients relay their neighbors' local updates to the PS in order to boost the participation of clients with poor connectivity to the PS. For every communication round, each client initially computes a local consensus of a subset of its neighboring clients' updates and subsequently transmits to the PS a weighted average of its own update and those of its neighbors. We optimize these weights to ensure that the global update at the PS is unbiased with minimal variance -- consequently improving the convergence rate. Numerical evaluations on the CIFAR-10 dataset demonstrate that our ColRel-based approach achieves a higher test accuracy than Federated Averaging based benchmarks for learning over intermittently-connected networks. Link » Rajarshi Saha · Michal Yemini · Emre Ozfatura · Deniz Gunduz · Andrea Goldsmith 🔗 - Understanding Federated Learning through Loss Landscape Visualizations: A Pilot Study (Poster)  link » Federated learning aims to train a machine learning model (e.g., a neural network) in a data-decentralized fashion. The key challenge is the potential data heterogeneity among clients. When clients' data are non-IID, federatedly learned models could hardly achieve the same performance as centralizedly learned models. 
In this paper, we conduct the very first pilot study to understand the challenge of federated learning through the lens of loss landscapes. We extend the visualization methods developed to uncover the training trajectory of centralized learning to federated learning, and explore the effect of data heterogeneity on model training. Through our approach, we can clearly visualize the phenomenon of model drifting: the greater the data heterogeneity, the larger the model drift. We further explore how model initialization affects the loss landscape, and how clients' participation affects the model training trajectory. We expect our approach to serve as a new, qualitative way to analyze federated learning. Link » Ziwei Li · Hong-You Chen · Han Wei Shen · Wei-Lun Chao 🔗 - Differentially Private Federated Quantiles with the Distributed Discrete Gaussian Mechanism (Poster)  link » The computation of analytics in a federated environment plays an increasingly important role in data science and machine learning. We consider the differentially private computation of the quantiles of a distribution of values stored on a population of clients. We present two quantile estimation algorithms based on the distributed discrete Gaussian mechanism compatible with secure aggregation. Based on a privacy-utility analysis and numerical experiments, we delineate the regime under which each one is superior. We find that the algorithm with suboptimal asymptotic performance works the best on moderate problem sizes typical in federated learning with client sampling. We apply these algorithms to augment distributionally robust federated learning with differential privacy. Link » Krishna Pillutla · Yassine Laguel · Jérôme Malick · Zaid Harchaoui 🔗 - Efficient and Light-Weight Federated Learning via Asynchronous Distributed Dropout (Poster)  link » We focus on dropout techniques for asynchronous distributed computations in federated learning (FL) scenarios. 
We propose \texttt{AsyncDrop}, a novel asynchronous FL framework with smart (i.e., informed/structured) dropout that achieves better performance compared to state-of-the-art asynchronous methodologies, while resulting in lower communication and training time costs. The key idea revolves around creating sub-models out of the global model that take into account the device heterogeneity. We conjecture that such an approach can be theoretically justified. We implement our approach and compare it against other asynchronous baseline methods, by adapting current synchronous FL algorithms to asynchronous scenarios. Empirically, \texttt{AsyncDrop} significantly reduces the communication cost and training time, while improving the final test accuracy in non-i.i.d. scenarios. Link » Chen Dun · Mirian Hipolito Garcia · Dimitrios Dimitriadis · Christopher Jermaine · Anastasios Kyrillidis 🔗 - FedGRec: Federated Graph Recommender System with Lazy Update of Latent Embeddings (Poster)  link » Recommender systems are widely used in industry to improve user experience. Despite great success, they have recently been criticized for collecting private user data. Federated Learning (FL) is a new paradigm for learning on distributed data without direct data sharing. Therefore, Federated Recommender (FedRec) systems are proposed to mitigate the privacy concerns of non-distributed recommender systems. However, FedRec systems have a performance gap relative to their non-distributed counterparts. The main reason is that local clients have an incomplete user-item interaction graph, thus FedRec systems cannot utilize indirect user-item interactions well. In this paper, we propose the Federated Graph Recommender System (FedGRec) to mitigate this gap. Our FedGRec system can effectively exploit the indirect user-item interactions. 
More precisely, in our system, users and the server explicitly store latent embeddings for users and items, where the latent embeddings summarize different orders of indirect user-item interactions and are used as a proxy for the missing interaction graph during local training. We perform extensive empirical evaluations to verify the efficacy of using latent embeddings as a proxy for the missing interaction graph; the experimental results show superior performance of our system compared to various baselines. Link » Junyi Li · Heng Huang 🔗 - Privacy-Preserving Data Filtering in Federated Learning Using Influence Approximation (Poster)  link » Federated Learning by nature is susceptible to low-quality, corrupted, or even malicious data that can severely degrade the quality of the learned model. Traditional techniques for data valuation cannot be applied as the data is never revealed. We present a novel technique for filtering and scoring data based on a practical influence approximation ("lazy" influence) that can be implemented in a privacy-preserving manner. Each agent uses its own data to evaluate the influence of another agent's batch, and reports to the center an obfuscated score using differential privacy. Our technique allows for highly effective filtering of corrupted data in a variety of applications. Importantly, the accuracy does not degrade significantly, even under very strong privacy guarantees ($\varepsilon \leq 1$), especially under realistic percentages of mislabeled data. Link » Ljubomir Rokvic · Panayiotis Danassis · Boi Faltings 🔗 - Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning (Poster)  link » An oft-cited challenge of federated learning is the presence of heterogeneity. \emph{Data heterogeneity} refers to the fact that data from different clients may follow very different distributions. \emph{System heterogeneity} refers to the fact that client devices have different system capabilities. 
A considerable number of federated optimization methods address this challenge. In the literature, empirical evaluations usually start federated training from random initialization. However, in many practical applications of federated learning, the server has access to proxy data for the training task that can be used to pre-train a model before starting federated training. We empirically study the impact of starting from a pre-trained model in federated learning using four standard federated learning benchmark datasets. Unsurprisingly, starting from a pre-trained model reduces the training time required to reach a target error rate and enables the training of more accurate models (up to 40\%) than is possible when starting from random initialization. Surprisingly, we also find that starting federated learning from a pre-trained initialization reduces the effect of both data and system heterogeneity. We recommend that future work proposing and evaluating federated optimization methods evaluate the performance when starting from random and pre-trained initializations. We also believe this study raises several questions for further work on understanding the role of heterogeneity in federated optimization. Link » John Nguyen · Jianyu Wang · Kshitiz Malik · Maziar Sanjabi · Mike Rabbat 🔗 - Trusted Aggregation (TAG): Model Filtering Backdoor Defense In Federated Learning (Poster)  link » Federated Learning is a framework for training machine learning models from multiple local data sets without access to the data. A shared model is jointly learned through an interactive process between server and clients that combines locally learned model gradients or weights. However, the lack of data transparency naturally raises concerns about model security. Recently, several state-of-the-art backdoor attacks have been proposed, which achieve high attack success rates while simultaneously being difficult to detect, leading to compromised federated learning models. 
In this paper, motivated by differences in the output layer distribution between models trained with and without the presence of backdoor attacks, we propose a defense method that can prevent backdoor attacks from influencing the model while maintaining the accuracy of the original classification task. Link » Joseph Lavond · Minhao Cheng · Yao Li 🔗 - Cross-device Federated Architecture Search (Poster)  link » Federated learning (FL) has recently gained considerable attention due to its ability to learn on decentralised data while preserving client privacy. However, it also poses additional challenges related to the heterogeneity of the participating devices, both in terms of their computational capabilities and contributed data. Meanwhile, Neural Architecture Search (NAS) has been successfully used with centralised datasets, producing state-of-the-art results in constrained or unconstrained settings. However, such centralised datasets may not always be available. Most recent work at the intersection of NAS and FL attempts to alleviate this issue in a cross-silo federated setting, which assumes a homogeneous compute environment with datacenter-grade hardware. In this paper we explore the question of whether we can design architectures of different footprints in a cross-device federated setting, where the device landscape, availability and scale are very different. To this end, we design our system, FedorAS, to discover and train promising architectures in a resource-aware manner when dealing with devices of varying capabilities holding non-IID distributed data. We present empirical evidence of its effectiveness across different settings, spanning three different modalities (vision, speech, text), and showcase its better performance compared to state-of-the-art federated solutions, while maintaining resource efficiency. 
Link » Stefanos Laskaridis · Javier Fernandez-Marques · Łukasz Dudziak 🔗 - Client-Private Secure Aggregation for Privacy-Preserving Federated Learning (Poster)  link » Privacy-preserving federated learning (PPFL) is a paradigm of distributed privacy-preserving machine learning training in which a set of clients jointly compute a shared global model under the orchestration of an aggregation server. The system has the property that no party learns any information about any client's training data, besides what could be inferred from the global model. The core cryptographic component of a PPFL scheme is the secure aggregation protocol, a secure multi-party computation protocol in which the server securely aggregates the clients' locally trained models, and sends the aggregated model to the clients. However, in many applications the global model represents a trade secret of the consortium of clients, which they may not wish to reveal in the clear to the server. In this work, we propose a novel model of secure aggregation, called client-private secure aggregation, in which the server computes an encrypted global model that only the clients can decrypt. We provide an explicit construction of a client-private secure aggregation protocol, as well as a theoretical and empirical evaluation of our construction to demonstrate its practicality. Our experiments demonstrate that the client and server running time of our protocol are less than 19 s and 2 s, respectively, when scaled to support 250 clients. Link » Parker Newton · Olivia Choudhury · Bill Horne · Vidya Ravipati · Divya Bhargavi · Ujjwal Ratan 🔗 - Motley: Benchmarking Heterogeneity and Personalization in Federated Learning (Poster)  link » Personalized federated learning considers learning models unique to each client in a heterogeneous network. The resulting client-specific models have been purported to improve metrics such as accuracy, fairness, and robustness in federated networks. 
However, despite a plethora of work in this area, it remains unclear: (1) which personalization techniques are most effective in various settings, and (2) how important personalization truly is for realistic federated applications. To better answer these questions, we propose Motley, a benchmark for personalized federated learning. Motley consists of a suite of cross-device and cross-silo federated datasets from varied problem domains, as well as thorough evaluation metrics for better understanding the possible impacts of personalization. We establish baselines on the benchmark by comparing a number of representative personalized federated learning methods. These initial results highlight strengths and weaknesses of existing approaches, and raise several open questions for the community. Motley aims to provide a reproducible means with which to advance developments in personalized and heterogeneity-aware federated learning, as well as the related areas of transfer learning, meta-learning, and multi-task learning. Link » Shanshan Wu · Tian Li · Zachary Charles · Yu Xiao · Ken Liu · Zheng Xu · Virginia Smith 🔗 - Federated Learning for Predicting the Next Node in Action Flows (Poster)  link » Federated learning is a machine learning approach that allows different clients to collaboratively train a common model without sharing their data sets. Since clients have different data and classify data differently, there is a trade-off between the generality of the common model and the personalization of the classification results. Current approaches rely on using a combination of a global model, common to all clients, and multiple local models, that support personalization. In this paper, we report the results of a study, where we have applied some of these approaches to a concrete use case, namely the Anonymous platform from Anonymous Company, where Graph Neural Networks help programmers in the development of applications. 
Our results show that the amount of data points of each client affects the personalization strategy and that there is no optimal strategy that fits all clients. Link » Daniel Lopes · João Nadkarni · Filipe Assunção · Miguel Lopes · Luís Rodrigues 🔗 - The Interpolated MVU Mechanism For Communication-efficient Private Federated Learning (Poster)  link » We consider private federated learning (FL), where a server aggregates differentially private gradient updates from a large number of clients in order to train a machine learning model. The main challenge here is balancing privacy with both classification accuracy of the learned model as well as the amount of communication between the clients and server. In this work, we build on a recently proposed method for communication-efficient private FL---the MVU mechanism---by introducing a new interpolation mechanism that can accommodate a more efficient privacy analysis. The result is the new Interpolated MVU mechanism that provides SOTA results on communication-efficient private FL on a variety of datasets. Link » Chuan Guo · Kamalika Chaudhuri · Pierre STOCK · Mike Rabbat 🔗 - Find Your Friends: Personalized Federated Learning with the Right Collaborators (Poster)  link » In the traditional federated learning setting, a central server coordinates a network of clients to train one global model. However, the global model may serve many clients poorly due to data heterogeneity. Moreover, there may not exist a trusted central party that can coordinate the clients to ensure that each of them can benefit from others. To address these concerns, we present a novel decentralized framework, FedeRiCo, where each client can learn as much or as little from other clients as is optimal for its local data distribution. Based on expectation-maximization, FedeRiCo estimates the utilities of other participants’ models on each client’s data so that everyone can select the right collaborators for learning. 
As a result, our algorithm outperforms other federated, personalized, and/or decentralized approaches on several benchmark datasets, being the only approach that consistently performs better than training with local data only. Link » Yi Sui · Junfeng Wen · Yenson Lau · Brendan Ross · Jesse Cresswell 🔗 - Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks (Poster)  link » Though successful, federated learning presents new challenges for machine learning, especially when the issue of data heterogeneity, also known as Non-IID data, arises. To cope with the statistical heterogeneity, previous works incorporated a proximal term in local optimization, modified the model aggregation scheme at the server side, or advocated clustered federated learning approaches where the central server groups the agent population into clusters with jointly trainable data distributions to take advantage of a certain level of personalization. While effective, they lack a deep elaboration on what kind of data heterogeneity matters and how it impacts the accuracy of the participating clients. In contrast to many of the prior federated learning approaches, we demonstrate that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants. Our observations are intuitive: (1) Dissimilar labels of clients (label skew) are not necessarily considered data heterogeneity, and (2) the principal angle between the agents' data subspaces spanned by their corresponding principal vectors of data is a better estimate of the data heterogeneity. Link » Saeed Vahidian · Mahdi Morafah · Chen Chen · Mubarak Shah · Bill Lin 🔗 - Federated Fairness without Access to Demographics (Poster)  link » Existing federated learning approaches address demographic group fairness assuming that clients are aware of the sensitive groups. 
Such approaches are not applicable in settings where sensitive groups are unidentified or unavailable. In this paper, we address this limitation by focusing on federated learning settings of fairness without demographics. We present a novel objective that allows trade-offs between (worst-case) group fairness and average utility performance through a hyper-parameter and a group size constraint. We show that the proposed objective recovers existing approaches as special cases and then provide an algorithm to efficiently solve the proposed optimization problem. We experimentally showcase the different solutions that can be achieved by our proposed approach and compare it against baselines on various standard datasets. Link » Afroditi Papadaki · Natalia Martinez · Martin Bertran · Guillermo Sapiro · Miguel Rodrigues 🔗 - FLUTE: A Scalable, Extensible Framework for High-Performance Federated Learning Simulations (Poster)  link » In this paper we introduce "Federated Learning Utilities and Tools for Experimentation" (FLUTE), a high-performance open source platform for federated learning research and offline simulations. The goal of FLUTE is to enable rapid prototyping and simulation of new federated learning algorithms at scale, including novel optimization, privacy, and communications strategies. We describe the architecture of FLUTE, enabling arbitrary federated modeling schemes to be realized, we compare the platform with other state-of-the-art platforms, and we describe available features of FLUTE for experimentation in core areas of active research, such as optimization, privacy, and scalability. A comparison with other established platforms shows speed-ups of up to 42x and savings in memory footprint of 3x. A sample of the platform capabilities is presented in the Appendix for a range of tasks and other functionality such as scaling and a variety of federated optimizers. 
Link » Mirian Hipolito Garcia · Andre Manoel · Daniel Madrigal · Robert Sim · Dimitrios Dimitriadis 🔗 - Decentralized Learning with Random Walks and Communication-Efficient Adaptive Optimization (Poster)  link » We tackle the problem of federated learning (FL) in a peer-to-peer fashion without a central server. While prior work mainly considered gossip-style protocols for learning, our solution is based on random walks. This allows each client to communicate with only a single peer at a time, thereby reducing the total communication and enabling asynchronous execution. To improve convergence and reduce the need for extensive tuning, we consider an adaptive optimization method -- Adam. Two extensions reduce its communication costs: state compression and multiple local updates on each client. We theoretically analyse the convergence behaviour of the proposed algorithm and its modifications in the non-convex setting. We show that our method can achieve performance comparable to centralized FL without communication overhead. Empirical results are reported on a variety of tasks (vision, text), neural network architectures and large-scale federations (up to $\sim342$k clients). Link » Aleksei Triastcyn · Matthias Reisser · Christos Louizos 🔗 - Accelerating Federated Learning Through Attention on Local Model Updates (Poster)  link » Federated learning is used widely for privacy-preserving training. It performs well if the client datasets are both balanced and IID. However, in real-world settings, client datasets are non-IID and imbalanced. They may also experience significant distribution shifts. These non-idealities can hinder the performance of federated learning. To address this challenge, the paper devises an attention-based mechanism that learns to attend to different clients in the context of a reference dataset. The reference dataset is a test dataset in the central server which is used to monitor the performance metric of the model under training. 
The innovation is that the attention mechanism captures the similarities and patterns of a batch of clients' model drifts (received by the central server in each communication round) in a low dimensional latent space, similar to the way it captures the mutual relation of a batch of words (a sentence). To learn this attention layer, we devise an autoencoder whose inputs and outputs are the model drifts and whose bottleneck is the attention mechanism. The attention weights in the bottleneck are learned by utilizing the attention-based autoencoder as a network to reconstruct the model drift on the reference dataset, from the batch of received model drifts from clients in each communication round. The learned attention weights effectively capture clusters and similarities amongst the clients’ datasets. The empirical studies with MNIST, FashionMNIST, and CIFAR10 under a non-IID federated learning setup show that our attention-based autoencoder can identify the cluster of similar clients. Then the central server can use the clustering results to devise a better policy for choosing participating clients in each communication round, thereby reducing the communication rounds by up to 75% on MNIST and FashionMNIST, and 45% on CIFAR10 compared to FedAvg. Link » Parsa Assadi · Byung Hoon Ahn · Hadi Esmaeilzadeh 🔗 - Federated Frank-Wolfe Algorithm (Poster)  link » Federated learning (FL) has gained much attention in recent years for building privacy-preserving collaborative learning systems. However, FL algorithms for constrained machine learning problems are still very limited, particularly when the projection step is costly. To this end, we propose a Federated Frank-Wolfe Algorithm (FedFW). FedFW provably finds an $\varepsilon$-suboptimal solution of the constrained empirical risk-minimization problem after $\mathcal{O}(\varepsilon^{-2})$ iterations if the objective function is convex. The rate becomes $\mathcal{O}(\varepsilon^{-3})$ if the objective is non-convex. 
The method enjoys data privacy, low per-iteration cost and communication of sparse signals. We demonstrate empirical performance of the FedFW algorithm on several machine learning tasks. Link » Ali Dadras · Karthik Prakhya · Alp Yurtsever 🔗 - How to Combine Variational Bayesian Networks in Federated Learning (Poster)  link » Federated Learning enables multiple data centers to train a central model collaboratively without exposing any confidential data. Even though deterministic models are capable of achieving high prediction accuracy, their lack of calibration and capability to quantify uncertainty is problematic for safety-critical applications. Different from deterministic models, probabilistic models such as Bayesian neural networks are relatively well-calibrated and able to quantify uncertainty alongside their competitive prediction accuracy. Both approaches appear in the federated learning framework; however, the aggregation scheme of deterministic models cannot be directly applied to probabilistic models since weights correspond to distributions instead of point estimates. In this work, we study the effects of various aggregation schemes for variational Bayesian neural networks. With empirical results on three image classification datasets, we observe that the degree of spread for an aggregated distribution is a significant factor in the learning process. Hence, we present a \textit{survey} on the question of how to combine variational Bayesian networks in federated learning, while providing computer vision classification benchmarks for different aggregation settings. 
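To illustrate why the spread of an aggregated distribution depends on the aggregation scheme, the sketch below combines two one-dimensional Gaussian weight posteriors in two generic ways: averaging their parameters, and taking a precision-weighted product. These two rules and the helper names `average_parameters` and `product_of_gaussians` are assumptions picked for illustration; the abstract above does not state which schemes the authors compare.

```python
import numpy as np

def average_parameters(means, stds):
    """Average the client means and standard deviations coordinate-wise."""
    return means.mean(axis=0), stds.mean(axis=0)

def product_of_gaussians(means, stds):
    """Precision-weighted combination (normalized product of Gaussian densities)."""
    precisions = 1.0 / stds**2
    total_precision = precisions.sum(axis=0)
    mean = (precisions * means).sum(axis=0) / total_precision
    return mean, np.sqrt(1.0 / total_precision)

# Two hypothetical clients, one weight each: N(0, 1) and N(2, 1).
means = np.array([[0.0], [2.0]])
stds = np.array([[1.0], [1.0]])

avg_mean, avg_std = average_parameters(means, stds)
prod_mean, prod_std = product_of_gaussians(means, stds)
print("parameter averaging:", avg_mean, avg_std)
print("product of Gaussians:", prod_mean, prod_std)
```

Both rules agree on the aggregated mean in this symmetric case, but the product rule yields a narrower distribution than parameter averaging, so the choice of rule directly changes how much uncertainty the aggregated model carries.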
Link » Atahan Özer · Kadir Burak Buldu · Abdullah Akgül · Gozde Unal 🔗 - Stochastic Gradient Methods with Compressed Communication for Decentralized Saddle Point Problems (Poster)  link » We develop two compression based stochastic gradient algorithms to solve a class of non-smooth strongly convex-strongly concave saddle-point problems in a decentralized setting (without a central server). Our first algorithm is a Restart-based Decentralized Proximal Stochastic Gradient method with Compression (C-RDPSG) for general stochastic settings. We provide rigorous theoretical guarantees of C-RDPSG with gradient computation complexity and communication complexity of order $\mathcal{O}( (1+\delta)^4 \frac{1}{L^2}{\kappa_f^2}\kappa_g^2 \frac{1}{\epsilon} )$, to achieve an $\epsilon$-accurate saddle-point solution, where $\delta$ denotes the compression factor, $\kappa_f$ and $\kappa_g$ denote respectively the condition numbers of objective function and communication graph, and $L$ denotes the smoothness parameter of the smooth part of the objective function. Next, we present a Decentralized Proximal Stochastic Variance Reduced Gradient algorithm with Compression (C-DPSVRG) for finite sum setting which exhibits gradient computation complexity and communication complexity of order $\mathcal{O} \left((1+\delta) \max \{\kappa_f^2, \sqrt{\delta}\kappa^2_f\kappa_g,\kappa_g \} \log\left(\frac{1}{\epsilon}\right) \right)$. Extensive numerical experiments show competitive performance of the proposed algorithms and provide support to the theoretical results obtained. Link » Chhavi Sharma · Vishnu Narayanan · Balamurugan Palaniappan 🔗 - Refined Convergence and Topology Learning for Decentralized Optimization with Heterogeneous Data (Poster)  link » One of the key challenges in decentralized and federated learning is to design algorithms that efficiently deal with highly heterogeneous data distributions across agents. 
In this paper, we revisit the analysis of the Decentralized Stochastic Gradient Descent algorithm (D-SGD) under data heterogeneity. We first exhibit the key role played by a new quantity, called neighborhood heterogeneity, on the convergence rate of D-SGD. Neighborhood heterogeneity provides a natural criterion to learn data-dependent and sparse topologies that reduce the detrimental effect of data heterogeneity on the convergence of D-SGD. For the important case of classification with label skew, we formulate the problem of learning a topology as a tractable optimization problem that we solve with a Frank-Wolfe algorithm. As illustrated over a set of experiments, the learned sparse topology is shown to balance the convergence speed and the per-iteration communication costs of D-SGD. Link » Batiste Le bars · Aurélien Bellet · Marc Tommasi · Erick Lavoie · Anne-marie Kermarrec 🔗 - AIMHI: Protecting Sensitive Data through Federated Co-Training (Poster)  link » Federated learning offers collaborative training among distributed sites without sharing sensitive local information by sharing the sites' model parameters. It is possible, though, to make non-trivial inferences about sensitive local information from these model parameters. We propose a novel co-training technique called AIMHI that uses a public unlabeled dataset to exchange information between sites by sharing predictions on that dataset. This setting is particularly suitable to healthcare, where hospitals and clinics hold small labeled datasets with highly sensitive patient data and large national health databases contain large amounts of public patient data. We show that the proposed method reaches a model quality comparable to federated learning while maintaining privacy to a high degree. 
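The idea of exchanging predictions on a shared public dataset, rather than model parameters, can be sketched as follows. This is a minimal illustration under stated assumptions, not the AIMHI algorithm itself: the majority-vote consensus rule and the name `consensus_labels` are hypothetical, since the abstract only says that sites share predictions on the public unlabeled data.

```python
from collections import Counter


def consensus_labels(site_predictions):
    """Majority vote over the sites' predicted labels for each public sample.

    site_predictions[k][i] is the label that site k predicts for public
    sample i. Only these predictions leave a site, never its parameters.
    """
    num_samples = len(site_predictions[0])
    consensus = []
    for i in range(num_samples):
        votes = Counter(preds[i] for preds in site_predictions)
        consensus.append(votes.most_common(1)[0][0])
    return consensus


# Predictions from three sites on four public, unlabeled samples.
site_predictions = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]

pseudo_labels = consensus_labels(site_predictions)
print(pseudo_labels)
```

In a co-training loop of this kind, each site would then add the public samples with their consensus labels to its local training set and retrain before the next exchange.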
Link » Amr Abourayya · Michael Kamp · Erman Ayday · Jens Kleesiek · Kanishka Rao · Geoffrey Webb · Bharat Rao 🔗 - A Novel Model-Based Attribute Inference Attack in Federated Learning (Poster)  link » In federated learning, clients such as mobile devices or data silos (e.g., hospitals and banks) collaboratively improve a shared model, while maintaining their data locally. Multiple recent works show that a client’s private information can still be disclosed to an adversary who just eavesdrops on the messages exchanged between the targeted client and the server. In this paper, we propose a novel model-based attribute inference attack in federated learning which overcomes the limits of gradient-based ones. Furthermore, we provide an analytical lower-bound for the success of this attack. Empirical results using real-world datasets confirm that our attribute inference attack works well for both regression and classification tasks. Moreover, we benchmark our novel attribute inference attack against the state-of-the-art attacks in federated learning. Our attack results in higher reconstruction accuracy especially when the clients’ datasets are heterogeneous (as is common in federated learning). Link » ilias driouich · CHUAN XU · Giovanni Neglia · Frederic Giroire · Eoin Thomas 🔗 - FedSHIBU: Federated Similarity-based Head Independent Body Update (Poster)  link » Most federated learning algorithms like FedAVG aggregate client models to obtain a global model. However, this leads to loss of information, especially when the data distribution is highly heterogeneous across clients. As a motivation for this paper, we first show that data-specific global models (where the clients are grouped based on their data distribution) produce higher accuracy over FedAVG. This suggests a potential performance improvement if clients trained on similar data have a higher importance in model aggregation. We use data representations from extractors of client models to quantify data similarity. 
We propose using a weighted aggregation of client models where the weight is calculated based on the similarity of client data. Similar to FedBABU, the proposed client representation similarity-based aggregation is applied only on extractors. We empirically show that the proposed method enhances global model performance in heterogeneous data distributions. Link » Athul Sreemathy Raj · Irene Tenison · Kacem Khaled · Felipe de Magalhães · Athul Sreemathy Raj 🔗 - Revisiting the Activation Function for Federated Image Classification (Poster)  link » Federated learning (FL) has become one of the most popular distributed machine learning paradigms; these paradigms enable training on a large corpus of decentralized data that resides on devices. The recent evolution in FL research is mainly credited to the refinements in training procedures by developing the optimization methods. However, there has been little verification of other technical improvements, especially improvements to the activation functions (e.g., ReLU), that are widely used in the conventional centralized approach (i.e., standard data-centric optimization). In this work, we verify the effectiveness of activation functions in various federated settings. We empirically observe that off-the-shelf activation functions that are used in centralized settings exhibit a totally different performance trend in federated settings. The experimental results demonstrate that HardTanh achieves the best accuracy when severe data heterogeneity or low participation rate is present. We provide a thorough analysis to investigate why the representation powers of activation functions are changed in a federated setting by measuring the similarities in terms of weight parameters and representations. Lastly, we deliver guidelines for selecting activation functions in both a cross-silo setting (i.e., a number of clients <= 20) and a cross-device setting (i.e., a number of clients >= 100). 
We believe that our work provides benchmark data and intriguing insights for designing FL models. Link » Jaewoo Shin · Taehyeon Kim · Se-Young Yun 🔗 - Adaptive Sparse Federated Learning in Large Output Spaces via Hashing (Poster)  link » This paper focuses on the on-device training efficiency of federated learning (FL), and demonstrates it is feasible to exploit sparsity in the client to save both computation and memory for deep neural networks with large output space. To this end, we propose a sparse FL scheme using a hash-based adaptive sampling algorithm. In this scheme, the server maintains neurons in hash tables. Each client looks up a subset of neurons from the hash table in the server and performs training. With the locality-sensitive hash functions, this scheme could provide valuable negative class neurons with respect to the client data. Moreover, the cheap operations in hashing incur low computation overhead in the sampling. In our empirical evaluation, we show that our approach can save up to $70\%$ on-device computation and memory during FL while maintaining the same accuracy. Moreover, we demonstrate that we could use the savings in the output layer to increase the model capacity and obtain better accuracy with a fixed hardware budget. Link » Zhaozhuo Xu · Luyang Liu · Zheng Xu · Anshumali Shrivastava 🔗 - FLARE: Federated Learning from Simulation to Real-World (Poster)  link » Federated learning (FL) enables building robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without centralizing the data. We created FLARE as an open-source software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world applications. 
The SDK includes solutions for state-of-the-art FL algorithms and federated machine learning approaches, which facilitate building workflows for distributed learning across enterprises and enable platform developers to create a secure, privacy-preserving offering for multiparty collaboration utilizing homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable Python package, and allows researchers to bring their data science workflows implemented in any training libraries (PyTorch, TensorFlow, XGBoost, or even NumPy) and apply them in real-world FL settings. This paper introduces the key design principles of FLARE and illustrates some use cases (e.g., COVID analysis) with customizable FL workflows that implement different privacy-preserving algorithms. (Code is available at https://anonymous.4open.science/r/anon-flare.) Link » Holger Roth · Yan Cheng · Yuhong Wen · Te-Chung (Isaac) Yang · Ziyue Xu · Yuan-Ting Hsieh · Kristopher Kersten · Ahmed Harouni · Can Zhao · Kevin Lu · Zhihong Zhang · Wenqi Li · Andriy Myronenko · Dong Yang · Sean Yang · Nicola Rieke · Abood Quraini · Chester Chen · Daguang Xu · Nic Ma · Prerna Dogra · Mona Flores · Andrew Feng 🔗 - PerFedSI: A Framework for Personalized Federated Learning with Side Information (Poster)  link » With an ever-increasing number of smart edge devices with computation and communication constraints, Federated Learning (FL) is a promising paradigm for learning from distributed devices and their data. Typical approaches to FL aim to learn a single model that simultaneously performs well for all clients. But such an approach may be ineffective when the clients' data distributions are heterogeneous. In these cases, we aim to learn personalized models for each client's data yet still leverage shared information across clients. 
A critical avenue that may allow for such personalization is the presence of client-specific side information available to each client, such as client embeddings obtained from domain-specific knowledge, pre-trained models, or simply one-hot encodings. In this work, we propose a new FL framework for utilizing a general form of client-specific side information for personalized federated learning. We prove that incorporating side information can improve model performance for simplified multi-task linear regression and matrix completion problems. Further, we validate these results with image classification experiments on Omniglot, CIFAR-10, and CIFAR-100, revealing that proper use of side information can be beneficial for personalization. Link » Liam Collins · Enmao Diao · Tanya Roosta · Jie Ding · Tao Zhang 🔗 - FedSynth: Gradient Compression via Synthetic Data in Federated Learning (Poster)  link » Model compression is important in federated learning (FL) with large models to reduce communication cost. Prior works have focused on sparsification-based compression, which can severely affect the global model accuracy. In this work, we propose a new scheme for upstream communication where, instead of transmitting the model update, each client learns and transmits a light-weight synthetic dataset such that a model trained on it performs similarly well on the real training data. The server will recover the local model update via the synthetic data and apply standard aggregation. We then provide a new algorithm FedSynth to learn the synthetic data locally. Empirically, we find our method is comparable to or better than random masking baselines in all three common federated learning benchmark datasets. 
Link » Shengyuan Hu · Jack Goetz · Kshitiz Malik · Hongyuan Zhan · Zhe Liu · Yue Liu 🔗 - Personalized Multi-tier Federated Learning (Poster)  link » The challenge of personalized federated learning (pFL) is to capture the heterogeneity of the data with inexpensive communication while achieving customized performance for devices. To address that challenge, we introduce personalized multi-tier federated learning using Moreau envelopes (pFedMT) when there are known cluster structures within devices. Moreau envelopes are used as the devices’ and teams’ regularized loss functions. Empirically, we verify that the personalized model performs better than vanilla FedAvg, per-FedAvg, and pFedMe. pFedMT achieves 98.30% and 99.71% accuracy on the MNIST dataset under convex and non-convex settings, respectively. Link » Sourasekhar Banerjee · Alp Yurtsever · Monowar Bhuyan 🔗 - FLIS: Clustered Federated Learning via Inference Similarity for Non-IID Data Distribution (Poster)  link » Classical federated learning approaches yield significant performance degradation in the presence of Non-IID data distributions of participants. When the distribution of each local dataset is highly different from the global one, the local objective of each client will be inconsistent with the global optimum, which incurs a drift in the local updates. This phenomenon highly impacts the performance of clients, yet the primary incentive for clients to participate in federated learning is to obtain better personalized models. To address the above-mentioned issue, we present a new algorithm, FLIS, which groups the client population into clusters with jointly trainable data distributions by leveraging the inference similarity of clients' models. This framework captures settings where different groups of users have their own objectives (learning tasks) but, by aggregating their data with others in the same cluster (same learning task), can perform more efficient and personalized federated learning. 
We present experimental results to demonstrate the benefits of FLIS over the state-of-the-art benchmarks on CIFAR-100/10, SVHN, and FMNIST datasets. Link » Saeed Vahidian · Mahdi Morafah · Weijia Wang · Bill Lin 🔗 - Private and Robust Federated Learning using Private Information Retrieval and Norm Bounding (Poster)  link » Federated Learning (FL) is a distributed learning paradigm that enables mutually untrusting clients to collaboratively train a common machine learning model. Client data privacy is paramount in FL. At the same time, the model must be protected from poisoning attacks from adversarial clients. Existing solutions address these two problems in isolation. We present FedPerm, a new FL algorithm that addresses both these problems by combining norm bounding for model robustness with a novel intra-model parameter shuffling technique that amplifies data privacy by means of Private Information Retrieval (PIR) based techniques that permit cryptographic aggregation of clients' model updates. The combination of these techniques helps the federation server constrain parameter updates from clients so as to curtail effects of model poisoning attacks by adversarial clients. We further present FedPerm's unique hyperparameters that can be used effectively to trade off computation overheads with model utility. Our empirical evaluation on the MNIST dataset demonstrates FedPerm's effectiveness over existing Differential Privacy (DP) enforcement solutions in FL. Link » Hamid Mozaffari · Virendra Marathe · Dave Dice 🔗 - Reconciling Security and Communication Efficiency in Federated Learning (Poster)  link » Cross-device Federated Learning is an increasingly popular machine learning setting to train a model by leveraging a large population of client devices with high privacy and security guarantees. 
However, communication efficiency remains a major bottleneck when scaling federated learning to production environments, particularly due to bandwidth constraints during uplink communication. In this paper, we formalize and address the problem of compressing client-to-server model updates under the Secure Aggregation primitive, a core component of Federated Learning pipelines that allows the server to aggregate the client updates without accessing them individually. In particular, we adapt standard scalar quantization and pruning methods to Secure Aggregation and propose Secure Indexing, a variant of Secure Aggregation that supports quantization for extreme compression. We establish state-of-the-art results on LEAF benchmarks in a secure Federated Learning setup with up to 40x compression in uplink communication and no meaningful loss in utility compared to uncompressed baselines. Link » Karthik Prasad · Sayan Ghosh · Graham Cormode · Ilya Mironov · Ashkan Yousefpour · Pierre STOCK 🔗 - FedToken: Tokenized Incentives for Data Contribution in Federated Learning (Poster)  link » Incentives that compensate for the involved costs in the decentralized training of a Federated Learning (FL) model act as a key stimulus for clients' long-term participation. However, it is challenging to convince clients for quality participation in FL due to the absence of: (i) full information on the client's data quality and properties; (ii) the value of client's data contributions; and (iii) the trusted mechanism for monetary incentive offers. This often leads to poor efficiency in training and communication. While several works focus on strategic incentive designs and client selection to overcome this problem, there is a major knowledge gap in terms of an overall design tailored to the foreseen digital economy, including Web 3.0, while simultaneously meeting the learning objectives. 
To address this gap, we propose a contribution-based tokenized incentive scheme, namely FedToken, backed by blockchain technology that ensures fair allocation of tokens amongst the clients that corresponds to the valuation of their data during model training. Leveraging the engineered Shapley-based scheme, we first approximate the contribution of local models during model aggregation, then strategically schedule clients to lower the number of communication rounds needed for convergence, and anchor ways to allocate affordable tokens under a constrained monetary budget. Extensive simulations demonstrate the efficacy of our proposed method. Link » Shashi Raj Pandey · Lam Nguyen · Petar Popovski 🔗 - Federated Continual Learning with Differentially Private Data Sharing (Poster)  link » In Federated Learning (FL), many types of skews can occur, including uneven class distributions, or varying client participation. In addition, new tasks and data modalities can be encountered as time passes, which leads us to the problem domain of Federated Continual Learning (FCL). In this work we study how we can adapt some of the simplest, but often most effective, Continual Learning approaches based on replay to FL. We focus on temporal shifts in client behaviour, and show that direct application of replay methods leads to poor results. To address these shortcomings, we explore data sharing between clients employing differential privacy. This alleviates the shortcomings in current baselines, resulting in performance gains in a wide range of cases, with our method achieving maximum gains of 49%. Link » Giulio Zizzo · Ambrish Rawat · Naoise Holohan · Seshu Tirupathi 🔗 - $z$-SignFedAvg: A unified sign-based stochastic compression for federated learning (Poster)  link » Federated learning is a promising privacy-preserving distributed learning paradigm but suffers from high communication cost when training large-scale machine learning models. 
Sign-based methods, such as SignSGD (Bernstein et al., 2018), have been proposed as a biased gradient compression technique for reducing the communication cost. However, sign-based compression could diverge under heterogeneous data, which motivates the development of advanced techniques, such as the error-feedback method and stochastic sign-based compression, to fix this issue. Nevertheless, these methods still suffer from a significantly slower convergence rate than uncompressed algorithms. Besides, none of them allow multiple local SGD updates like FedAvg (McMahan et al., 2017). In this paper, we propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression, which not only allows one to flexibly control the tradeoff between gradient bias and convergence performance, but also provides a unified viewpoint to existing sign-based methods. More importantly, we propose the very first sign-based FedAvg algorithm ($z$-SignFedAvg). Theoretically, we show that $z$-SignFedAvg achieves a faster convergence rate than existing sign-based methods and, under uniformly distributed noise, can even enjoy the same convergence rate as its uncompressed counterpart. Extensive experiments are conducted to demonstrate that our proposed $z$-SignFedAvg can achieve competitive empirical performance on real datasets. Link » Zhiwei Tang · Yanmeng Wang · Tsung-Hui Chang 🔗 - Measuring and Controlling Split Layer Privacy Leakage Using Fisher Information (Poster)  link » Split learning and inference propose to run training/inference of a large model that is split across client devices and the cloud. However, such a model splitting imposes privacy concerns, because the activation flowing through the split layer may leak information about the clients' private input data. 
There is currently no good way to quantify how much private information is being leaked through the split layer, nor a good way to improve privacy up to the desired level. In this work, we propose to use Fisher information as a privacy metric to measure and control the information leakage. We show that Fisher information can provide an intuitive understanding of how much private information is leaking through the split layer, in the form of an error bound for an unbiased reconstruction attacker. We then propose a privacy-enhancing technique, ReFIL, that can enforce a user-desired level of Fisher information leakage at the split layer to achieve high privacy, while maintaining reasonable utility. Link » Kiwan Maeng · Chuan Guo · Sanjay Kariyappa · G. Edward Suh 🔗 - FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning (Poster)  link » Federated learning (FL) has recently attracted increasing attention from academia and industry, with the ultimate goal of achieving collaborative training under privacy and communication constraints. Existing iterative model averaging based FL algorithms require a large number of communication rounds to obtain a well-performing model due to extremely unbalanced and non-i.i.d. data partitioning among different clients. Thus, we propose FedDM to build the global training objective from multiple local surrogate functions, which enables the server to gain a more global view of the loss landscape. In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data through distribution matching. FedDM reduces communication rounds and improves model quality by transmitting more informative and smaller synthesized data compared with unwieldy model weights. We conduct extensive experiments on three image classification datasets, and results show that our method can outperform other FL counterparts in terms of efficiency and model performance. 
Moreover, we demonstrate that FedDM can be adapted to preserve differential privacy with the Gaussian mechanism and train a better model under the same privacy budget. Link » Yuanhao Xiong · Ruochen Wang · Minhao Cheng · Felix Yu · Cho-Jui Hsieh 🔗 - With a Little Help from My Friend: Server-Aided Federated Learning with Partial Client Participation (Poster)  link » Although federated learning (FL) has been a prevailing distributed learning framework in recent years due to its benefits in scalability/privacy and rich applications in practice, there remain many challenges in FL system design, such as data and system heterogeneity. Notably, most existing works in the current literature only focus on addressing data heterogeneity issues (e.g., non-i.i.d. datasets across clients), while often assuming either full client or uniformly distributed client participation. However, such idealistic assumptions on client participation rarely hold in practical FL systems. It has been frequently found in FL systems that some clients may never participate in the training (aka partial/incomplete participation) due to various reasons. This motivates us to fully investigate the impacts of incomplete FL participation and develop effective mechanisms to mitigate such impacts. Toward this end, by establishing a fundamental generalization error lower bound, we first show that conventional FL is not PAC-learnable under incomplete participation. To overcome this challenge, we propose a new server-aided federated learning (SA-FL) framework with an auxiliary dataset deployed at the server, which is able to revive the PAC-learnability of FL under incomplete client participation. Upon resolving the PAC-learnability challenge, we further propose the SAFARI (server-aided federated averaging) algorithm that enjoys a convergence guarantee and the same level of communication efficiency and privacy as state-of-the-art FL. 
Link » Haibo Yang · Peiwen Qiu · Prashant Khanduri · Jia Liu 🔗 - Federated Learning of Large Models at the Edge via Principal Sub-Model Training (Poster)  link » Limited compute and communication capabilities of edge users create a significant bottleneck for federated learning (FL) of large models. We consider a realistic, but much less explored, cross-device FL setting in which no client has the capacity to train a full large model nor is willing to share any intermediate activations with the server. To this end, we present the Principal Sub-Model (PriSM) training methodology, which leverages models’ low-rank structure and kernel orthogonality to train sub-models in the orthogonal kernel space. More specifically, by applying singular value decomposition (SVD) to original kernels in the server model, PriSM first obtains a set of principal orthogonal kernels in which each one is weighted by its singular value. Thereafter, PriSM utilizes a novel sampling strategy that selects different subsets of the principal kernels independently to create sub-models for clients. Importantly, a kernel with a large singular value is assigned a high sampling probability. Thus, each sub-model is a low-rank approximation of the full large model, and all clients together achieve near full-model training. Our extensive evaluations on multiple datasets in resource-constrained settings show that PriSM can yield an improved performance of up to $10\%$ compared to existing alternatives, with only around $20\%$ sub-model training. Link » Yue Niu · Saurav Prakash · Souvik Kundu · Sunwoo Lee · Salman Avestimehr 🔗 - FedRule: Federated Rule Recommendation System with Graph Neural Networks (Poster)  link » Much of the value that IoT (Internet-of-Things) devices bring to "smart" homes lies in their ability to automatically trigger other devices' actions: for example, a smart camera triggering a smart lock to unlock a door. 
Manually setting up these rules for smart devices or applications, however, is time-consuming and inefficient. Rule recommendation systems can automatically suggest rules for users by learning which rules are popular based on those previously deployed (e.g., in others' smart homes). Conventional recommendation formulations require a central server to record the rules used in many users' homes, which compromises their privacy. Moreover, these solutions typically leverage generic user-item matrix methods but do not fully exploit the structure of the rule recommendation problem. In this paper, we propose a new rule recommendation system, dubbed FedRule, to address these challenges. One graph is constructed per user upon the rules s/he is using, and the rule recommendation is formulated as a link prediction task in these graphs. This formulation enables us to design a federated training algorithm that is able to keep users' data private. Extensive experiments corroborate our claims by demonstrating that FedRule has comparable performance to the centralized setting and outperforms conventional solutions. Link » Yuhang Yao · Mohammad Mahdi Kamani · Zhongwei Cheng · Lin Chen · Carlee Joe-Wong · Tianqiang Liu 🔗 - Improving Vertical Federated Learning by Efficient Communication with ADMM (Poster)  link » Vertical Federated Learning (VFL) allows each client to collect partial features and jointly train the shared model. In this paper, we identify two challenges in VFL: (1) some works directly average the learned feature embeddings and therefore might lose the unique properties of each local feature set; (2) the server needs to communicate gradients with the clients for each training step, incurring high communication cost. We aim to address the above challenges and propose an efficient VFL with multiple heads (VIM) framework, where each head corresponds to local clients by taking the separate contribution of each client into account. 
In addition, we propose an Alternating Direction Method of Multipliers (ADMM)-based method to solve our optimization problem, which reduces the communication cost by allowing multiple local updates in each step. We show that VIM achieves significantly higher accuracy and faster convergence compared with state-of-the-art methods on four datasets, and the weights of learned heads reflect the importance of local clients. Link » Chulin Xie · Pin-Yu Chen · Ce Zhang · Bo Li 🔗 - Subject Level Differential Privacy with Hierarchical Gradient Averaging (Poster)  link » Subject Level Differential Privacy (DP) is a granularity of privacy recently studied in the Federated Learning (FL) setting, where a subject is defined as an individual whose private data is embodied by multiple data records that may be distributed across a multitude of federation users. This granularity is distinct from item level and user level privacy appearing in the literature. Prior work on subject level privacy in FL focuses on algorithms that are derivatives of group DP or enforce user level Local DP (LDP). In this paper, we present a new algorithm – Hierarchical Gradient Averaging (HiGradAvgDP) – that achieves subject level DP by constraining the effect of individual subjects on the federated model. We prove the privacy guarantee for HiGradAvgDP and empirically demonstrate its effectiveness in preserving model utility on the FEMNIST and Shakespeare datasets. We also report, for the first time, a unique problem of privacy loss composition, which we call horizontal composition, that is relevant only to subject level DP in FL. We show how horizontal composition can adversely affect model utility by either increasing the noise necessary to achieve the DP guarantee, or by constraining the amount of training done on the model. 
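One plausible reading of "constraining the effect of individual subjects" is sketched below: per-record gradients in a minibatch are first averaged per subject, each subject's average is clipped, and only then are the subjects averaged and noised. This is an assumption-laden illustration rather than the HiGradAvgDP algorithm as published. The helper names, the clipping rule, and the noise placement are all hypothetical, and no privacy accounting is shown.

```python
import math
import random
from collections import defaultdict


def clip(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def hierarchical_average(grads, subject_ids, max_norm, noise_std, rng):
    """Average gradients per subject, clip, average subjects, add noise."""
    by_subject = defaultdict(list)
    for grad, subject in zip(grads, subject_ids):
        by_subject[subject].append(grad)

    dim = len(grads[0])
    subject_means = []
    for subject_grads in by_subject.values():
        count = len(subject_grads)
        mean = [sum(g[d] for g in subject_grads) / count for d in range(dim)]
        # Clipping the per-subject mean bounds any one subject's influence,
        # no matter how many records that subject contributed.
        subject_means.append(clip(mean, max_norm))

    num_subjects = len(subject_means)
    return [
        sum(m[d] for m in subject_means) / num_subjects
        + rng.gauss(0.0, noise_std)
        for d in range(dim)
    ]


# Subject "a" owns two records, subject "b" owns one.
grads = [[3.0, 4.0], [3.0, 4.0], [0.0, 1.0]]
subject_ids = ["a", "a", "b"]

# noise_std=0.0 makes this toy run deterministic; real use needs noise > 0.
update = hierarchical_average(
    grads, subject_ids, max_norm=1.0, noise_std=0.0, rng=random.Random(0)
)
print(update)
```

Without the per-subject step, subject "a" would count twice in the minibatch average; with it, each subject contributes exactly one clipped vector.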
Link » Virendra Marathe · Pallika Kanani · Daniel Peterson 🔗 - MocoSFL: enabling cross-client collaborative self-supervised learning (Poster)  link » Existing collaborative self-supervised learning (SSL) schemes are not suitable for cross-client applications because of their expensive computation and large local data requirements. To address these issues, we propose MocoSFL, a collaborative SSL framework based on Split Federated Learning (SFL) and Momentum Contrast (MoCo). In MocoSFL, the large backbone model is split into a small client-side model and a large server-side model, and only the small client-side model is processed locally on the client's local devices. MocoSFL is equipped with three components: (i) vector concatenation, which enables the use of small batch sizes and reduces computation and memory requirements by orders of magnitude; (ii) feature sharing, which helps achieve high accuracy regardless of the quality and volume of local data; (iii) frequent synchronization, which helps achieve better non-IID performance because of smaller local model divergence. For a 1,000-client case with non-IID data (each client has data from 2 random classes of CIFAR-10), MocoSFL can achieve over 84% accuracy with a ResNet-18 model. Link » Jingtao Li · Lingjuan Lyu · Daisuke Iso · Chaitali Chakrabarti · Michael Spranger 🔗 - Federated Continual Learning to Detect Accounting Anomalies in Financial Auditing (Poster)  link » The International Standards on Auditing require auditors to collect reasonable assurance that financial statements are free of material misstatement, whether caused by error or fraud. At the same time, a central objective of Continuous Assurance is the ‘real-time’ assessment of digital accounting journal entries. Recently, driven by the advances in artificial intelligence, Deep Learning techniques have emerged in financial auditing to examine vast quantities of accounting data.
However, learning highly adaptive audit models in decentralised and dynamic settings remains challenging. It requires the study of data distribution shifts over multiple clients and time periods. In this work, we propose a Federated Continual Learning framework enabling auditors to learn audit models from decentralised clients continuously. We evaluate the framework’s ability to detect accounting anomalies in common scenarios of organizational activity. Our empirical results, using real-world datasets and combined federated-continual learning strategies, demonstrate the learned model's ability to detect anomalies in audit settings subject to data distribution shifts. Link » Marco Schreyer · Hamed Hemati · Damian Borth · Miklos Vasarhelyi 🔗 - Certified Robustness in Federated Learning (Poster)  link » Federated learning has recently gained significant attention and popularity due to its effectiveness in training machine learning models on distributed data privately. However, as in the single-node supervised learning setup, models trained in federated learning suffer from vulnerability to imperceptible input transformations known as adversarial attacks, questioning their deployment in security-related applications. In this work, we study the interplay between federated training, personalization, and certified robustness. In particular, we deploy randomized smoothing, a widely-used and scalable certification method, to certify deep networks trained on a federated setup against input perturbations and transformations. We find that the simple federated averaging technique is effective in building not only more accurate, but also more certifiably-robust models, compared to training solely on local data. We further analyze the effect of personalization, a popular technique in federated training that increases the model's bias towards local data, on robustness.
We show several advantages of personalization over both (that is, only training on local data and federated training) in building more robust models with faster training. Finally, we explore the robustness of mixtures of global and local (i.e., personalized) models, and find that the robustness of local models degrades as they diverge from the global model. Link » Motasem Alfarra · Juan Perez · Egor Shulgin · Peter Richtarik · Bernard Ghanem 🔗 - Federated Sparse Training: Lottery Aware Model Compression for Resource Constrained Edge (Poster)  link » Limited computation and communication capabilities of clients pose significant challenges in federated learning (FL) over resource-limited edge nodes. A potential solution to this problem is to deploy off-the-shelf sparse learning algorithms that train a binary sparse mask on each client with the expectation of training a consistent sparse server mask. However, as we investigate in this paper, such naive deployments result in a significant accuracy drop compared to FL with dense models, especially under clients' low resource budgets. In particular, our investigations reveal a serious lack of consensus among the trained masks on clients, which prevents convergence on the server mask and potentially leads to a substantial drop in model performance. Based on such key observations, we propose federated lottery aware sparsity hunting (FLASH), a unified sparse learning framework to make the server win a lottery in terms of a sparse sub-model, which can greatly improve performance under highly resource-limited client settings. Moreover, to address the issue of device heterogeneity, we leverage our findings to propose hetero-FLASH, where clients can have different target sparsity budgets based on their device resource limits.
Extensive experimental evaluations with multiple models on various datasets (both IID and non-IID) show the superiority of our models, yielding up to ~10.1% improved accuracy with ~10.26× lower communication cost compared to existing alternatives at similar hyperparameter settings. Link » Sara Babakniya · Souvik Kundu · Saurav Prakash · Yue Niu · Salman Avestimehr 🔗 - Asynchronous speedup in decentralized optimization (Poster)  link » In decentralized optimization, nodes of a communication network each possess a local objective function, and communicate using gossip-based methods in order to minimize the average of these per-node functions. While synchronous algorithms are heavily impacted by a few slow nodes or edges in the graph (the straggler problem), their asynchronous counterparts are notoriously harder to parametrize. Indeed, their convergence properties for networks with heterogeneous communication and computation delays have defied analysis so far. In this paper, we use a continuized framework to analyze asynchronous algorithms in networks with delays. Our approach yields a precise characterization of convergence time and of its dependency on heterogeneous delays in the network. Our continuized framework benefits from the best of both continuous and discrete worlds: the algorithms it applies to are based on event-driven updates. They are thus essentially discrete and hence readily implementable. Yet their analysis is essentially in continuous time, relying in part on the theory of delayed ODEs. Our algorithms moreover achieve an asynchronous speedup: their rate of convergence is controlled by the eigengap of the network graph weighted by local delays, instead of the network-wide worst-case delay as in previous analyses. Our methods thus enjoy improved robustness to stragglers. Link » Mathieu Even · Hadrien Hendrikx · Laurent Massoulié 🔗