The recent rapid development of machine learning has largely benefited from algorithmic advances, the collection of large-scale datasets, and the availability of high-performance computing resources, among other factors. However, the sheer volume of collected data also raises serious security, privacy, service-provisioning, and network-management challenges. To achieve decentralized, secure, private, and trustworthy machine learning operation and data management in this “data-centric AI” era, the joint consideration of blockchain techniques and machine learning can bring significant benefits, and it has attracted great interest from both academia and industry. On the one hand, decentralization and blockchain techniques can significantly facilitate the sharing of training data and machine learning models, decentralized intelligence, security, privacy, and trusted decision-making. On the other hand, Web3 platforms and applications, which are built on blockchain technologies and token-based economics, will greatly benefit from machine learning techniques in resource efficiency, scalability, and trustworthy operation, as well as from other ML-augmented tools for creators and participants across end-to-end ecosystems.
This workshop focuses on how researchers and practitioners can meet trustworthiness requirements, such as security and privacy in machine learning, through decentralization and blockchain techniques, and on how machine learning can automate processes in current decentralized systems and ownership economies in Web3. We aim to share recent work from different communities; discuss the foundations of trustworthiness problems in machine learning and potential solutions, tools, and platforms based on decentralization, blockchain, and Web3; and chart important directions for future work and cross-community collaboration.
Sat 6:45 a.m. - 7:00 a.m. | Introduction and Opening Remarks (Intro)
Sat 7:00 a.m. - 7:30 a.m. | Invited Talk: Elaine Shi - Crypto meets decentralized mechanism design (Talk)

Space in a blockchain is a scarce resource. Cryptocurrencies today use auctions to decide which transactions get confirmed in a block. Intriguingly, classical auctions fail in such a decentralized environment, since even the auctioneer can be a strategic player. For example, the second-price auction is a gold standard in classical mechanism design. It fails, however, in the blockchain environment, since the miner can easily inject a bid that is epsilon smaller than the k-th price, where k is the block size. Moreover, the miner and users can also collude through the smart contract mechanisms available in modern cryptocurrencies. I will talk about a new foundation for mechanism design in a decentralized environment. I will prove an impossibility result that rules out the existence of a dream transaction fee mechanism that incentivizes honest behavior for the user, the miner, and a miner-user coalition at the same time. I will then show how cryptography can help us overcome this impossibility.
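The manipulation described in the abstract can be made concrete with a toy simulation; a hedged sketch, assuming a simple "include the k highest bids, charge the (k+1)-st highest" rule, with all bid values invented for illustration:

```python
# Toy block-space auction: include the k highest bids, and charge each
# winner the (k+1)-st highest bid as the clearing price.
def kth_price_auction(bids, k):
    ranked = sorted(bids, reverse=True)
    winners = ranked[:k]
    clearing_price = ranked[k] if len(ranked) > k else 0.0
    return winners, clearing_price

honest_bids = [10.0, 8.0, 6.0, 4.0, 2.0]
k = 3  # block size

_, honest_price = kth_price_auction(honest_bids, k)  # winners pay 4.0

# A strategic miner injects a fake bid just below the k-th highest bid:
# the fake bid stays out of the block, but it raises the clearing price
# that the real winners pay -- and the miner collects those fees.
eps = 0.5
fake_bid = sorted(honest_bids, reverse=True)[k - 1] - eps  # 5.5
_, manipulated_price = kth_price_auction(honest_bids + [fake_bid], k)

print(honest_price, manipulated_price)  # 4.0 5.5
```

The same winners are confirmed in both runs; only the price they pay changes, which is exactly why the second-price rule stops being truthful once the auctioneer can bid.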
Sat 7:30 a.m. - 8:00 a.m. | Invited Talk: Virginia Smith - Practical Approaches for Private Adaptive Optimization (Talk)

Adaptive optimizers (e.g., AdaGrad, Adam) are widely used in machine learning. Despite their success in non-private training, the benefits of adaptivity tend to degrade when training with differential privacy for applications such as federated learning. We explore two simple techniques to improve the performance of private adaptive optimizers. First, we study the use of side information as a way to precondition gradients and effectively approximate gradient geometry. In cases where such side information is not available, we propose differentially private adaptive training with delayed preconditioners (DP^2), a simple method that constructs delayed but less noisy preconditioners to realize the benefits of adaptivity. We analyze both approaches in theory and in practice, showing that these techniques can recover many of the benefits that are otherwise lost when applying state-of-the-art adaptive optimizers in private settings.
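The delayed-preconditioner idea can be sketched in a few lines; this is an illustrative toy (AdaGrad-style accumulator, a simple quadratic loss, and made-up constants), not the speaker's DP^2 implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, steps, delay = 5, 100, 10
clip, sigma, lr = 1.0, 0.5, 0.1

w = np.zeros(dim)
accum = np.ones(dim)   # AdaGrad-style preconditioner accumulator
buffer = []            # recent noisy gradients awaiting the delayed update

def private_grad(w):
    """Clipped, noised gradient of the toy loss ||w - 1||^2 (DP-SGD style)."""
    g = 2.0 * (w - 1.0)
    g = g / max(1.0, np.linalg.norm(g) / clip)        # clip to norm <= clip
    return g + rng.normal(0.0, sigma * clip, dim)     # add Gaussian DP noise

for t in range(steps):
    g = private_grad(w)
    buffer.append(g)
    if len(buffer) == delay:
        # Delayed update: average the buffered gradients first, so the noise
        # is smoothed out before it ever enters the preconditioner.
        accum += np.mean(buffer, axis=0) ** 2
        buffer.clear()
    w -= lr * g / np.sqrt(accum)   # adaptive step with the delayed preconditioner
```

The per-step update still uses the fresh noisy gradient; only the preconditioner lags behind, which is the trade the abstract describes: delayed but far less noisy gradient geometry.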
Sat 8:00 a.m. - 8:30 a.m. | Invited Talk: Peter Kairouz - The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning (Talk)

In this talk, we consider the problem of training a machine learning model with distributed differential privacy (DP), where secure aggregation (SecAgg) is used to ensure that the server only sees the noisy sum of model updates in every training round. Taking into account the linearity constraints imposed by SecAgg, we characterize the optimal communication cost required to obtain the best accuracy achievable under central DP (i.e., under a fully trusted server and no communication constraints), and we derive a simple and efficient scheme that achieves the optimal bandwidth. We evaluate the optimal scheme on real-world federated learning tasks to show that we can reduce the communication cost to under 1.78 bits per parameter in realistic privacy settings without decreasing test-time performance. We conclude the talk with a few important and non-trivial open research directions.

Peter Kairouz
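The SecAgg primitive the talk builds on can be illustrated with the classic pairwise-masking trick: clients add masks that cancel in the sum, so the server learns only the aggregate. A minimal sketch, assuming quantized integer updates and a toy three-client setup (field size and update values are illustrative):

```python
import random

P = 2**31 - 1   # all arithmetic is modulo a public prime
dim = 3
updates = {0: [3, 1, 4], 1: [1, 5, 9], 2: [2, 6, 5]}   # quantized client updates
clients = list(updates)

rng = random.Random(42)
# Each unordered client pair (i, j), i < j, shares a random mask vector;
# client i adds it and client j subtracts it, so masks cancel in the sum.
masks = {(i, j): [rng.randrange(P) for _ in range(dim)]
         for i in clients for j in clients if i < j}

def masked_update(c):
    out = [x % P for x in updates[c]]
    for (i, j), m in masks.items():
        for t in range(dim):
            if c == i:
                out[t] = (out[t] + m[t]) % P
            elif c == j:
                out[t] = (out[t] - m[t]) % P
    return out

# The server only ever sees the masked vectors, yet their sum is exact.
server_sum = [0] * dim
for c in clients:
    mu = masked_update(c)
    for t in range(dim):
        server_sum[t] = (server_sum[t] + mu[t]) % P

true_sum = [sum(updates[c][t] for c in clients) for t in range(dim)]
```

Because the server can only recover modular sums of what clients send, any aggregate it computes must be linear in the updates; that linearity is precisely the constraint under which the talk characterizes the optimal communication cost.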
Sat 8:30 a.m. - 9:00 a.m. | Invited Talk: Richard Socher, Swetha Mandava, Zairah Mustahsan - Becoming Data-Centric at You.com, a privacy-focused search engine (Talk)

Being data-driven improves decision-making outcomes and enables automation, but building data-driven tooling and culture is a complex and challenging task, especially for startups with limited resources. We will discuss the difficult task of creating an analytics platform from scratch at you.com that protects user privacy while driving decision-making across the organization. The amount of data created daily is rising exponentially, and harnessing that data effectively and ethically is crucial for success in today’s world. We’ll talk about automatic data collection under privacy constraints and the infrastructure setup for data ingestion (Kafka), persistence (Delta Lake, CosmosDB), processing (Spark), access, and analytics platforms (Scuba, Databricks). We’ll walk through the lessons learned while using this mostly unstructured and unlabelled data for A/B tests and to train our search and ranking models, the importance of defining custom metrics specific to your product, and the necessary changes at the organizational level to drive adoption of and confidence in data-centric approaches.

Zairah Mustahsan · Mani Swetha Mandava
Sat 9:00 a.m. - 9:30 a.m. | Invited Talk: Xi Chen - Decentralized Finance: Delta Hedging Liquidity Positions on AMM (Talk)

Decentralized finance is becoming increasingly popular. Liquidity providers on Automated Market Makers (AMMs) generate millions of USD in transaction fees daily. However, the net value of a liquidity position is vulnerable to price changes in the underlying assets in the pool. In this talk, we address an important question about AMMs: "How can we earn transaction fees on liquidity positions in AMMs without exposure to the directional price risk of the assets in the liquidity pool?" By leveraging a portfolio of options, we propose an algorithm to delta hedge arbitrary liquidity positions on both uniform-liquidity Automated Market Makers (such as Uniswap v2) and concentrated-liquidity Automated Market Makers (such as Uniswap v3) via a combination of derivatives.
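For intuition, consider an illustrative calculation (not the speaker's algorithm): in a Uniswap v2-style constant-product pool with invariant x·y = k, where y is the numeraire and P is the price of x, a full-range LP position is worth V(P) = 2·sqrt(kP), so its delta is dV/dP = sqrt(k/P). Shorting that many units of the risky asset neutralizes first-order price risk; pool size and prices below are invented:

```python
import math

def lp_value(k, P):
    """Value (in numeraire) of a full-range x*y = k LP position at price P."""
    return 2.0 * math.sqrt(k * P)

def lp_delta(k, P):
    """dV/dP: units of the risky asset to short for a delta-neutral position."""
    return math.sqrt(k / P)

k, P0 = 10_000.0, 4.0
delta = lp_delta(k, P0)   # units of the risky asset to short at P0

def hedged_pnl(P1):
    # LP P&L plus the short position's P&L; the first-order terms cancel.
    return (lp_value(k, P1) - lp_value(k, P0)) - delta * (P1 - P0)

for P1 in (3.9, 4.1):
    print(P1, lp_value(k, P1) - lp_value(k, P0), hedged_pnl(P1))
```

The residual P&L is second order in the price move: this is the LP's negative gamma (impermanent loss), which a simple short cannot remove and which motivates the options portfolio in the talk.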
Sat 9:30 a.m. - 10:00 a.m. | Invited Talk: Percy Liang - Decentralized Foundation Models (Talk)
Sat 10:00 a.m. - 10:30 a.m. | Morning Poster Session (Poster)
Sat 10:30 a.m. - 12:00 p.m. | Lunch Break
Sat 12:00 p.m. - 12:30 p.m. | Bayesian-Nash-Incentive-Compatible Mechanism for Blockchain Transaction Fee Allocation (Oral)

In blockchain systems, the design of transaction fee mechanisms is essential for stability and for the satisfaction of both miners and users. A recent work has proven the impossibility of collusion-proof mechanisms with non-zero miner revenue that are Dominant-Strategy-Incentive-Compatible (DSIC) for users. In our work, we relax the DSIC requirement for users to Bayesian-Nash-Incentive-Compatibility (BNIC) and design a so-called soft second-price mechanism to ensure a form of collusion-proofness with an asymptotic constant-factor approximation of the optimal miner revenue. Our result breaks the zero-revenue barrier while preserving reasonable truthfulness and collusion-proofness properties.

Zishuo Zhao · Xi Chen · Yuan Zhou
Sat 12:30 p.m. - 1:00 p.m. | Invited Talk: Xiaoyuan Liu - CoLearn: Decentralized Programming for Decentralized Data Science (Talk)

Data collaboration is a common need in many research fields, but in the real world it faces many development and deployment challenges. We introduce CoLink, a simple, secure, and flexible decentralized programming abstraction, and CoLearn, a platform on top of CoLink for decentralized data science. We motivate our design with our experience in building real-world decentralized data collaboration applications. We give details about our construction and explain how it simplifies and accelerates the development of cryptographic and distributed protocols. We also provide examples of coding in CoLink. With a unified interface that broadens the pool of potential data and code contributors, we hope to reduce the time to build data collaboration solutions from months to seconds and to enable larger-scale decentralized data collaboration that unlocks the true value of data.
Sat 1:00 p.m. - 1:20 p.m. | FLock: Defending Malicious Behaviors in Federated Learning with Blockchain (Oral)

Federated learning (FL) is a promising way to allow multiple data owners (clients) to collaboratively train machine learning models without compromising data privacy. Yet, existing FL solutions usually rely on a centralized aggregator for model weight aggregation, while assuming clients are honest. Even if data privacy can still be preserved, the problems of single-point failure and data poisoning attacks from malicious clients remain unresolved. To tackle this challenge, we propose to use distributed ledger technology (DLT) to achieve FLock, a secure and reliable decentralized federated learning system built on blockchain. To guarantee model quality, we design a novel peer-to-peer (P2P) review and reward/slash mechanism to detect and deter malicious clients, powered by on-chain smart contracts. The reward/slash mechanism, in addition, serves as an incentive for participants to honestly upload and review model parameters in the FLock system. FLock thus improves the performance and the robustness of FL systems in a fully P2P manner.

Jiahao Sun · Shuoying Zhang · Shuhao Zheng · Zhieng Wang
Sat 1:20 p.m. - 1:40 p.m. | Scalable Collaborative Learning via Representation Sharing (Oral)

Decentralized machine learning has become a key conundrum for multi-party artificial intelligence. Existing algorithms usually rely on the release of model parameters to spread knowledge across users. This can raise several issues, particularly in terms of communication if the models are large. Additionally, participants in such frameworks cannot freely choose their model architectures, as these must coincide in order to collaborate. In this work, we present a novel approach for decentralized machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss (contrastive w.r.t. the labels). The goal is to ensure that the participants learn similar features on similar classes without sharing their input data or their model parameters. To do so, each client releases averaged last-hidden-layer activations of similar labels to a central server that only acts as a relay (i.e., is not involved in the training or aggregation of the models). Then, the clients download these last-layer activations (feature representations) of the ensemble of users and distill this knowledge into their personal models using a contrastive objective. For cross-device applications (i.e., small local datasets and limited computational capacity), this approach increases the utility of the models compared to independent learning, is communication efficient, and scales with the number of clients. We prove theoretically that our framework is well-posed, and we benchmark its performance against standard collaborative learning algorithms on various datasets using different model architectures.

Frédéric Berdoz · Abhishek Singh · Martin Jaggi · Ramesh Raskar
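The sharing step described above can be sketched as follows; a toy illustration with made-up shapes, random features standing in for real activations, and a simple margin-based contrastive term standing in for the paper's objective:

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, feat_dim = 3, 8

def class_averages(feats, labels):
    """Average last-hidden-layer activations per label -- the only thing shared."""
    return {int(c): feats[labels == c].mean(axis=0) for c in np.unique(labels)}

# Two clients with private features and labels (random stand-ins here).
feats_a, labels_a = rng.normal(size=(30, feat_dim)), rng.integers(0, n_classes, 30)
feats_b, labels_b = rng.normal(size=(30, feat_dim)), rng.integers(0, n_classes, 30)

shared = [class_averages(feats_a, labels_a), class_averages(feats_b, labels_b)]

# The relay server only averages the per-class summaries across clients.
ensemble = {c: np.mean([s[c] for s in shared if c in s], axis=0)
            for c in range(n_classes)}

def distill_loss(feats, labels, ensemble):
    """Pull each sample toward its own class anchor, away from other anchors."""
    loss = 0.0
    for x, y in zip(feats, labels):
        pos = np.sum((x - ensemble[int(y)]) ** 2)
        neg = min(np.sum((x - ensemble[c]) ** 2)
                  for c in range(n_classes) if c != int(y))
        loss += max(0.0, pos - neg + 1.0)   # margin-based contrastive term
    return loss / len(feats)
```

Each client then minimizes `distill_loss` locally with its own architecture; nothing but the per-class average vectors ever leaves a client, which is what makes the scheme architecture-agnostic and communication-light.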
Sat 1:40 p.m. - 2:00 p.m. | Afternoon Poster Session (Poster)
Sat 2:00 p.m. - 2:00 p.m. | Closing Remarks
- | A Blockchain Protocol for Human-in-the-Loop AI (Poster)

Intelligent human inputs are required both in the training and operation of AI systems and within the governance of blockchain systems and decentralized autonomous organizations (DAOs). This paper presents a formal definition of Human Intelligence Primitives (HIPs) and describes the design and implementation of an Ethereum protocol for their on-chain collection, modeling, and integration in machine learning workflows.

Nassim Dehouche · Richard Blythman
- | Communication-efficient Decentralized Deep Learning (Poster)

Decentralized deep learning algorithms leverage peer-to-peer communication of model parameters and/or gradients over communication graphs among learning agents, each with access to its private data set. The majority of studies in this area focus on achieving high accuracy, many at the expense of increased communication overhead among the agents. However, large peer-to-peer communication overhead often becomes a practical challenge, especially in harsh environments such as underwater sensor networks. In this paper, we aim to reduce communication overhead while achieving performance similar to state-of-the-art algorithms. To achieve this, we use the concept of a Minimum Connected Dominating Set from graph theory, which is applied in ad hoc wireless networks to address communication overhead issues. Specifically, we propose a new decentralized deep learning algorithm called minimum connected Dominating Set Model Aggregation (DSMA). We investigate the efficacy of our method for different communication graph topologies with small to large numbers of agents, using varied neural network architectures. Empirical results on benchmark data sets show a significant (up to 100X) reduction in communication time while preserving, or in some cases increasing, accuracy compared to state-of-the-art methods. We also present an analysis to show the convergence of our proposed algorithm.

Fateme Fotouhi · Aditya Balu · Zhanhong Jiang · Yasaman Esfandiari · Salman Jahani · Soumik Sarkar
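The dominating-set idea can be illustrated on a toy graph. This is a hedged sketch, not the paper's DSMA algorithm: it uses a plain greedy dominating set (the paper requires a minimum *connected* dominating set) and scalars standing in for model weights:

```python
# Toy communication graph: node -> set of neighbours (illustrative).
graph = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 4}, 3: {1, 4}, 4: {2, 3, 5}, 5: {4}}

def greedy_dominating_set(graph):
    """Greedily pick nodes whose closed neighbourhood covers the most
    still-uncovered nodes, until every node is dominated."""
    uncovered = set(graph)
    dominators = set()
    while uncovered:
        best = max(graph, key=lambda v: len((graph[v] | {v}) & uncovered))
        dominators.add(best)
        uncovered -= graph[best] | {best}
    return dominators

doms = greedy_dominating_set(graph)

# Each non-dominator sends its model to one adjacent dominator only,
# instead of gossiping with every neighbour -- the source of the savings.
models = {v: float(v) for v in graph}           # scalar stand-in for weights
assignments = {v: v if v in doms else next(d for d in doms if d in graph[v])
               for v in graph}
cluster_avg = {d: sum(models[v] for v, a in assignments.items() if a == d)
                  / sum(1 for a in assignments.values() if a == d)
               for d in doms}
```

In the real algorithm the dominators would additionally exchange aggregates among themselves (hence *connected*), then redistribute the result to their clusters.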
- | A Secure Aggregation for Federated Learning on Long-Tailed Data (Poster)

As a form of distributed learning, federated learning (FL) faces two challenges: the unbalanced distribution of training data among participants, and model attacks by Byzantine nodes. In this paper, we consider a long-tailed data distribution in the presence of Byzantine nodes in the FL scenario. A novel two-layer aggregation method is proposed for the rejection of malicious models and the judicious selection of valuable models containing tail-class data information. We introduce the concept of a think tank to leverage the wisdom of all participants. Preliminary experiments validate that the think tank can make effective model selections for global aggregation.

Yanna Jiang · Baihe Ma · Xu Wang · Guangsheng Yu · Caijun Sun · Wei Ni · Ren Ping Liu
- | Simulations for Open Science Token Communities: Designing the Knowledge Commons (Poster)

The curation and dissemination of new knowledge between peers is one of the key pillars of science and plays an integral role in maintaining the scientific method. Despite the distributed nature of knowledge, its curation is dominated by a handful of gatekeepers that offer an unequal exchange of intellectual property rights for academic prestige to those using their services. The power imbalance between knowledge producers and curators has led to systemic inefficiencies across the scientific enterprise: fragmented communities, inequity in knowledge accessibility, underutilized intellectual capital, and suboptimal incentives for stakeholders within the science ecosystem. This work presents alternative models to bootstrap scientific funding within distributed communities. We use generalized dynamical systems to run simulations of the economic activity in science to identify current inefficiencies and inform the future development of peer-to-peer systems optimized for knowledge creation. Content delivery through autonomous knowledge markets utilizing cryptographic access-control protocols and peer-review reward mechanisms is shown to allow for programmable conditional incentives.

Jakub Smékal · Shady El Damaty
- | Modulus: An Open Modular Design for Interoperable and Reusable Machine Learning (Poster)

Modulus provides an open framework for developers to create modular, interoperable modules. These modules are designed to be modular (surprise?), reusable, and interoperable both locally and remotely via peer-to-peer communication protocols. Modules are lightweight and general enough to wrap any machine learning tool. Developers can organize modules into a module file system, representing their own module hub. They can also expose their modules as public endpoints through their local peer and restrict access based on their account's signature. Modulus is open source by design and does not rely on any tokenomics, allowing developers to monetize their public endpoints through any tokenized asset, including their own.

Salvatore Vivona
- | Incentivizing Intelligence: The Bittensor Approach (Poster)

Inspired by the efficiency of financial markets, we propose that a market system can be used to effectively produce machine intelligence. This paper introduces a mechanism in which machine intelligence is valued by other intelligence systems peer-to-peer across the internet. Peers rank each other by training neural networks that are able to learn the value of their neighbours, while scores accumulate on a digital ledger. High-ranking peers are rewarded with additional weight in the network. In addition, the network features an incentive mechanism designed to resist collusion. The result is a collectively run machine intelligence market that continually produces newly trained models and rewards participants who contribute information-theoretic value to the system.

Yuqian Hu · Jacqueline Dawn · Ala Shaabana
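The peer-ranking step can be caricatured in a few lines; the stake values and weight matrix below are invented, and this toy is a stand-in for, not a description of, Bittensor's actual incentive mechanism:

```python
import numpy as np

# Row i holds the weights peer i assigns to its neighbours (rows sum to 1).
W = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.6, 0.4, 0.0]])
stake = np.array([1.0, 2.0, 1.0])   # on-ledger stake of each peer

# A peer's rank is the stake-weighted sum of the inbound weights it receives;
# rewards (additional network weight) are emitted in proportion to rank.
rank = W.T @ (stake / stake.sum())
reward = rank / rank.sum()
```

Because rows of W are normalized and stake is normalized, total emission is conserved: the ranks sum to one, so peers only gain reward by attracting weight away from others.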
- | Addressing bias in Face Detectors using Decentralised Data collection with incentives (Poster)

Recent developments in machine learning have shown that successful models rely not only on huge amounts of data but on the right kind of data. We show in this paper how this data-centric approach can be facilitated in a decentralised manner to enable efficient data collection for algorithms. Face detectors are a class of models that suffer heavily from bias issues, as they have to work on a large variety of different data. We also propose a face detection and anonymization approach using a hybrid Multi-Task Cascaded CNN with FaceNet embeddings to benchmark multiple datasets, describing and evaluating the bias in the models towards different ethnicities, genders, and age groups, along with ways to improve fairness via a decentralized system of data labelling, correction, and verification by users, creating a robust pipeline for model retraining.

Ahan M R · Robin Lehmann · Richard Blythman
- | Opportunities for Decentralized Technologies within AI Hubs (Poster)

Deep learning requires heavy amounts of storage and compute, with assets commonly stored in AI Hubs. AI Hubs have contributed significantly to the democratization of AI. However, existing implementations come with benefits and limitations that stem from the underlying infrastructure and governance systems with which they are built. These limitations include high costs, lack of monetization and reward, lack of control, and difficulty of reproducibility. In the current work, we explore the potential of decentralized technologies - such as Web3 wallets, peer-to-peer marketplaces, storage and compute, and DAOs - to address some of these issues. We suggest that these infrastructural components can be used in combination in the design and construction of decentralized AI Hubs.

Richard Blythman · Mohamed Arshath · Sal Vivona · Jakub Smékal · Hithesh Shaji