Workshop
Privacy in Machine Learning (PriML)
Borja Balle · Kamalika Chaudhuri · Antti Honkela · Antti Koskela · Casey Meehan · Mi Jung Park · Mary Anne Smart · Adrian Weller

Sat Dec 14 08:00 AM -- 06:00 PM (PST) @ East Meeting Rooms 8 + 15
Event URL: https://priml-workshop.github.io/priml2019/

The goal of our workshop is to bring together privacy experts working in academia and industry to discuss the present and the future of privacy-aware technologies powered by machine learning. The workshop will focus on the technical aspects of privacy research and deployment, with invited and contributed talks by distinguished researchers in the area. The programme will emphasize the diversity of points of view on the problem of privacy. We will also ensure there is ample time for discussions that encourage networking between researchers, which should result in mutually beneficial new long-term collaborations.

Sat 8:10 a.m. - 8:15 a.m.
Opening
Sat 8:15 a.m. - 9:05 a.m.
Privacy for Federated Learning, and Federated Learning for Privacy (Invited talk)
Brendan McMahan
Sat 9:05 a.m. - 9:25 a.m.

Differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy in the past decade. This privacy definition and its divergence-based relaxations, however, have several acknowledged weaknesses, either in handling composition of private algorithms or in analyzing important primitives like privacy amplification by subsampling. Inspired by the hypothesis testing formulation of privacy, this paper proposes a new relaxation, which we term "f-differential privacy" (f-DP). This notion of privacy has a number of appealing properties and, in particular, avoids difficulties associated with divergence-based relaxations. First, f-DP preserves the hypothesis testing interpretation. In addition, f-DP allows for lossless reasoning about composition in an algebraic fashion. Moreover, we provide a powerful technique to import existing results proven for original DP to f-DP and, as an application, obtain a simple subsampling theorem for f-DP. In addition to the above findings, we introduce a canonical single-parameter family of privacy notions within the f-DP class that is referred to as "Gaussian differential privacy" (GDP), defined based on testing two shifted Gaussians. GDP is focal among the f-DP class because of a central limit theorem we prove. More precisely, the privacy guarantees of any hypothesis testing based definition of privacy (including original DP) converge to GDP in the limit under composition. The CLT also yields a computationally inexpensive tool for analyzing the exact composition of private algorithms. Taken together, this collection of attractive properties renders f-DP a mathematically coherent, analytically tractable, and versatile framework for private data analysis. Finally, we demonstrate the use of the tools we develop by giving an improved privacy analysis of noisy stochastic gradient descent.

Jinshuo Dong, Aaron Roth
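
The hypothesis-testing view used above can be written down compactly. The following is a hedged summary in the standard notation of the f-DP literature (the notation is mine, not necessarily the speakers'):

% Trade-off function: the smallest type II error achievable at type I error level alpha
% when testing P against Q with a rejection rule phi.
T(P, Q)(\alpha) = \inf_{\phi} \{\, 1 - \mathbb{E}_{Q}[\phi] \;:\; \mathbb{E}_{P}[\phi] \le \alpha \,\}

% f-DP: for every pair of neighboring datasets S, S', distinguishing M(S) from M(S')
% is at least as hard as the trade-off function f prescribes.
M \text{ is } f\text{-DP} \iff T\big(M(S), M(S')\big) \ge f \quad \text{for all neighboring } S, S'

% Gaussian DP: the trade-off of testing N(0,1) against N(\mu,1),
% G_\mu(\alpha) = \Phi\big(\Phi^{-1}(1-\alpha) - \mu\big); M is \mu-GDP if it is G_\mu-DP.
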
Sat 9:25 a.m. - 9:45 a.m.

Recently, there has been a wealth of effort devoted to the design of secure protocols for machine learning tasks. Much of this is aimed at enabling secure prediction from highly-accurate Deep Neural Networks (DNNs). However, as DNNs are trained on data, a key question is how such models can also be trained securely. The few prior works on secure DNN training have focused either on designing custom protocols for existing training algorithms or on developing tailored training algorithms and then applying generic secure protocols. In this work, we investigate the advantages of designing training algorithms alongside a novel secure protocol, incorporating optimizations on both fronts. We present QUOTIENT, a new method for discretized training of DNNs, along with a customized secure two-party protocol for it. QUOTIENT incorporates key components of state-of-the-art DNN training such as layer normalization and adaptive gradient methods, and improves upon the state-of-the-art in DNN training in two-party computation. Compared to prior work, we obtain an improvement of 50X in WAN time and 6% in absolute accuracy.

Nitin Agrawal, Matt Kusner, Adria Gascon
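
The QUOTIENT protocol itself is not reproduced here. As a hedged illustration of the generic building block that secure two-party training rests on, the sketch below shows additive secret sharing of fixed-point values over a 64-bit ring (the names encode/share/reconstruct are illustrative, not the paper's API):

import secrets

RING = 1 << 64      # all arithmetic is modulo 2^64
SCALE = 1 << 16     # fixed-point scaling for real-valued data

def encode(x: float) -> int:
    """Map a real number to a fixed-point ring element."""
    return int(round(x * SCALE)) % RING

def decode(v: int) -> float:
    """Map a ring element back to a real number (top half of the ring is negative)."""
    if v >= RING // 2:
        v -= RING
    return v / SCALE

def share(v: int):
    """Split a ring element into two additive shares; each share alone is uniformly random."""
    s0 = secrets.randbelow(RING)
    return s0, (v - s0) % RING

def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % RING

# Each party can add its shares locally, with no communication:
a0, a1 = share(encode(1.5))
b0, b1 = share(encode(-0.25))
c0, c1 = (a0 + b0) % RING, (a1 + b1) % RING
assert abs(decode(reconstruct(c0, c1)) - 1.25) < 1e-4
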
Sat 9:45 a.m. - 10:30 a.m.
Coffee break (Break)
Sat 10:30 a.m. - 11:20 a.m.

Data collected about individuals is regularly used to make decisions that impact those same individuals. We consider settings where sensitive personal data is used to decide who will receive resources or benefits. While it is well known that there is a tradeoff between protecting privacy and the accuracy of decisions, in this talk, I will describe our recent work on a first-of-its-kind empirical study into the impact of formally private mechanisms (based on differential privacy) on fair and equitable decision-making.

Ashwin Machanavajjhala
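
As a hedged, generic example of the kind of formally private mechanism the abstract refers to (not the specific mechanisms studied in the talk), a counting query can be released with the Laplace mechanism and the allocation decision made on the noisy value:

import numpy as np

def laplace_count(true_count: int, epsilon: float, rng=np.random.default_rng()) -> float:
    """epsilon-DP release of a counting query (sensitivity 1) via the Laplace mechanism."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Example decision rule: allocate a benefit only if the noisy count clears a threshold.
# Smaller epsilon means more noise and a higher chance the decision flips.
noisy = laplace_count(true_count=120, epsilon=0.5)
allocate = noisy >= 100
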
Sat 11:20 a.m. - 11:30 a.m.
1. [Jonathan Lebensold, William Hamilton, Borja Balle and Doina Precup] Actor Critic with Differentially Private Critic (#08)
2. [Andres Munoz, Umar Syed, Sergei Vassilvitskii and Ellen Vitercik] Private linear programming without constraint violations (#17)
3. [Ios Kotsogiannis, Yuchao Tao, Xi He, Ashwin Machanavajjhala, Michael Hay and Gerome Miklau] PrivateSQL: A Differentially Private SQL Query Engine (#27)
4. [Amrita Roy Chowdhury, Chenghong Wang, Xi He, Ashwin Machanavajjhala and Somesh Jha] Crypt$\epsilon$: Crypto-Assisted Differential Privacy on Untrusted Servers (#31)
5. [Jiaming Xu and Dana Yang] Optimal Query Complexity of Private Sequential Learning (#32)
6. [Hsiang Hsu, Shahab Asoodeh and Flavio Calmon] Discovering Information-Leaking Samples and Features (#43)
7. [Martine De Cock, Rafael Dowsley, Anderson Nascimento, Davis Railsback, Jianwei Shen and Ariel Todoki] Fast Secure Logistic Regression for High Dimensional Gene Data (#44)
8. [Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke and Steven Wu] New Oracle-Efficient Algorithms for Private Synthetic Data Release (#45)
Sat 11:30 a.m. - 12:30 p.m.
Poster Session
Clément Canonne, Kwang-Sung Jun, Seth Neel, Di Wang, Giuseppe Vietri, Liwei Song, Jonathan Lebensold, Huanyu Zhang, Lovedeep Gondara, Ang Li, FatemehSadat Mireshghallah, Jinshuo Dong, Anand D Sarwate, Antti Koskela, Joonas Jälkö, Matt Kusner, Dingfan Chen, Mi Jung Park, Ashwin Machanavajjhala, Jayashree Kalpathy-Cramer, Vitaly Feldman, Andrew Tomkins, Hai Phan, Hossein Esfandiari, Mimansa Jaiswal, Mrinank Sharma, Jeff Druce, Casey Meehan, Zhengli Zhao, Hsiang Hsu, Davis Railsback, Abraham Flaxman, Julius Adebayo, Aleksandra Korolova, Jiaming Xu, Naoise Holohan, Samyadeep Basu, Matthew Joseph, My Thai, Dana Yang, Ellen Vitercik, Michael Hutchinson, Chenghong Wang, Gregory Yauney, Yuchao Tao, Chao Jin, Sky Lee, Audra McMillan, Rauf Izmailov, Jiayi Guo, Siddharth Swaroop, Tribhuvanesh Orekondy, Hadi Esmaeilzadeh, Kevin Procopio, Alkis Polyzotis, Jafar Mohammadi, Nitin Agrawal
Sat 12:30 p.m. - 2:00 p.m.
Lunch break (Break)
Sat 2:00 p.m. - 2:50 p.m.

There is a growing demand for ML methods that limit inappropriate use of protected information to avoid both disparate treatment and disparate impact. In this talk, we present Generative Adversarial rePresentations (GAP), a data-driven framework that leverages recent advances in adversarial learning to allow a data holder to learn universal representations that decouple a set of sensitive attributes from the rest of the dataset while still supporting multiple downstream learning tasks. We will briefly highlight the theoretical and practical results of GAP.

In the second half of the talk we will focus on model auditing. Privacy concerns have led to the development of privacy-preserving approaches for learning models from sensitive data. Yet, in practice, models (even those learned with privacy guarantees) can inadvertently memorize unique training examples or leak sensitive features. To identify such privacy violations, existing model auditing techniques use finite adversaries, defined as machine learning models with (a) access to some finite side information (e.g., a small auditing dataset), and (b) finite capacity (e.g., a fixed neural network architecture). We present requirements under which an unsuccessful attempt to identify privacy violations by a finite adversary implies that no stronger adversary can succeed at the task. We do so via parameters that quantify the capabilities of the finite adversary, including the size of the neural network it employs, the amount of side information it has access to, and the regularity of the (possibly privacy-preserving) audited model.

Lalitha Sankar
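
The representation-learning half of the talk can be summarized by a generic adversarial objective of the kind described above; this is a hedged sketch, and the exact GAP formulation and loss measures may differ:

% g_\theta: encoder producing representation Z = g_\theta(X); h_\psi: task predictor for label Y;
% a_\phi: adversary trying to recover the sensitive attribute S from the representation.
\min_{\theta,\psi} \; \max_{\phi} \;
\mathbb{E}\left[\ell_{\mathrm{task}}\big(h_\psi(g_\theta(X)),\, Y\big)\right]
\;-\; \lambda\, \mathbb{E}\left[\ell_{\mathrm{adv}}\big(a_\phi(g_\theta(X)),\, S\big)\right]
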
Sat 2:50 p.m. - 3:10 p.m.

A centrally differentially private algorithm maps raw data to differentially private outputs. In contrast, a locally differentially private algorithm may only access data through public interaction with data holders, and this interaction must be a differentially private function of the data. We study the intermediate model of pan-privacy. Unlike a locally private algorithm, a pan-private algorithm receives data in the clear. Unlike a centrally private algorithm, the algorithm receives data one element at a time and must maintain a differentially private internal state while processing this stream. First, we show that pan-privacy against multiple intrusions on the internal state is equivalent to sequentially interactive local privacy. Next, we contextualize pan-privacy against a single intrusion by analyzing the sample complexity of uniformity testing over domain [k]. Focusing on the dependence on k, centrally private uniformity testing has sample complexity Θ(√k), while noninteractive locally private uniformity testing has sample complexity Θ(k). We show that the sample complexity of pan-private uniformity testing is Θ(k^{2/3}). By a new Ω(k) lower bound for the sequentially interactive setting, we also separate pan-private from sequentially interactive locally private and multi-intrusion pan-private uniformity testing.

Kareem Amin, Matthew Joseph
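
Collecting the sample complexities stated in the abstract in one place (dependence on the domain size k only; constants and the dependence on the testing distance are suppressed):

\text{uniformity testing over } [k]:\quad
\text{central DP: } \Theta(\sqrt{k}), \qquad
\text{pan-private (single intrusion): } \Theta(k^{2/3}), \qquad
\text{noninteractive local DP: } \Theta(k), \qquad
\text{sequentially interactive local DP: } \Omega(k).
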
Sat 3:10 p.m. - 3:30 p.m.

We study differentially private (DP) algorithms for stochastic convex optimization: the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions. A recent work of Bassily et al. (2019) has established the optimal bound on the excess population loss achievable given n samples. Unfortunately, their algorithm achieving this bound is relatively inefficient: it requires O(min{n^{3/2}, n^{5/2}/d}) gradient computations, where d is the dimension of the optimization problem. We describe two new techniques for deriving DP convex optimization algorithms both achieving the optimal bound on excess loss and using O(min{n, n^{2}/d}) gradient computations. In particular, the algorithms match the running time of the optimal non-private algorithms. The first approach relies on the use of variable batch sizes and is analyzed using the privacy amplification by iteration technique of Feldman et al. (2018). The second approach is based on a general reduction to the problem of localizing an approximately optimal solution with differential privacy. Such localization, in turn, can be achieved using existing (non-private) uniformly stable optimization algorithms. As in the earlier work, our algorithms require a mild smoothness assumption. We also give a linear-time algorithm achieving the optimal bound on the excess loss for the strongly convex case, as well as a faster algorithm for the non-smooth case.

Vitaly Feldman, Tomer Koren, Kunal Talwar
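
The algorithms above build on noisy stochastic gradient methods. The following is a minimal, hedged sketch of the generic noisy-SGD template (per-example clipping plus Gaussian noise), not the authors' variable-batch-size or localization-based algorithms, and with no privacy accounting attached:

import numpy as np

def noisy_sgd(grad_fn, w0, data, steps, lr, clip, sigma, batch_size,
              rng=np.random.default_rng()):
    """Generic noisy SGD: clip each per-example gradient to norm `clip`,
    average over the batch, and add Gaussian noise scaled to the clipping bound."""
    w = np.array(w0, dtype=float)
    n = len(data)
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        grads = []
        for i in idx:
            g = grad_fn(w, data[i])                                         # per-example gradient
            grads.append(g * min(1.0, clip / (np.linalg.norm(g) + 1e-12)))  # clip to norm `clip`
        noise = rng.normal(0.0, sigma * clip / batch_size, size=w.shape)
        w = w - lr * (np.mean(grads, axis=0) + noise)
    return w
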
Sat 3:30 p.m. - 4:15 p.m.
Coffee break (Break)
Sat 4:15 p.m. - 5:05 p.m.

To control vulnerabilities to reconstruction-abetted re-identification attacks that leverage massive external data stores and cheap computation, the U.S. Census Bureau has elected to adopt a formally private approach to disclosure limitation in the 2020 Decennial Census of Population and Housing. To this end, a team of disclosure limitation specialists has worked over the past three years to design and implement the U.S. Census Bureau TopDown Disclosure Limitation Algorithm (TDA). This formally private algorithm generates Persons and Households micro-data, which will then be tabulated to produce the final set of demographic statistics published as a result of the 2020 Census enumeration. In this talk, I outline the main features of TDA, describe the current iteration of the process used to help policy makers decide how to set and allocate the privacy-loss budget, and outline known issues with the current implementation of TDA and the intended fixes for them.

Philip Leclerc
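
Only as a toy illustration of the "top-down" idea referenced above (noisy measurements at each geographic level, followed by post-processing so that child geographies remain consistent with their parent): the real TDA spans the full geographic hierarchy, uses different noise distributions, and solves constrained integer optimization with nonnegativity, none of which is captured in this sketch.

import numpy as np

def toy_topdown(child_counts, epsilon, rng=np.random.default_rng()):
    """Two-level toy: noisy parent total and noisy child counts, then adjust the
    children so they sum to the parent estimate (illustration only, not the TDA)."""
    child_counts = np.asarray(child_counts, dtype=float)
    noisy_parent = child_counts.sum() + rng.laplace(scale=1.0 / epsilon)
    noisy_children = child_counts + rng.laplace(scale=1.0 / epsilon, size=child_counts.shape)
    # Post-processing: spread the discrepancy evenly to enforce hierarchical consistency.
    adjustment = (noisy_parent - noisy_children.sum()) / len(noisy_children)
    return noisy_parent, noisy_children + adjustment
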
Sat 5:05 p.m. - 5:55 p.m.
Panel Discussion (Discussion Panel)
Sat 5:55 p.m. - 6:00 p.m.
Closing

Author Information

Borja Balle (Amazon)
Kamalika Chaudhuri (UCSD)
Antti Honkela (University of Helsinki)
Antti Koskela (University of Helsinki)
Casey Meehan (University of California, San Diego)
Mi Jung Park (MPI-IS Tuebingen)
Mary Anne Smart (University of California, San Diego)
Adrian Weller (Cambridge, Alan Turing Institute)

Adrian Weller is Programme Director for AI at The Alan Turing Institute, the UK national institute for data science and AI, where he is also a Turing Fellow leading work on safe and ethical AI. He is a Senior Research Fellow in Machine Learning at the University of Cambridge, and at the Leverhulme Centre for the Future of Intelligence where he leads the project on Trust and Transparency. His interests span AI, its commercial applications and helping to ensure beneficial outcomes for society. He serves on several boards including the Centre for Data Ethics and Innovation. Previously, Adrian held senior roles in finance.
