Timezone: »

Machine Learning with Guarantees
Ben London · Gintare Karolina Dziugaite · Daniel Roy · Thorsten Joachims · Aleksander Madry · John Shawe-Taylor

Sat Dec 14 08:00 AM -- 06:00 PM (PST) @ West Ballroom B
Event URL: https://sites.google.com/view/mlwithguarantees »

As adoption of machine learning grows in high-stakes application areas (e.g., industry, government and health care), so does the need for guarantees: how accurate a learned model will be; whether its predictions will be fair; whether it will divulge information about individuals; or whether it is vulnerable to adversarial attacks. Many of these questions involve unknown or intractable quantities (e.g., risk, regret or posterior likelihood) and complex constraints (e.g., differential privacy, fairness, and adversarial robustness). Thus, learning algorithms are often designed to yield (and optimize) bounds on the quantities of interest. Beyond providing guarantees, these bounds also shed light on black-box machine learning systems.

Classical examples include structural risk minimization (Vapnik, 1991) and support vector machines (Cristianini & Shawe-Taylor, 2000), while more recent examples include non-vacuous risk bounds for neural networks (Dziugaite & Roy, 2017, 2018), algorithms that optimize both the weights and structure of a neural network (Cortes, 2017), counterfactual risk minimization for learning from logged bandit feedback (Swaminathan & Joachims, 2015; London & Sandler, 2019), robustness to adversarial attacks (Schmidt et al., 2018; Wong & Kolter, 2018), differentially private learning (Dwork et al., 2006, Chaudhuri et al., 2011), and algorithms that ensure fairness (Dwork et al., 2012).

This one-day workshop will bring together researchers in both theoretical and applied machine learning, across areas such as statistical learning theory, adversarial learning, fairness and privacy, to discuss the problem of obtaining performance guarantees and algorithms to optimize them. The program will include invited and contributed talks, poster sessions and a panel discussion. We particularly welcome contributions describing fundamentally new problems, novel learning principles, creative bound optimization techniques, and empirical studies of theoretical findings.

Sat 8:45 a.m. - 9:00 a.m. [iCal]
Welcome Address (Talk)
Ben London
Sat 9:00 a.m. - 9:45 a.m. [iCal]

I will discuss some recent results on designing explicit regularizers to improve the generalization performances of deep neural networks. We derive data-dependent generalization bounds for deep neural networks. We empirically regularize the bounds and obtain improved generalization performance (in terms of the standard accuracy or the robust accuracy). I will also touch on recent results on applying these techniques to imbalanced datasets.

Based on joint work with Colin Wei, Kaidi Cao, Adrien Gaidon, and Nikos Arechiga

https://arxiv.org/abs/1910.04284 https://arxiv.org/abs/1906.07413 https://arxiv.org/abs/1905.03684

Tengyu Ma
Sat 9:45 a.m. - 10:15 a.m. [iCal]
Given data drawn from an unknown distribution, $D$, to what extent is it possible to ``amplify'' this dataset and faithfully output a larger set of samples that appear to have been drawn from $D$? We formalize this question as follows: an $(n,m)$ amplification procedure takes as input $n$ independent draws from an unknown distribution $D$, and outputs a set of $m > n$ ``samples'' which must be indistinguishable from $m$ samples drawn i.i.d. from $D$. We consider this sample amplification problem in two fundamental settings: the case where $D$ is an arbitrary discrete distribution supported on $k$ elements, and the case where $D$ is a $d$-dimensional Gaussian with unknown mean, and fixed covariance matrix. Perhaps surprisingly, we show a valid amplification procedure exists for both of these settings, even in the regime where the size of the input dataset, $n$, is significantly less than what would be necessary to learn distribution $D$ to non-trivial accuracy. We also show that our procedures are optimal up to constant factors. Beyond these results, we also formalize a number of curious directions for future research along this vein.
Vatsal Sharan
Sat 10:15 a.m. - 10:45 a.m. [iCal]

Visit https://sites.google.com/view/mlwithguarantees/accepted-papers for the list of papers.

Posters will be up all day.

Antonia Marcu, Yao-Yuan Yang, Pascale Gourdeau, Chen Zhu, Thodoris Lykouris, Jianfeng Chi, Mark Kozdoba, Arjun Nitin Bhagoji, Xiaoxia (Shirley) Wu, Jay Nandy, Michael T Smith, Bingyang Wen, Yuege (Gail) Xie, Konstantinos Pitas, Suprosanna Shit, Maksym Andriushchenko, Dingli Yu, Gaël Letarte, Misha Khodak, Hussein Mozannar, Chara Podimata, James Foulds, Yizhen Wang, Huishuai Zhang, Ondrej Kuzelka, Alexander Levine, Nan Lu, Zakaria Mhammedi, Paul Viallard, Diana Cai, Lovedeep Gondara, James Lucas, Yasaman Mahdaviyeh, Aristide Baratin, Rishi Bommasani, Alessandro Barp, Andrew Ilyas, Kaiwen Wu, Jens Behrmann, Omar Rivasplata, Amir Nazemi, Aditi Raghunathan, William Stephenson, Sahil Singla, Akhil Gupta, YooJung Choi, Yannic Kilcher, Clare Lyle, Edoardo Manino, Andrew Bennett, Zhi Xu, Niladri Chatterji, Emre Barut, Flavien Prost, Rodrigo Toro Icarte, Arno Blaas, Charlie Yun, Sahin Lale, YiDing Jiang, Tharun Medini, Ashkan Rezaei, Alexander Meinke, Stephen Mell, Gary Kazantsev, Shivam Garg, Anu Sinha, Vishnu Lokhande, Geovani Rizk, Han Zhao, Aditya Kumar Akash, Jikai Hou, Ali Ghodsi, Matthias Hein, Tyler Sypherd, Yichen Yang, Anastasia Pentina, Pierre Gillot, Antoine Ledent, Guy Gur-Ari, Noah MacAulay, Tianzong Zhang
Sat 10:45 a.m. - 11:30 a.m. [iCal]
Mehryar Mohri, "Learning with Sample-Dependent Hypothesis Sets" (Invited Talk)
Mehryar Mohri
Sat 11:30 a.m. - 12:00 p.m. [iCal]

Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent success in few-shot learning and related problems are encouraging signs that these models can be adapted to more realistic settings where train and test differ. Unfortunately, there is severely limited theoretical support for these algorithms and little is known about the difficulty of these problems. In this work, we provide novel information-theoretic lower-bounds on minimax rates of convergence for algorithms which are trained on data from multiple sources and tested on novel data. Our bounds depend intuitively on the information shared between sources of data and characterizes the difficulty of learning in this setting for arbitrary algorithms.

James Lucas
Sat 12:00 p.m. - 1:45 p.m. [iCal]
Lunch Break
Sat 1:45 p.m. - 2:30 p.m. [iCal]

While neural networks have achieved high performance in different learning tasks, their accuracy drops significantly in the presence of small adversarial perturbations to inputs. In the last couple of years, several practical defenses based on regularization and adversarial training have been proposed which are often followed by stronger attacks to defeat them. To escape this cycle, a new line of work focuses on developing certifiably robust classifiers. In these models, for a given input sample, one can calculate a robustness certificate such that for ‘any’ perturbation of the input within the robustness radius, the classification output will ‘provably’ remain unchanged. In this talk, I will present two certifiable defenses: (1) Wasserstein smoothing to defend against non-additive Wasserstein adversarial attacks, and (2) Curvature-based robust training to certifiably defend against L2 attacks by globally bounding curvature values of the network.

This is a joint work with Alex Levine and Sahil Singla.

Soheil Feizi
Sat 2:30 p.m. - 3:00 p.m. [iCal]
The problem of adversarial robustness has been studied extensively for neural networks. However, for boosted decision trees and decision stumps there are almost no results, even though they are widely used in practice (e.g. XGBoost) due to their accuracy, interpretability, and efficiency. We show in this paper that for boosted decision stumps the \textit{exact} min-max robust loss and test error for an $l_\infty$-attack can be computed in $O(T\log T)$ time per input, where $T$ is the number of decision stumps and the optimal update step of the ensemble can be done in $O(n^2\,T\log T)$, where $n$ is the number of data points. For boosted trees we show how to efficiently calculate and optimize an upper bound on the robust loss, which leads to state-of-the-art robust test error for boosted trees on MNIST (12.5\% for $\epsilon_\infty=0.3$), FMNIST (23.2\% for $\epsilon_\infty=0.1$), and CIFAR-10 (74.7\% for $\epsilon_\infty=8/255$). Moreover, the robust test error rates we achieve are competitive to the ones of provably robust CNNs. Code of our method is available at \url{https://git.io/Je18r}. This is a short version of the corresponding NeurIPS 2019 paper \cite{andriushchenko2019provably}.
Maksym Andriushchenko
Sat 3:00 p.m. - 3:30 p.m. [iCal]

Visit https://sites.google.com/view/mlwithguarantees/accepted-papers for the list of papers.

Posters will be up all day.

Sat 3:30 p.m. - 4:15 p.m. [iCal]
Aaron Roth, "Average Individual Fairness" (Invited Talk)
Aaron Roth
Sat 4:15 p.m. - 4:45 p.m. [iCal]

We study learning non-discriminatory predictors when the protected attributes are privatized or noisy. We observe that, in the population limit, non-discrimination against noisy attributes is equivalent to that against original attributes. We show this to hold for various fairness criteria. We then characterize the amount of difficulty, in sample complexity, that privacy adds to testing non-discrimination. Using this relationship, we propose how to carefully adapt existing non-discriminatory learners to work with privatized protected attributes. Care is crucial, as naively using these learners may create the illusion of non-discrimination, while continuing to be highly discriminatory.

Hussein Mozannar
Sat 4:45 p.m. - 5:30 p.m. [iCal]
Emma Brünskill, "Some Theory RL Challenges Inspired by Education" (Invited Talk)
Emma Brunskill
Sat 5:30 p.m. - 6:00 p.m. [iCal]
Discussion Panel

Author Information

Ben London (Amazon)
Gintare Karolina Dziugaite (Element AI)
Dan Roy (Univ of Toronto & Vector)
Thorsten Joachims (Cornell)
Aleksander Madry (MIT)

Aleksander Madry is the NBX Associate Professor of Computer Science in the MIT EECS Department and a principal investigator in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He received his PhD from MIT in 2011 and, prior to joining the MIT faculty, he spent some time at Microsoft Research New England and on the faculty of EPFL. Aleksander's research interests span algorithms, continuous optimization, science of deep learning and understanding machine learning from a robustness perspective. His work has been recognized with a number of awards, including an NSF CAREER Award, an Alfred P. Sloan Research Fellowship, an ACM Doctoral Dissertation Award Honorable Mention, and 2018 Presburger Award.

John Shawe-Taylor (UCL)

More from the Same Authors