Track: Track 2 Session 1

Tue 10 Dec. 10:05 - 10:20 PST

Oral

Logarithmic Regret for Online Control

Naman Agarwal · Elad Hazan · Karan Singh

We study optimal regret bounds for control in linear dynamical systems under adversarially changing strongly convex cost functions, given the knowledge of transition dynamics. This includes several well studied and influential frameworks such as the Kalman filter and the linear quadratic regulator. State of the art methods achieve regret which scales as T^0.5, where T is the time horizon.

We show that the optimal regret in this fundamental setting can be significantly smaller, scaling as polylog(T). This regret bound is achieved by two different efficient iterative methods, online gradient descent and online natural gradient.

Tue 10 Dec. 10:20 - 10:25 PST

Spotlight

List-decodable Linear Regression

Sushrut Karmalkar · Adam Klivans · Pravesh Kothari

We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than 1/2 fraction of examples.

For any \alpha < 1, our algorithm takes as input a sample {(xi,yi)}{i \leq n} of n linear equations where \alpha n of the equations satisfy yi = \langle x_i,\ell^\rangle +\zeta for some small noise \zeta and (1-\alpha) n of the equations are {\em arbitrarily} chosen. It outputs a list L of size O(1/\alpha) - a fixed constant - that contains an \ell that is close to \ell^.

Our algorithm succeeds whenever the inliers are chosen from a certifiably anti-concentrated distribution D. In particular, this gives a (d/\alpha)^{O(1/\alpha^8)} time algorithm to find a O(1/\alpha) size list when the inlier distribution is a standard Gaussian. For discrete product distributions that are anti-concentrated only in regular directions, we give an algorithm that achieves similar guarantee under the promise that \ell^* has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary.

To solve the problem we introduce a new framework for list-decodable learning that strengthens the ``identifiability to algorithms'' paradigm based on the sum-of-squares method.

Tue 10 Dec. 10:25 - 10:30 PST

Spotlight

On the Hardness of Robust Classification

Pascale Gourdeau · Varun Kanade · Marta Kwiatkowska · James Worrell

It is becoming increasingly important to understand the vulnerability of machine learning models to adversarial attacks. In this paper we study the feasibility of robust learning from the perspective of computational learning theory, considering both sample and computational complexity. In particular, our definition of robust learnability requires polynomial sample complexity. We start with two negative results. We show that no non-trivial concept class can be robustly learned in the distribution-free setting against an adversary who can perturb just a single input bit. We show moreover that the class of monotone conjunctions cannot be robustly learned under the uniform distribution against an adversary who can perturb $\omega(\log n)$ input bits. However if the adversary is restricted to perturbing $O(\log n)$ bits, then the class of monotone conjunctions can be robustly learned with respect to a general class of distributions (that includes the uniform distribution). Finally, we provide a simple proof of the computational hardness of robust learning on the boolean hypercube. Unlike previous results of this nature, our result does not rely on another computational model (e.g. the statistical query model) nor on any hardness assumption other than the existence of a hard learning problem in the PAC framework.

Tue 10 Dec. 10:30 - 10:35 PST

Spotlight

Adversarial Examples Are Not Bugs, They Are Features

Andrew Ilyas · Shibani Santurkar · Dimitris Tsipras · Logan Engstrom · Brandon Tran · Aleksander Madry

Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear. We demonstrate that adversarial examples can be directly attributed to the presence of non-robust features: features (derived from patterns in the data distribution) that are highly predictive, yet brittle and (thus) incomprehensible to humans. After capturing these features within a theoretical framework, we establish their widespread existence in standard datasets. Finally, we present a simple setting where we can rigorously tie the phenomena we observe in practice to a {\em misalignment} between the (human-specified) notion of robustness and the inherent geometry of the data.

Tue 10 Dec. 10:35 - 10:40 PST

Spotlight

Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness

Saeed Mahloujifar · Xiao Zhang · Mohammad Mahmoody · David Evans

Many recent works have shown that adversarial examples that fool classifiers can be found by minimally perturbing a normal input. Recent theoretical results, starting with Gilmer et al. (2018b), show that if the inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbation are inevitable. A concentrated space has the property that any subset with Ω(1) (e.g.,1/100) measure, according to the imposed distribution, has small distance to almost all (e.g., 99/100) of the points in the space. It is not clear, however, whether these theoretical results apply to actual distributions such as images. This paper presents a method for empirically measuring and bounding the concentration of a concrete dataset which is proven to converge to the actual concentration. We use it to empirically estimate the intrinsic robustness to and L2 and Linfinity perturbations of several image classification benchmarks. Code for our experiments is available at https://github.com/xiaozhanguva/Measure-Concentration.

Tue 10 Dec. 10:40 - 10:45 PST

Spotlight

Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection

Yihe Dong · Samuel Hopkins · Jerry Li

We study two problems in high-dimensional robust statistics: \emph{robust mean estimation} and \emph{outlier detection}. In robust mean estimation the goal is to estimate the mean $\mu$ of a distribution on $\mathbb{R}^d$ given $n$ independent samples, an $\epsilon$-fraction of which have been corrupted by a malicious adversary. In outlier detection the goal is to assign an \emph{outlier score} to each element of a data set such that elements more likely to be outliers are assigned higher scores. Our algorithms for both problems are based on a new outlier scoring method we call QUE-scoring based on \emph{quantum entropy regularization}. For robust mean estimation, this yields the first algorithm with optimal error rates and nearly-linear running time $\tilde{O}(nd)$ in all parameters, improving on the previous fastest running time $\tilde{O}(\min(nd/\e^6, nd^2))$. For outlier detection, we evaluate the performance of QUE-scoring via extensive experiments on synthetic and real data, and demonstrate that it often performs better than previously proposed algorithms.