
Workshop
Machine Learning Meets Econometrics (MLECON)
David Bruns-Smith · Arthur Gretton · Limor Gultchin · Niki Kilbertus · Krikamol Muandet · Evan Munro · Angela Zhou

Mon Dec 13 05:00 AM -- 12:10 PM (PST)

The Machine Learning Meets Econometrics (MLECON) workshop will serve as an interface for researchers from machine learning and econometrics to understand challenges and recognize opportunities that arise from the synergy between these two disciplines as well as to exchange new ideas that will help propel the fields. Our one-day workshop will consist of invited talks from world-renowned experts, shorter talks from contributed authors, a Gather.Town poster session, and an interdisciplinary panel discussion. To encourage cross-over discussion among those publishing in different venues, the topic of our panel discussion will be “Machine Learning in Social Systems: Challenges and Opportunities from Program Evaluation”. It was designed to highlight the complexity of evaluating social and economic programs as well as shortcomings of current approaches in machine learning and opportunities for methodological innovation. These challenges include more complex environments (markets, equilibrium, temporal considerations) and behavior (heterogeneity, delayed effects, unobserved confounders, strategic response). Our team of organizers and program committees is diverse in terms of gender, race, affiliations, country of origin, disciplinary background, and seniority levels. We aim to convene a broad variety of viewpoints on methodological axes (nonparametrics, machine learning, econometrics) as well as areas of application. Our invited speakers and panelists are leading experts in their respective fields and span far beyond the core NeurIPS community. Lastly, we expect participants with diverse backgrounds from various sub-communities of machine learning and econometrics (e.g., non- and semi-parametric econometrics, applied econometrics, reinforcement learning, kernel methods, deep learning, micro- and macro-economics) among other related communities.

- Mon 5:00 a.m. - 5:10 a.m. Welcome and Introduction (Introduction)
- Mon 5:10 a.m. - 5:30 a.m. Invited talk #1 (Invited talk) Elizabeth A. Stuart
- Mon 5:30 a.m. - 5:50 a.m. Invited talk #2 (Invited talk) Vira Semenova
- Mon 5:50 a.m. - 6:10 a.m. Coffee Break (Break)
- Mon 6:10 a.m. - 6:20 a.m. Contributed talks Session 1 (Contributed talk) Jonathan Roth
- Mon 6:20 a.m. - 6:30 a.m. Contributed talks Session 2 (Contributed talk) Michel Besserve
- Mon 6:30 a.m. - 6:40 a.m. Break
- Mon 6:40 a.m. - 7:00 a.m. Invited talk #3 (Invited talk) Eric Tchetgen Tchetgen
- Mon 7:00 a.m. - 7:20 a.m. Invited talk #4 (Invited talk) Xiaohong Chen
- Mon 7:20 a.m. - 7:45 a.m. Coffee Break (Break)
- Mon 7:45 a.m. - 7:55 a.m. Zoom Q&A for Invited Talk #1 and #2 (Discussion) Elizabeth A. Stuart · Vira Semenova
- Mon 7:55 a.m. - 8:05 a.m. Zoom Q&A for Contributed talks Session 1+2 (Discussion) Jonathan Roth · Michel Besserve
- Mon 8:05 a.m. - 8:15 a.m. Zoom Q&A for Invited Talks #3 and #4 (Discussion) Xiaohong Chen · Eric Tchetgen Tchetgen
- Mon 8:15 a.m. - 8:20 a.m. Coffee Break (Break)
- Mon 8:20 a.m. - 9:20 a.m. Poster Session 1 (Poster session)
- Mon 9:20 a.m. - 9:30 a.m. Break
- Mon 9:30 a.m. - 9:40 a.m. Contributed talks Session 3 (Contributed talk) Dhruv Rohatgi
- Mon 9:40 a.m. - 9:50 a.m. Zoom Q&A for Contributed talks Session 3 (Discussion) Dhruv Rohatgi
- Mon 9:50 a.m. - 10:00 a.m. Break
- Mon 10:00 a.m. - 11:00 a.m. Panel Discussion: “Machine Learning in Social Systems: Challenges and Opportunities from Program Evaluation” (Discussion) Jennifer Hill · Guido Imbens · Vasilis Syrgkanis
- Mon 11:00 a.m. - 12:00 p.m. Poster Session 2 (Poster session)
- Mon 12:00 p.m. - 12:10 p.m. Wrapup (Introduction)

- Off-Policy Evaluation with General Logging Policies (Poster)
  Off-policy evaluation (OPE) attempts to predict the performance of counterfactual policies using log data from a different policy.
  We expand its applicability by developing an OPE method for a class of stochastic and deterministic logging policies. This class includes deterministic bandits (such as Upper Confidence Bound) as well as deterministic decision-making based on supervised and unsupervised learning. We prove that our method's prediction converges in probability to the true performance of a counterfactual policy as the sample size increases. We validate our method with experiments on partly and entirely deterministic logging policies. Finally, we apply it to evaluate coupon targeting policies by a major online platform and show how to improve the existing policy.
  Kyohei Okumura
- Adaptive maximization of social welfare (Poster)
  We consider the problem of repeatedly choosing policy parameters in order to maximize social welfare, the weighted sum of utility. The outcomes of earlier choices inform later choices. In contrast to multi-armed bandit models, utility is not observed, but needs to be indirectly inferred as equivalent variation. In contrast to standard optimal tax theory, response functions need to be learned through policy choices. We propose an algorithm based on optimal tax theory, Gaussian process priors, random Fourier features, and Thompson sampling, to (approximately) maximize social welfare over time.
  Maximilian Kasy
- Estimation and Inference of Semiparametric Single-Index Models with High-Dimensional Covariates (Poster)
  This paper develops new estimation and inference methods for high-dimensional single-index models. We propose a simple two-stage estimation method based on the average derivative estimator (ADE). This ADE is composed of weighted score functions of covariates that can easily be estimated under a semiparametric Gaussian copula structure.
  In the first stage, we plug in standard nonparametric estimates for marginal features and a regularized estimator for the precision matrix of the Gaussian copula to obtain high-dimensional score functions. In the second stage, we conduct LASSO-type thresholding to get sparse estimates of the regression coefficients in single-index models. Both stages involve only convex minimization problems. We derive the non-asymptotic bound of our estimator. For inference, we prove the asymptotic normality of a de-biased estimator using the one-step Newton-Raphson update. Our inferential tools do not rely on the Gaussian copula restriction and are more generally applicable with other pilot estimators.
  Ruixuan Liu
- On Parameter Estimation in Unobserved Components Models subject to Linear Inequality Constraints (Poster)
  We propose a new quadratic-programming-based method of approximating a nonstandard density using a multivariate Gaussian density. Such nonstandard densities usually arise while developing posterior samplers for unobserved components models involving inequality constraints on the parameters. For instance, Chan et al. (2016) propose a new model of trend inflation with linear inequality constraints on the stochastic trend. We implement the proposed new method for this model and compare it to the existing approximation. We observe that the proposed new method works as well as the existing approximation in terms of the final trend estimates while achieving greater gains in terms of sample efficiency.
  Abhishek Kumar Umrawal
- Learning Causal Relationships from Conditional Moment Restrictions by Importance Weighting (Poster)
  We consider learning causal relationships under conditional moment restrictions. Unlike causal inference under unconditional moment restrictions, conditional moment restrictions pose serious challenges for causal inference, especially in complex, high-dimensional settings.
  To address this issue, we propose a method that transforms conditional moment restrictions to unconditional moment restrictions through importance weighting using the conditional density ratio estimator. Then, using this transformation, we propose a method that successfully estimates parametric or nonparametric functions defined under the conditional moment restrictions. In experiments, we confirm the soundness of our proposed method.
  Shota Yasui
- Deep Vector Autoregression for Macroeconomic Data (Poster)
  Vector Autoregression (VAR) models are a popular choice for forecasting time series data. Due to their simplicity and success at modelling monetary economic indicators, VARs have become a standard tool for central bankers to construct economic forecasts. Impulse response functions can be readily retrieved from the conventional VAR and used for inference purposes. They are typically employed to investigate various interactions between variables that form part of the monetary transmission mechanism. A crucial assumption underlying conventional VARs is that these interactions between variables through time can be modelled linearly. We propose a novel approach towards VARs that relaxes this assumption. In particular, we offer a simple way to integrate deep learning into VARs without deviating too much from the trusted and established framework. By fitting each equation of the VAR system with a deep neural network, the Deep VAR outperforms its conventional benchmark in terms of in-sample fit, out-of-sample fit and point forecasting accuracy. In particular, we find that the Deep VAR is able to better capture the structural economic changes during periods of uncertainty and recession.
  Patrick Altmeyer
- Safe Online Bid Optimization with Uncertain Return-On-Investment and Budget Constraints (Poster)
  In online advertising, the advertiser's goal is usually a tradeoff between achieving high volumes and high profitability.
  The companies' business units customarily address this tradeoff by maximizing the volumes while guaranteeing a minimum Return On Investment (ROI). This paper investigates combinatorial bandit algorithms for the bid optimization of advertising campaigns subject to uncertain budget and ROI constraints. We show that the problem is inapproximable within any factor unless $P = NP$ even without uncertainty, and we provide a pseudo-polynomial-time algorithm that achieves an optimal solution. Furthermore, we show that no online learning algorithm can violate the (budget or ROI) constraints during the learning process a sublinear number of times while guaranteeing a sublinear pseudo-regret. We provide the $GCB_{safe}$ algorithm guaranteeing w.h.p. a constant upper bound on the number of constraint violations at the cost of a linear pseudo-regret bound. However, a simple adaptation of $GCB_{safe}$ provides a sublinear pseudo-regret when accepting the satisfaction of the constraints with a fixed tolerance. Finally, we experimentally evaluate $GCB_{safe}$ in terms of pseudo-regret/constraint-violation tradeoff in settings generated from real-world data.
  Giulia Romano
- Causal Gradient Boosting: Boosted Instrumental Variable Regression (Poster)
  Recent advances in the literature have demonstrated that standard supervised learning algorithms are ill-suited for problems with endogenous explanatory variables. To correct for this, many variants of nonparametric instrumental variable regression methods have been developed. In this paper, we propose an alternative algorithm called boostIV that builds on the traditional gradient boosting algorithm and corrects for the endogeneity bias. The algorithm is very intuitive and resembles an iterative version of the standard 2SLS estimator. The proposed estimator is data driven and does not require any functional form approximation assumptions besides specifying a weak learner.
  We demonstrate that our estimator is consistent under mild conditions. We carry out extensive Monte Carlo simulations to demonstrate the finite sample performance of our algorithm compared to other recently developed methods. We show that boostIV is at worst on par with the existing methods and on average significantly outperforms them.
  Edvard Bakhitov
- A Bayesian take on option pricing with Gaussian processes (Poster)
  Local volatility is a versatile option pricing model due to its state dependent diffusion coefficient. Calibration is, however, non-trivial as it involves both proposing a hypothesis model of the latent function and a method for fitting it to data. In this paper we present novel Bayesian inference with Gaussian process priors. We obtain a rich representation of the local volatility function with a probabilistic notion of uncertainty attached to the calibration. We propose an inference algorithm and apply our approach to market data.
  Martin Tegnér
- Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy (Poster)
  Even the most carefully curated economic data sets have variables that are noisy, missing, discretized, or privatized. The standard workflow for empirical research involves data cleaning followed by data analysis that typically ignores the bias and variance consequences of data cleaning. We formulate a semiparametric model for causal inference with corrupted data to encompass both data cleaning and data analysis. We propose a new end-to-end procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove root-n consistency, Gaussian approximation, and semiparametric efficiency for our estimator of the causal parameter by finite sample arguments. Our key assumption is that the true covariates are approximately low rank.
  In our analysis, we provide nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. We verify the coverage of the data cleaning-adjusted confidence intervals in simulations.
  Rahul Singh
- Quasi-Bayesian Dual Instrumental Variable Regression (Poster)
  Recent years have witnessed an upsurge of interest in employing flexible machine learning models for instrumental variable (IV) regression, but development of uncertainty quantification methodology is still lacking. In this work we present a novel quasi-Bayesian procedure for IV regression, building upon the recently developed kernelized IV models and the dual/minimax formulation of IV regression. We analyze the frequentist behavior of the proposed quasi-posterior, establishing minimax contraction rates in $L_2$ and Sobolev norms, and showing that the radii of its credible balls have the correct order of magnitude. We derive a scalable approximate inference algorithm, which has time cost comparable to the corresponding point estimation method, and can be further extended to work with neural network models. Empirical evaluation shows that our method produces informative uncertainty estimates on complex high-dimensional problems.
  Ziyu Wang
- How informative is the Order Book Beyond the Best Levels? Machine Learning Perspective (Poster)
  Research on limit order book markets has been rapidly growing and nowadays high-frequency full order book data is widely available for researchers and practitioners. However, it is common that research papers use the best level data only, which motivates us to ask whether the exclusion of the quotes deeper in the book over multiple price levels causes performance degradation. In this paper, we address this question by using modern Machine Learning (ML) techniques to predict mid-price movements without assuming that limit order book markets represent a linear system.
  We provide a number of results that are robust across ML prediction models, feature selection algorithms, data sets, and prediction horizons. We find that the best bid and ask levels are systematically identified not only as the most informative levels in the order books, but also as carrying most of the information needed for good prediction performance. On the other hand, even if the top-of-the-book levels contain most of the relevant information, to maximize models' performance one should use all data across all the levels. Additionally, the informativeness of the order book levels clearly decreases from the first to the fourth level while the rest of the levels are approximately equally important.
  Dat T Tran
- Inference of Heterogeneous Treatment Effects Using Observational Data with High-Dimensional Covariates (Poster)
  The present work focuses on heterogeneous treatment effects using observational data with high-dimensional covariates and endogeneity. Novel estimation and inference methods are developed for treatment-covariate interaction effects and covariate-specific treatment effects with the help of an instrumental variable to deal with the endogeneity. The covariate-specific treatment effects represent the expected difference between potential outcomes given a set of covariates. The instrument induces exogeneity between the treatment and the potential outcomes given the covariates under the "complier" subgroup of the population. Under the framework of generalized linear models (GLMs), this study proposes regularized estimation for each regression coefficient under a non-convex objective function. Based on the initial regularized estimator, a debiased estimator is proposed for the regression coefficients, which eliminates the impact of regularization bias from both first- and second-stage regressions. The asymptotic normality results are provided for both the debiased estimator and its functional.
  Based on these results, confidence intervals could be constructed for the treatment, the covariates of interest, their interaction effects and the covariate-specific treatment effects. The proposed method can be applied to both continuous and categorical responses, corresponding to linear and non-linear second-stage regression models, respectively. The main contributions of this work are as follows. (i) A regularized two-stage estimation procedure is proposed for models on the compliers under data endogeneity. (ii) A novel approach to simultaneously correct the biases due to regularized estimation at both stages is proposed. (iii) A novel statistical inference procedure based on the de-biased estimator is developed for covariate effects and (local) heterogeneous treatment effects with high-dimensional data.
  Jing Tao
- Efficient Online Estimation of Causal Effects by Deciding What to Observe (Poster)
  Researchers often face data fusion problems, where multiple data sources are available, each capturing a distinct subset of variables. While problem formulations typically take the data as given, in practice, data acquisition can be an ongoing process. In this paper, we introduce the problem of deciding, at each time, which data source to sample from. Our goal is to estimate a given functional of the parameters of a probabilistic model as efficiently as possible. We propose online moment selection (OMS), a framework in which structural assumptions are encoded as moment conditions. The optimal action at each step depends, in part, on the very moments that identify the functional of interest. Our algorithms balance exploration with choosing the best action as suggested by estimated moments. We propose two selection strategies: (1) explore-then-commit (ETC) and (2) explore-then-greedy (ETG), proving that both achieve zero asymptotic regret as assessed by MSE.
  We instantiate our setup for average treatment effect estimation, where structural assumptions are given by a causal graph and data sources include subsets of mediators, confounders, and instrumental variables.
  Shantanu Gupta
- Many Proxy Controls (Poster)
  A recent literature considers causal inference using noisy proxies for unobserved confounding factors. The proxies are divided into two sets that are independent conditional on the confounders. One set of proxies are 'negative control treatments' and the other are 'negative control outcomes'. Existing work applies to low-dimensional settings with a fixed number of proxies and confounders. In this work we consider linear models with many proxy controls and possibly many confounders. A key insight is that if each group of proxies is strictly larger than the number of confounding factors, then a matrix of nuisance parameters has a low-rank structure and a vector of nuisance parameters has a sparse structure. We can exploit the rank-restriction and sparsity to reduce the number of free parameters to be estimated. The number of unobserved confounders is not known a priori but we show that it is identified, and we apply penalization methods to adapt to this quantity. We provide an estimator with a closed-form as well as a doubly-robust estimator that must be evaluated using numerical methods. We provide conditions under which our doubly-robust estimator is uniformly root-$n$ consistent, asymptotically centered normal, and our suggested confidence intervals have asymptotically correct coverage. We provide simulation evidence that our methods achieve better performance than existing approaches in high dimensions, particularly when the number of proxies is substantially larger than the number of confounders.
  Ben Deaner
- Unsupervised Feature Extraction Clustering for Crisis Prediction (Poster)
  This paper focuses on macroeconomic forecasting literature. We introduce unFEAR, an unsupervised feature extraction clustering method aimed at facilitating crisis prediction tasks. We use unsupervised representation learning and a novel autoencoder method to extract from economic data information relevant to identify time-invariant non-overlapping clusters comprising observed crisis and non-crisis episodes. Each cluster corresponds to a different economic regime characterized by an idiosyncratic crisis generating mechanism.
  Ran Wang
- Double machine learning for sample selection models (Poster)
  This paper considers the evaluation of discretely distributed treatments when outcomes are only observed for a subpopulation due to sample selection or outcome attrition. For identification, we combine a selection-on-observables assumption for treatment assignment with either selection-on-observables or instrumental variable assumptions concerning the outcome attrition/sample selection process. We also consider dynamic confounding, meaning that covariates that jointly affect sample selection and the outcome may (at least partly) be influenced by the treatment. To control in a data-driven way for a potentially high dimensional set of pre- and/or post-treatment covariates, we adapt the double machine learning framework for treatment evaluation to sample selection problems. We make use of (a) Neyman-orthogonal, doubly robust, and efficient score functions, which imply the robustness of treatment effect estimation to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models and (b) sample splitting (or cross-fitting) to prevent overfitting bias.
  We demonstrate that the proposed estimators are asymptotically normal and root-n consistent under specific regularity conditions concerning the machine learners and investigate their finite sample properties in a simulation study. We also apply our proposed methodology to the Job Corps data for evaluating the effect of training on hourly wages which are only observed conditional on employment. The estimator is available in the causalweight package for the statistical software R.
  Martin Huber
- Evolution of topics in central bank speech communication (Poster)
  This paper studies the content of central bank speech communication from 1997 through 2020 and asks the following questions: (i) What global topics do central banks talk about? (ii) How do these topics evolve over time? I turn to natural language processing, and more specifically Dynamic Topic Models, to answer these questions. The analysis consists of an aggregate study of nine major central banks and a case study of the Federal Reserve, which allows for region specific control variables. I show that: (i) Central banks address a broad range of topics. (ii) The topics are well captured by Dynamic Topic Models. (iii) The global topics exhibit strong and significant autoregressive properties not easily explained by financial control variables.
  Magnus Hansson
- Causal Matrix Completion (Poster)
  Matrix completion is the study of recovering an underlying matrix from a sparse subset of noisy observations. Traditionally, it is assumed that the entries of the matrix are "missing completely at random" (MCAR), i.e., each entry is revealed at random, independent of everything else, with uniform probability. This is likely unrealistic due to the presence of "latent confounders", i.e., unobserved factors that determine both the entries of the underlying matrix and the missingness pattern in the observed matrix.
  For example, in the context of movie recommender systems -- a canonical application for matrix completion -- a user who vehemently dislikes horror films is unlikely to ever watch horror films. In general, these confounders yield "missing not at random" (MNAR) data, which can severely impact any inference procedure that does not correct for this bias. We develop a formal causal model for matrix completion through the language of potential outcomes, and provide novel identification arguments for a variety of causal estimands of interest. We design a procedure, which we call "synthetic nearest neighbors" (SNN), to estimate these causal estimands. We prove finite-sample consistency and asymptotic normality of our estimator. Our analysis also leads to new theoretical results for the matrix completion literature. In particular, we establish entry-wise, i.e., max-norm, finite-sample consistency and asymptotic normality results for matrix completion with MNAR data. As a special case, this also provides entry-wise bounds for matrix completion with MCAR data. Across simulated and real data, we demonstrate the efficacy of our proposed estimator.
  Anish Agarwal
- Policy learning under ambiguity (Poster)
  This paper studies the problem of estimating individualized treatment rules when treatment effects are partially identified, as is often the case with observational data. We first study the population problem of assigning treatment under partial identification and derive the population optimal policies using classic optimality criteria for decision under ambiguity. We then propose an algorithm for computation of the estimated optimal treatment policy and provide statistical guarantees for its convergence to the population counterpart. Our estimation procedure leverages recent advances in the orthogonal machine learning literature, while our theoretical results account for the presence of non-differentiabilities in the problem.
  The proposed methods are illustrated using data from the Job Training Partnership Act study.
  Riccardo D Adamo
- Modeling Worker Career Trajectories with Neural Sequence Models (Poster)
  The quality of a job depends not only on the job itself but also on the transition opportunities and career paths it opens up. However, limited by scarce data and restrictive models, prior research on labor market transitions has focused on transitions over short periods rather than over careers. We fill this gap by extending transformer neural networks to model sequences of jobs. Sequences of jobs differ from sequences of language, for which the transformer model was initially developed, so we modify the model in two ways: we enable two-stage prediction to first predict whether an individual changes jobs before predicting a specific occupation, and we also incorporate covariates into the transformer architecture. We train our model on a dataset of 24 million American career trajectories collected from resumes posted online. The transformer, which conditions on all jobs in an individual's history, yields significant gains in predictive performance over a Markov baseline, and our modifications add substantially to this gain. We demonstrate the use cases of our model with two applications: inferring long-term wages associated with starting in various jobs and imputing intermediate jobs between a pair of known jobs.
  Keyon Vafa
- Deep Causal Inequalities: Demand Estimation in Differentiated Products Markets (Poster)
  Supervised machine learning algorithms fail to perform well in the presence of endogeneity in the explanatory variables. In this paper, we borrow from literature on partial identification to propose deep causal inequalities that overcome this issue. Instead of relying on observed labels, the DeepCI estimator uses inferred inequalities from the observed behavior of agents in the data.
  This by construction allows us to circumvent the issue of endogenous explanatory variables in many cases. We provide theoretical guarantees for our estimator and demonstrate it is consistent under very mild conditions. We demonstrate through extensive simulations that our estimator outperforms standard supervised machine learning algorithms and existing partial identification methods.
  Amandeep Singh
- Boosting engagement in ed tech with personalized recommendations (Poster)
  Recommendation systems are the backbone of some of the most successful companies in the world. Their fundamental feature is that they exhibit increasing gains to scale: the bigger the platform the more precise and impactful are the recommendations. Understanding how large a system needs to be and how much users' data is necessary to leverage the gains from personalized recommendations is key to deciding when to launch such a system; yet, there is a shortage of empirical evidence to guide this decision. The most prominent applications of recommendation systems are associated with the entertainment sector (e.g. Netflix, Spotify, YouTube) or online retail (e.g. Amazon or eBay); it is unclear how effective these systems are in other contexts. This paper aims to fill these gaps by carrying out an RCT-based analysis of the introduction of personalized recommendations into an ed-tech platform.
  Ayush Kanodia
- Robust Algorithms for GMM Estimation: A Finite Sample Viewpoint (Poster)
  For many inference problems in statistics and econometrics, the unknown parameter is identified by a set of moment conditions. A generic method of solving moment conditions is the Generalized Method of Moments (GMM). However, classical GMM estimation is potentially very sensitive to outliers.
  Robustified GMM estimators have been developed in the past, but suffer from several drawbacks: computational intractability, poor dimension-dependence, and no quantitative recovery guarantees in the presence of a constant fraction of outliers. In this work, we develop the first computationally efficient GMM estimator (under intuitive assumptions) that can tolerate a constant $\epsilon$ fraction of adversarially corrupted samples, and that has an $\ell_2$ recovery guarantee of $O(\sqrt{\epsilon})$. To achieve this, we draw upon and extend a recent line of work on algorithmic robust statistics for related but simpler problems such as mean estimation, linear regression and stochastic optimization. As two examples of the generality of our algorithm, we show how our estimation algorithm and assumptions apply to instrumental variables linear and logistic regression. Moreover, we experimentally validate that our estimator outperforms classical IV regression and two-stage Huber regression on synthetic and semi-synthetic datasets with corruption.
  Dhruv Rohatgi
- An Outcome Test of Discrimination for Ranked Lists (Poster)
  This paper extends Becker (1957)'s outcome test of discrimination to settings where a (human or algorithmic) decision-maker produces a ranked list of candidates. Ranked lists are particularly relevant in the context of online platforms that produce search results or feeds, and also arise when human decision-makers express ordinal preferences over a list of candidates. We show that non-discrimination implies a system of moment inequalities, which intuitively impose that one cannot permute the position of a lower-ranked candidate from one group with a higher-ranked candidate from a second group and systematically improve the objective. Moreover, we show that these moment inequalities are the only testable implications of non-discrimination when the auditor observes only outcomes and group membership by rank.
  We show how to statistically test the implied inequalities, and demonstrate our approach in an application using data from LinkedIn.
  Jonathan Roth
- Optimal design of interventions in complex socio-economic systems (Poster)
  Complex systems often contain feedback loops that can be described as cyclic causal models. Contrary to acyclic graphs, intervening in cyclic graphs may lead to counterproductive effects, which cannot be inferred directly from the graph structure. After establishing a framework for differentiable interventions based on Lie groups, we take advantage of modern automatic differentiation techniques and their application to implicit functions in order to optimize interventions in cyclic causal models. We illustrate this framework by investigating the scenarios of transition to sustainable economies.
  Michel Besserve

#### Author Information

##### Arthur Gretton (Gatsby Unit, UCL)

Arthur Gretton is a Professor with the Gatsby Computational Neuroscience Unit at UCL. He received degrees in Physics and Systems Engineering from the Australian National University, and a PhD with Microsoft Research and the Signal Processing and Communications Laboratory at the University of Cambridge. He previously worked at the MPI for Biological Cybernetics, and at the Machine Learning Department, Carnegie Mellon University. Arthur's recent research interests in machine learning include the design and training of generative models, both implicit (e.g. GANs) and explicit (high/infinite dimensional exponential family models), nonparametric hypothesis testing, and kernel methods. He has been an associate editor at IEEE Transactions on Pattern Analysis and Machine Intelligence from 2009 to 2013, an Action Editor for JMLR since April 2013, an Area Chair for NeurIPS in 2008 and 2009, a Senior Area Chair for NeurIPS in 2018, an Area Chair for ICML in 2011 and 2012, and a member of the COLT Program Committee in 2013. Arthur was program chair for AISTATS in 2016 (with Christian Robert), tutorials chair for ICML 2018 (with Ruslan Salakhutdinov), workshops chair for ICML 2019 (with Honglak Lee), program chair for the Dali workshop in 2019 (with Krikamol Muandet and Shakir Mohammed), and co-organiser of the Machine Learning Summer School 2019 in London (with Marc Deisenroth).