Poster

Euler-Lagrange Analysis of Generative Adversarial Networks

Siddarth Asokan · Chandra Seelamantula
Dec 12, 8:45 AM - 10:45 AM Great Hall & Hall B1+B2 (level 1)

We consider Generative Adversarial Networks (GANs) and address the underlying functional optimization problem ab initio within a variational setting. Strictly speaking, the optimization of the generator and discriminator functions must be carried out in accordance with the Euler-Lagrange conditions, which become particularly relevant in scenarios where the optimization cost involves regularizers comprising the derivatives of these functions. Considering Wasserstein GANs (WGANs) with a gradient-norm penalty, we show that the optimal discriminator is the solution to a Poisson differential equation. In principle, the optimal discriminator can be obtained in closed form without having to train a neural network. We illustrate this by employing a Fourier-series approximation to solve the Poisson differential equation. Experimental results on low-dimensional synthesized Gaussian data demonstrate superior convergence behavior of the proposed approach in comparison with baseline WGAN variants that employ weight-clipping, gradient, or Lipschitz penalties on the discriminator. We also analyze the truncation error of the Fourier-series approximation and the estimation error of the Fourier coefficients in a high-dimensional setting. We demonstrate applications to real-world images considering latent-space prior matching in Wasserstein autoencoders and present performance comparisons on benchmark datasets such as MNIST, SVHN, CelebA, CIFAR-10, and Ukiyo-E. We demonstrate that the proposed approach achieves comparable reconstruction error and Fréchet inception distance with faster convergence and up to a two-fold improvement in image sharpness.
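To make the variational claim concrete: with data density $p_d$, generator density $p_g$, and a gradient-norm penalty weighted by $\lambda$, the Euler-Lagrange (first-order optimality) condition for the optimal discriminator $D^*$ reduces to a Poisson equation of the form (signs and constants depend on the exact penalty formulation and are an assumption here)

$$ \lambda \, \nabla^2 D^*(\mathbf{x}) \;=\; p_g(\mathbf{x}) - p_d(\mathbf{x}), $$

which is linear in $D^*$, so a truncated Fourier-series expansion turns it into a set of closed-form coefficient equations.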

Poster

Graph Clustering with Graph Neural Networks

Anton Tsitsulin · John Palowitch · Bryan Perozzi · Emmanuel Müller
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Graph Neural Networks (GNNs) have achieved state-of-the-art results on many graph analysis tasks such as node classification and link prediction. However, important unsupervised problems on graphs, such as graph clustering, have proved more resistant to advances in GNNs. Graph clustering has the same overall goal as node pooling in GNNs—does this mean that GNN pooling methods do a good job at clustering graphs? Surprisingly, the answer is no—current GNN pooling methods often fail to recover the cluster structure in cases where simple baselines, such as k-means applied on learned representations, work well. We investigate further by carefully designing a set of experiments to study different signal-to-noise scenarios both in graph structure and attribute data. To address these methods' poor performance in clustering, we introduce Deep Modularity Networks (DMoN), an unsupervised pooling method inspired by the modularity measure of clustering quality, and show how it tackles the recovery of the challenging clustering structure of real-world graphs. Further, on real-world data, we show that DMoN produces high-quality clusters which correlate strongly with ground-truth labels, achieving state-of-the-art results with over 40% improvement over other pooling methods across different metrics.
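DMoN's loss is built around (the negative of) soft modularity. A minimal NumPy sketch of that quantity follows; this is our simplification, since the actual DMoN layer learns the assignments with a GNN and adds a collapse regularizer:

import numpy as np

def soft_modularity(adj: np.ndarray, assign: np.ndarray) -> float:
    """Modularity Q of a soft cluster assignment.

    adj    : (n, n) symmetric adjacency matrix
    assign : (n, k) row-stochastic soft cluster assignments
    """
    degrees = adj.sum(axis=1, keepdims=True)           # (n, 1) node degrees
    two_m = degrees.sum()                              # 2 * number of edges
    modularity_matrix = adj - degrees @ degrees.T / two_m
    return float(np.trace(assign.T @ modularity_matrix @ assign) / two_m)

# DMoN-style training would *maximize* Q, i.e. minimize -Q as a loss,
# with `assign` produced by a GNN followed by a softmax.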

Poster

[Re] Masked Autoencoders Are Small Scale Vision Learners: A Reproduction Under Resource Constraints

Athanasios Charisoudis · Simon Ekman von Huth · Emil Jansson
Dec 12, 8:45 AM - 10:45 AM Great Hall & Hall B1+B2 (level 1)

Scope of Reproducibility — The Masked Autoencoder (MAE) was recently proposed as a framework for efficient self-supervised pre-training in Computer Vision [1]. In this paper, we attempt a replication of the MAE under significant computational constraints. Specifically, we target the claim that masking out a large part of the input image yields a nontrivial and meaningful self-supervisory task, which allows training models that generalize well. We also present the Semantic Masked Autoencoder (SMAE), a novel yet simple extension of MAE which uses perceptual loss to improve encoder embeddings.

Methodology — The datasets and backbones we rely on are significantly smaller than those used by [1]. Our main experiments are performed on Tiny ImageNet (TIN) [2] and transfer learning is performed on a low-resolution version of CUB-200-2011 [3]. We use a ViT-Lite [4] as backbone. We also compare the MAE to DINO, an alternative framework for self-supervised learning [5]. The ViT, MAE, as well as the perceptual loss were implemented from scratch, without consulting the original authors' code. Our code is available at https://github.com/MLReproHub/SMAE. The computational budget for our reproduction and extension was approximately 150 GPU hours.

Results — This paper successfully reproduces the claim that the MAE poses a nontrivial and meaningful self-supervisory task. We show that models trained with this framework generalize well to new datasets and conclude that the MAE is reproducible, with the exception of some hyperparameter choices. We also demonstrate that the MAE performs well with smaller backbones and datasets. Finally, our results suggest that the SMAE extension improves the downstream classification accuracy of the MAE on CUB (+5 pp) when coupled with an appropriate masking strategy.

What was easy — Given prior experience with a deep learning framework, re-implementing the paper was relatively straightforward, with sufficient details given in the paper.

What was difficult — We faced challenges implementing efficient patch shuffling and tuning hyperparameters. The hyperparameter choices from [1] did not translate well to a smaller dataset and backbone.

Communication with original authors — We have not had contact with the original authors.
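To illustrate the claim being tested, here is a minimal sketch of MAE-style random patch masking (the 75% mask ratio follows the original MAE; the reproduced pipeline's details may differ):

import numpy as np

def random_masking(patches: np.ndarray, mask_ratio: float = 0.75, seed: int = 0):
    """Split a sequence of patch embeddings into kept and masked parts.

    patches : (num_patches, dim) patch embeddings of one image
    Returns (kept_patches, kept_idx, masked_idx); only kept_patches are fed
    to the encoder, which is what makes the pre-training computationally cheap.
    """
    rng = np.random.default_rng(seed)
    num_patches = patches.shape[0]
    num_keep = int(num_patches * (1.0 - mask_ratio))
    perm = rng.permutation(num_patches)        # random shuffle of patch indices
    kept_idx, masked_idx = perm[:num_keep], perm[num_keep:]
    return patches[kept_idx], kept_idx, masked_idx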

Poster

Small Transformers Compute Universal Metric Embeddings

Anastasis Kratsios · Valentin Debarnot · Ivan Dokmanić
Dec 12, 8:45 AM - 10:45 AM Great Hall & Hall B1+B2 (level 1)
We study representations of data from an arbitrary metric space $\mathcal{X}$ in the space of univariate Gaussian mixtures equipped with a transport metric (Delon and Desolneux, 2020). We prove embedding guarantees for feature maps implemented by small neural networks called probabilistic transformers. Our guarantees are of memorization type: we prove that a probabilistic transformer of depth about $n\log(n)$ and width about $n^2$ can bi-Hölder embed any $n$-point dataset from $\mathcal{X}$ with low metric distortion, thus avoiding the curse of dimensionality. We further derive probabilistic bi-Lipschitz guarantees, which trade off the amount of distortion and the probability that a randomly chosen pair of points embeds with that distortion. If the geometry of $\mathcal{X}$ is sufficiently regular, we obtain stronger bi-Lipschitz guarantees for all points. As applications, we derive neural embedding guarantees for datasets from Riemannian manifolds, metric trees, and certain types of combinatorial graphs. When instead embedding into multivariate Gaussian mixtures, we show that probabilistic transformers compute bi-Hölder embeddings with arbitrarily small distortion. Our results show that any finite metric dataset, from vertices on a graph to functions in a function space, can be faithfully represented in a single representation space, and that the representation can be implemented by a simple transformer architecture. Thus one may only need a modular set of machine learning tools compatible with this one representation space, many of which already exist, for downstream supervised and unsupervised learning from a great variety of data types.
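For reference, a feature map $f\colon \mathcal{X} \to \mathcal{Y}$ between metric spaces is bi-Hölder with exponents $0 < \alpha \le 1 \le \beta$ and constants $c, C > 0$ if (standard definition; the paper's exact constants and exponents may differ)

$$ c\, d_{\mathcal{X}}(x_1, x_2)^{\beta} \;\le\; d_{\mathcal{Y}}\big(f(x_1), f(x_2)\big) \;\le\; C\, d_{\mathcal{X}}(x_1, x_2)^{\alpha} \qquad \text{for all } x_1, x_2 \in \mathcal{X}, $$

with $\alpha = \beta = 1$ recovering the bi-Lipschitz case; low distortion means $\alpha, \beta$ close to $1$ and $C/c$ close to $1$.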
Poster

Toolbox for Multimodal Learn (scikit-multimodallearn)

Dominique Benielli · Baptiste Bauvin · Sokol Koço · Riikka Huusari · Cécile Capponi · Hachem Kadri · François Laviolette
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

scikit-multimodallearn is a Python library for multimodal supervised learning, licensed under the BSD license, and compatible with the well-known scikit-learn toolbox (Pedregosa et al., 2011). This paper details the content of the library, including a dedicated multimodal data format as well as classification and regression algorithms. Use cases and examples are also provided.
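A hypothetical usage sketch in the scikit-learn idiom; the import path, the `views_ind` convention, and the `fit` signature are assumptions based on the library's documentation, so consult the package docs for the exact API:

import numpy as np
# assumed import path; check the scikit-multimodallearn docs for the exact API
from multimodal.boosting.mumbo import MumboClassifier

# toy problem: 100 samples with two views concatenated column-wise,
# columns [0, 20) form view 0 and columns [20, 50) form view 1
rng = np.random.default_rng(0)
X = rng.random((100, 50))
y = rng.integers(0, 2, size=100)
views_ind = [0, 20, 50]  # view boundaries (assumed convention)

clf = MumboClassifier()            # scikit-learn-style estimator
clf.fit(X, y, views_ind=views_ind)
print(clf.predict(X[:5]))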

Poster

Variational Gibbs Inference for Statistical Model Estimation from Incomplete Data

Vaidotas Simkus · Benjamin Rhodes · Michael Gutmann
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Statistical models are central to machine learning with broad applicability across a range of downstream tasks. The models are controlled by free parameters that are typically estimated from data by maximum-likelihood estimation or approximations thereof. However, when faced with real-world data sets, many of the models run into a critical issue: they are formulated in terms of fully-observed data, whereas in practice the data sets are plagued with missing data. The theory of statistical model estimation from incomplete data is conceptually similar to the estimation of latent-variable models, where powerful tools such as variational inference (VI) exist. However, in contrast to standard latent-variable models, parameter estimation with incomplete data often requires estimating exponentially many conditional distributions of the missing variables, hence making standard VI methods intractable. We address this gap by introducing variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data. We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models such as variational autoencoders and normalising flows from incomplete data. The proposed method, whilst general-purpose, achieves competitive or better performance than existing model-specific estimation methods.
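Concretely, with observed part $x_{\text{obs}}$ and missing part $x_{\text{mis}}$ of a data point, maximum-likelihood estimation targets $\log p_\theta(x_{\text{obs}})$, which VI lower-bounds via the standard missing-data ELBO (VGI's contribution, in our paraphrase, is making the variational conditionals tractable with a Gibbs-like scheme):

$$ \log p_\theta(x_{\text{obs}}) \;\ge\; \mathbb{E}_{q_\phi(x_{\text{mis}} \mid x_{\text{obs}})}\!\left[ \log \frac{p_\theta(x_{\text{obs}}, x_{\text{mis}})}{q_\phi(x_{\text{mis}} \mid x_{\text{obs}})} \right], $$

where the intractability arises because a different conditional $q_\phi(x_{\text{mis}} \mid x_{\text{obs}})$ is needed for each of the exponentially many missingness patterns.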

Poster

Network Regression with Graph Laplacians

Yidong Zhou · Hans-Georg Müller
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Network data are increasingly available in various research fields, motivating statistical analysis for populations of networks, where a network as a whole is viewed as a data point. The study of how a network changes as a function of covariates is often of paramount interest. However, due to the non-Euclidean nature of networks, basic statistical tools available for scalar and vector data are no longer applicable. This motivates an extension of the notion of regression to the case where responses are network data. Here we propose to adopt conditional Fréchet means implemented as M-estimators that depend on weights derived from both global and local least squares regression, extending the Fréchet regression framework to networks that are quantified by their graph Laplacians. The challenge is to characterize the space of graph Laplacians to justify the application of Fréchet regression. This characterization then leads to asymptotic rates of convergence for the corresponding M-estimators by applying empirical process methods. We demonstrate the usefulness and good practical performance of the proposed framework with simulations and with network data arising from resting-state fMRI in neuroimaging, as well as New York taxi records.
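The underlying estimator is a conditional Fréchet mean: in the global Fréchet regression of Petersen and Müller (2019), which the paper extends, the regression function at covariate level $x$ is

$$ m(x) \;=\; \operatorname*{arg\,min}_{\omega \in \Omega} \; \mathbb{E}\!\left[ s(X, x)\, d^2(Y, \omega) \right], \qquad s(X, x) = 1 + (X - \mu)^\top \Sigma^{-1} (x - \mu), $$

where here $\Omega$ is the space of graph Laplacians, $d$ is a suitable metric on that space (e.g., the Frobenius metric), and the M-estimator replaces the expectation with a weighted empirical sum.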

Poster

MMD Aggregated Two-Sample Test

Antonin Schrab · Ilmun Kim · Mélisande Albert · Béatrice Laurent · Benjamin Guedj · Arthur Gretton
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

We propose two novel nonparametric two-sample kernel tests based on the Maximum Mean Discrepancy (MMD). First, for a fixed kernel, we construct an MMD test using either permutations or a wild bootstrap, two popular numerical procedures to determine the test threshold. We prove that this test controls the probability of type I error non-asymptotically. Hence, it can be used reliably even in settings with small sample sizes, as it remains well-calibrated, unlike previous MMD tests, which only guarantee a correct test level asymptotically. When the difference in densities lies in a Sobolev ball, we prove minimax optimality of our MMD test with a specific kernel depending on the smoothness parameter of the Sobolev ball. In practice, this parameter is unknown and, hence, the optimal MMD test with this particular kernel cannot be used. To overcome this issue, we construct an aggregated test, called MMDAgg, which is adaptive to the smoothness parameter. The test power is maximised over the collection of kernels used, without requiring held-out data for kernel selection (which results in a loss of test power) or arbitrary kernel choices such as the median heuristic. We prove that MMDAgg still controls the level non-asymptotically and achieves the minimax rate over Sobolev balls, up to an iterated logarithmic term. Our guarantees are not restricted to a specific type of kernel but hold for any product of one-dimensional translation-invariant characteristic kernels. We provide a user-friendly, parameter-free implementation of MMDAgg using an adaptive collection of bandwidths. We demonstrate that MMDAgg significantly outperforms alternative state-of-the-art MMD-based two-sample tests on synthetic data satisfying the Sobolev smoothness assumption, and that, on real-world image data, MMDAgg closely matches the power of tests leveraging the use of models such as neural networks.
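A minimal sketch of a single fixed-kernel MMD permutation test of the kind MMDAgg aggregates over (one Gaussian bandwidth here; MMDAgg instead adapts over a collection of bandwidths and adjusts the individual levels, and the wild-bootstrap variant is omitted):

import numpy as np

def gaussian_kernel(a, b, bandwidth):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mmd2_biased(x, y, bandwidth):
    """Biased (V-statistic) estimate of MMD^2 between samples x and y."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2 * kxy

def mmd_permutation_test(x, y, bandwidth=1.0, level=0.05, n_perms=500, seed=0):
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, bandwidth)
    pooled = np.concatenate([x, y])
    n = len(x)
    null_stats = []
    for _ in range(n_perms):
        perm = rng.permutation(len(pooled))  # relabel samples under H0
        null_stats.append(mmd2_biased(pooled[perm[:n]], pooled[perm[n:]], bandwidth))
    threshold = np.quantile(null_stats, 1 - level)
    return observed > threshold  # True = reject H0: same distribution

x = np.random.randn(100, 2)
y = np.random.randn(100, 2) + 0.75
print(mmd_permutation_test(x, y))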

Poster

Fundamental Limits and Tradeoffs in Invariant Representation Learning

Han Zhao · Chen Dan · Bryon Aragam · Tommi Jaakkola · Geoffrey Gordon · Pradeep Ravikumar
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

A wide range of machine learning applications, such as privacy-preserving learning, algorithmic fairness, and domain adaptation/generalization, involve learning invariant representations of the data that aim to achieve two competing goals: (a) maximize information or accuracy with respect to a target response, and (b) maximize invariance or independence with respect to a set of protected features (e.g., for fairness or privacy). Despite their wide applicability, theoretical understanding of the optimal tradeoffs between accuracy and invariance achievable by invariant representations is still severely lacking. In this paper, we provide an information-theoretic analysis of such tradeoffs under both classification and regression settings. More precisely, we provide a geometric characterization of the accuracy and invariance achievable by any representation of the data; we term this feasible region the information plane. We provide an inner bound for this feasible region for the classification case, and an exact characterization for the regression case, which allows us to either bound or exactly characterize the Pareto optimal frontier between accuracy and invariance. Although our contributions are mainly theoretical, a key practical application of our results is in certifying the potential sub-optimality of any given representation learning algorithm for either classification or regression tasks. Our results shed new light on the fundamental interplay between accuracy and invariance, and may be useful in guiding the design of future representation learning algorithms.

Poster

Fast Online Changepoint Detection via Functional Pruning CUSUM Statistics

Gaetano Romano · Idris A. Eckley · Paul Fearnhead · Guillem Rigaill
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Many modern applications of online changepoint detection require the ability to process high-frequency observations, sometimes with limited available computational resources. Online algorithms for detecting a change in mean often involve using a moving window, or specifying the expected size of change. Such choices affect which changes the algorithms have most power to detect. We introduce an algorithm, Functional Online CuSUM (FOCuS), which is equivalent to running these earlier methods simultaneously for all sizes of windows, or all possible values for the size of change. Our theoretical results give tight bounds on the expected computational cost per iteration of FOCuS, with this being logarithmic in the number of observations. We show how FOCuS can be applied to a number of different changes in mean scenarios, and demonstrate its practical utility through its state-of-the-art performance at detecting anomalous behaviour in computer server data.
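For intuition, here is the classical Page-CUSUM recursion for a single, pre-specified change size delta in the mean of unit-variance data; FOCuS in effect runs this statistic for all values of delta simultaneously via functional pruning:

import numpy as np

def page_cusum(xs, delta, threshold):
    """Detect an increase of size `delta` in the mean of unit-variance data.

    Returns the index at which the statistic first exceeds `threshold`,
    or None if no change is flagged.
    """
    stat = 0.0
    for t, x in enumerate(xs):
        # log-likelihood-ratio increment for a mean shift 0 -> delta
        stat = max(0.0, stat + delta * x - delta ** 2 / 2.0)
        if stat > threshold:
            return t
    return None

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 500), rng.normal(1, 1, 100)])
print(page_cusum(data, delta=1.0, threshold=10.0))  # flags shortly after t=500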

Poster

An Empirical Investigation of the Role of Pre-training in Lifelong Learning

Sanket Vaibhav Mehta · Darshan Patil · Sarath Chandar · Emma Strubell
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel data set of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach outperforms several state-of-the-art task-sequential continual learning algorithms across multiple settings, occasionally even without retaining a memory that scales in size with the number of tasks.
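As a sketch of the idea of penalizing basin sharpness during sequential fine-tuning, here is a generic sharpness-aware (SAM-style) update on a toy quadratic loss; the paper's exact objective and hyperparameters are not reproduced here, so treat this as an illustrative assumption:

import numpy as np

def loss_grad(w):
    """Gradient of a toy quadratic loss L(w) = 0.5 * w^T A w."""
    A = np.diag([10.0, 1.0])  # one sharp and one flat direction
    return A @ w

def sharpness_aware_step(w, lr=0.01, rho=0.05):
    """Gradient step evaluated at an adversarially perturbed point,
    which biases optimization towards wider minima."""
    g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    g_sam = loss_grad(w + eps)                   # gradient at perturbed point
    return w - lr * g_sam

w = np.array([1.0, 1.0])
for _ in range(200):
    w = sharpness_aware_step(w)
print(w)  # descends while penalizing the sharp direction more strongly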

Poster

Large sample spectral analysis of graph-based multi-manifold clustering

Nicolas Garcia Trillos · Pengfei He · Chenghui Li
Dec 12, 8:45 AM - 10:45 AM Great Hall & Hall B1+B2 (level 1)
In this work we study statistical properties of graph-based algorithms for multi-manifold clustering (MMC). In MMC the goal is to retrieve the multi-manifold structure underlying a given Euclidean data set when the data set is assumed to be obtained by sampling a distribution on a union of manifolds $\mathcal{M} = \mathcal{M}_1 \cup \dots \cup \mathcal{M}_N$ that may intersect with each other and that may have different dimensions. We investigate sufficient conditions that similarity graphs on data sets must satisfy in order for their corresponding graph Laplacians to capture the right geometric information to solve the MMC problem. Precisely, we provide high-probability error bounds for the spectral approximation of a tensorized Laplacian on $\mathcal{M}$ with a suitable graph Laplacian built from the observations; the recovered tensorized Laplacian contains all geometric information of all the individual underlying manifolds. We provide an example of a family of similarity graphs, which we call annular proximity graphs with angle constraints, satisfying these sufficient conditions. We contrast our family of graphs with other constructions in the literature based on the alignment of tangent planes. Extensive numerical experiments expand the insights that our theory provides on the MMC problem.
Poster

Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-Start

Riccardo Grazzi · Massimiliano Pontil · Saverio Salzo
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)
We analyse a general class of bilevel problems, in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map. This class of problems includes instances of meta-learning, equilibrium models, hyperparameter optimization and data poisoning adversarial attacks. Several recent works have proposed algorithms which warm-start the lower-level problem, i.e. they use the previous lower-level approximate solution as a starting point for the lower-level solver. This warm-start procedure allows one to improve the sample complexity in both the stochastic and deterministic settings, achieving in some cases the order-wise optimal sample complexity. However, there are situations, e.g., meta-learning and equilibrium models, in which the warm-start procedure is not well-suited or ineffective. In this work we show that without warm-start, it is still possible to achieve order-wise (near) optimal sample complexity. In particular, we propose a simple method which uses (stochastic) fixed point iterations at the lower-level and projected inexact gradient descent at the upper-level, that reaches an $\epsilon$-stationary point using $O(\epsilon^{-2})$ and $\tilde{O}(\epsilon^{-1})$ samples for the stochastic and the deterministic setting, respectively. Finally, compared to methods using warm-start, our approach yields a simpler analysis that does not need to study the coupled interactions between the upper-level and lower-level iterates.
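In symbols, the problem class is (our notation, consistent with the abstract)

$$ \min_{\lambda \in \Lambda} \; f\big(\lambda, w(\lambda)\big) \quad \text{s.t.} \quad w(\lambda) = \Phi\big(w(\lambda), \lambda\big), $$

where $f$ is the smooth upper-level objective and $\Phi(\cdot, \lambda)$ is a contraction whose fixed point $w(\lambda)$ defines the lower-level solution; the proposed method approximates $w(\lambda)$ with (stochastic) fixed-point iterations started from scratch at every upper-level step, instead of warm-starting from the previous approximation.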
Poster

Intrinsic Gaussian Process on Unknown Manifolds with Probabilistic Metrics

Mu Niu · Zhenwen Dai · Pokman Cheung · Yizhu Wang
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

This article presents a novel approach to constructing Intrinsic Gaussian Processes for regression on unknown manifolds with probabilistic metrics (GPUM) in point clouds. In many real-world applications, one often encounters high-dimensional data (e.g., point cloud data) centered around some lower-dimensional unknown manifold. The geometry of a manifold is in general different from the usual Euclidean geometry. Naively applying traditional smoothing methods such as Euclidean Gaussian Processes (GPs) to manifold-valued data, and so ignoring the geometry of the space, can potentially lead to highly misleading predictions and inferences. A manifold embedded in a high-dimensional Euclidean space can be well described by a probabilistic mapping function and the corresponding latent space. We investigate the geometrical structure of the unknown manifolds using Bayesian Gaussian Process latent variable models (B-GPLVM) and Riemannian geometry. The distribution of the metric tensor is learned using B-GPLVM. The boundary of the resulting manifold is defined based on the uncertainty quantification of the mapping. We use the probabilistic metric tensor to simulate Brownian motion paths on the unknown manifold. The heat kernel is estimated as the transition density of the Brownian motion and used as the covariance function of GPUM. The applications of GPUM are illustrated in simulation studies on the Swiss roll, high-dimensional real datasets of WiFi signals, and image data examples. Its performance is compared with the Graph Laplacian GP, Graph Matérn GP, and Euclidean GP.

Poster

Sparse Graph Learning from Spatiotemporal Time Series

Andrea Cini · Daniele Zambon · Cesare Alippi
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Outstanding achievements of graph neural networks for spatiotemporal time series analysis show that relational constraints introduce an effective inductive bias into neural forecasting architectures. Often, however, the relational information characterizing the underlying data-generating process is unavailable, and the practitioner is left with the problem of inferring from data which relational graph to use in the subsequent processing stages. We propose novel, principled yet practical, probabilistic score-based methods that learn the relational dependencies as distributions over graphs while maximizing the downstream task performance end-to-end. The proposed graph learning framework is based on consolidated variance-reduction techniques for Monte Carlo score-based gradient estimation, is theoretically grounded, and, as we show, effective in practice. In this paper, we focus on the time series forecasting problem and show that, by tailoring the gradient estimators to the graph learning problem, we are able to achieve state-of-the-art performance while controlling the sparsity of the learned graph and the computational scalability. We empirically assess the effectiveness of the proposed method on synthetic and real-world benchmarks, showing that the proposed solution can be used as a stand-alone graph identification procedure as well as a graph learning component of an end-to-end forecasting architecture.
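A minimal sketch of the score-function (REINFORCE) gradient estimator at the core of such probabilistic graph learning, for edges sampled independently as Bernoulli variables; the variance-reduction machinery central to the paper is deliberately omitted, and the loss is a hypothetical stand-in:

import numpy as np

rng = np.random.default_rng(0)
n = 4
theta = np.zeros((n, n))  # edge logits to be learned

def forecast_loss(adj):
    """Stand-in for the downstream forecasting loss: treated as a black box,
    since the score-function estimator only needs loss *values*."""
    target = np.eye(n, k=1)  # hypothetical 'true' dependency graph
    return ((adj - target) ** 2).mean()

def score_function_grad(theta, n_samples=64):
    """Monte Carlo estimate of d E[loss] / d theta for Bernoulli edges."""
    probs = 1.0 / (1.0 + np.exp(-theta))
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        adj = (rng.random(theta.shape) < probs).astype(float)  # sample a graph
        score = adj - probs  # d log p(adj; theta) / d theta for Bernoulli(sigmoid)
        grad += forecast_loss(adj) * score
    return grad / n_samples

for _ in range(2000):
    theta -= 0.5 * score_function_grad(theta)   # descent on the expected loss
print(np.round(1.0 / (1.0 + np.exp(-theta)), 2))  # learned edge probabilities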

Poster

[Re] FOCUS: Flexible Optimizable Counterfactual Explanations for Tree Ensembles

Kyosuke Morita
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Scope of Reproducibility — This study aims to reproduce the results of the paper 'FOCUS: Flexible Optimizable Counterfactual Explanations for Tree Ensembles' by Lucic et al. The main claims of the original paper are that FOCUS is able to (i) generate counterfactual explanations for all the instances in a dataset; and (ii) find counterfactual explanations that are closer to the original input for tree-based algorithms than existing methods.

Methodology — This study replicates the original experiments using the code, data, and models provided by the authors. Additionally, this study re-implements code and retrains the models to evaluate the robustness and generality of FOCUS. All the experiments were conducted on a personal laptop with a quad-core CPU and 8 GB of RAM and took approximately 33 hours in total.

Results — This study was able to replicate the results of the original paper in terms of finding counterfactual explanations for all instances in the datasets. Additional experiments were conducted to validate the robustness and generality of the conclusions. While there were slight deviations in terms of generating smaller mean distances, half of the models still outperformed the results of the existing method.

What was easy — The implementation of the original paper is publicly available on GitHub. The repository contains the models and data used in the original experiments. Also, the authors provided a technical appendix upon request, which includes all the hyperparameters that were used in the experiments, aiding reproduction.

What was difficult — Although the implementation code was available, it employs outdated packages and the code structure is complex. Also, the comments in the functions and the documentation of the code are sparse or nonexistent, which made it difficult to follow the code.

Communication with original authors — I reached out to the authors to obtain the hyperparameters used in the experiments. The authors responded promptly with a detailed technical appendix of the original paper.

Poster

Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond

Anna Hedström · Leander Weber · Daniel Krakowczyk · Dilyara Bareeva · Franz Motzkus · Wojciech Samek · Sebastian Lapuschkin · Marina Höhne
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

The evaluation of explanation methods is a research topic that has not yet been explored deeply. However, since explainability is supposed to strengthen trust in artificial intelligence, it is necessary to systematically review and compare explanation methods in order to confirm their correctness. Until now, no tool with a focus on XAI evaluation has existed that exhaustively and speedily allows researchers to evaluate the performance of explanations of neural network predictions. To increase transparency and reproducibility in the field, we therefore built Quantus—a comprehensive evaluation toolkit in Python that includes a growing, well-organised collection of evaluation metrics and tutorials for evaluating explanation methods. The toolkit has been thoroughly tested and is available under an open-source license on PyPI (and at https://github.com/understandable-machine-intelligence-lab/Quantus).

Poster

Conditional Distribution Function Estimation Using Neural Networks for Censored and Uncensored Data

Bingqing Hu · Bin Nan
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)
Most work in neural networks focuses on estimating the conditional mean of a continuous response variable given a set of covariates. In this article, we consider estimating the conditional distribution function using neural networks for both censored and uncensored data. The algorithm is built upon the data structure particularly constructed for the Cox regression with time-dependent covariates. Without imposing any model assumptions, we consider a loss function that is based on the full likelihood, where the conditional hazard function is the only unknown nonparametric parameter and for which unconstrained optimization methods can be applied. Through simulation studies, we show that the proposed method possesses desirable performance, whereas the partial likelihood method and the traditional neural networks with $L_2$ loss yield biased estimates when model assumptions are violated. We further illustrate the proposed method with several real-world data sets.
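For a subject with covariates $x$, observed time $T$, and censoring indicator $\delta$, the full-likelihood loss in terms of the conditional hazard $\lambda(t \mid x)$ takes the standard survival-analysis form (negative log-likelihood, summed over subjects, with the hazard as the only unknown):

$$ -\log L \;=\; -\,\delta \log \lambda(T \mid x) \;+\; \int_0^{T} \lambda(s \mid x)\, ds, $$

and the conditional distribution function then follows as $F(t \mid x) = 1 - \exp\!\big(-\int_0^t \lambda(s \mid x)\, ds\big)$.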
Poster

The Separation Capacity of Random Neural Networks

Sjoerd Dirksen · Martin Genzel · Laurent Jacques · Alexander Stollenwerk
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)
Neural networks with random weights appear in a variety of machine learning applications, most prominently as the initialization of many deep learning algorithms and as a computationally cheap alternative to fully learned neural networks. In the present article, we enhance the theoretical understanding of random neural networks by addressing the following data separation problem: under what conditions can a random neural network make two classes $\mathcal{X}^-, \mathcal{X}^+ \subset \mathbb{R}^d$ (with positive distance) linearly separable? We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability. Crucially, the number of required neurons is explicitly linked to geometric properties of the underlying sets $\mathcal{X}^-, \mathcal{X}^+$ and their mutual arrangement. This instance-specific viewpoint allows us to overcome the usual curse of dimensionality (exponential width of the layers) in non-pathological situations where the data carries low-complexity structure. We quantify the relevant structure of the data in terms of a novel notion of mutual complexity (based on a localized version of Gaussian mean width), which leads to sound and informative separation guarantees. We connect our result with related lines of work on approximation, memorization, and generalization.
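A quick numerical illustration of the separation question (our toy construction, not the paper's proof): two classes that are not linearly separable in the input space often become linearly separable after one random ReLU layer with Gaussian weights and uniform biases.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# two concentric rings in R^2: not linearly separable in the input space
angles = rng.uniform(0, 2 * np.pi, 400)
radii = np.concatenate([np.full(200, 1.0), np.full(200, 3.0)])
X = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)
y = np.concatenate([np.zeros(200), np.ones(200)])

# one random ReLU feature map: standard Gaussian weights, uniform biases
width = 500
W = rng.normal(size=(2, width))
b = rng.uniform(-3, 3, size=width)
features = np.maximum(X @ W + b, 0.0)

for name, data in [("input space", X), ("random ReLU features", features)]:
    acc = LinearSVC(max_iter=20000).fit(data, y).score(data, y)
    print(f"linear separability in {name}: train accuracy {acc:.3f}")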
Poster

Concentration analysis of multivariate elliptic diffusions

Lukas Trottner · Cathrine Aeckerle-Willems · Claudia Strauch
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)
We prove concentration inequalities and associated PAC bounds for both continuous- and discrete-time additive functionals for possibly unbounded functions of multivariate, nonreversible diffusion processes. Our analysis relies on an approach via the Poisson equation allowing us to consider a very broad class of subexponentially ergodic, multivariate diffusion processes. These results add to existing concentration inequalities for additive functionals of diffusion processes, which have so far been available only for either bounded functions or for unbounded functions of processes from a significantly smaller class. We demonstrate the power of these exponential inequalities with two examples from very different areas. Considering a possibly high-dimensional, parametric, nonlinear drift model under sparsity constraints, we apply the continuous-time concentration results to validate the restricted eigenvalue condition for Lasso estimation, which is fundamental for the derivation of oracle inequalities. The results for discrete additive functionals are applied to an investigation of the unadjusted Langevin MCMC algorithm for sampling of moderately heavy-tailed densities $\pi$. In particular, we provide PAC bounds for the sample Monte Carlo estimator of integrals $\pi(f)$ for polynomially growing functions $f$ that quantify sufficient sample and step sizes for approximation within a prescribed margin with high probability.
Poster

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

Shangtong Zhang · Remi Tachet des Combes · Romain Laroche
Dec 12, 8:45 AM - 10:45 AM Great Hall & Hall B1+B2 (level 1)

In this paper, we establish the global optimality and convergence rate of an off-policy actor critic algorithm in the tabular setting without using a density ratio to correct the discrepancy between the state distribution of the behavior policy and that of the target policy. Our work goes beyond existing works on the optimality of policy gradient methods in that existing works use the exact policy gradient for updating the policy parameters while we use an approximate and stochastic update step. Our update step is not a gradient update because we do not use a density ratio to correct the state distribution, which aligns well with what practitioners do. Our update is approximate because we use a learned critic instead of the true value function. Our update is stochastic because at each step the update is done for only the current state-action pair. Moreover, we remove several restrictive assumptions from existing works in our analysis. Central to our work is the finite sample analysis of a generic stochastic approximation algorithm with time-inhomogeneous update operators on time-inhomogeneous Markov chains, based on its uniform contraction properties.

Poster

Alpha-divergence Variational Inference Meets Importance Weighted Auto-Encoders: Methodology and Asymptotics

Kamélia Daudel · Joe Benton · Yuyang Shi · Arnaud Doucet
Dec 12, 8:45 AM - 10:45 AM Great Hall & Hall B1+B2 (level 1)

Several algorithms involving the Variational Rényi (VR) bound have been proposed to minimize an alpha-divergence between a target posterior distribution and a variational distribution. Despite promising empirical results, those algorithms resort to biased stochastic gradient descent procedures and thus lack theoretical guarantees. In this paper, we formalize and study the VR-IWAE bound, a generalization of the importance weighted auto-encoder (IWAE) bound. We show that the VR-IWAE bound enjoys several desirable properties and notably leads to the same stochastic gradient descent procedure as the VR bound in the reparameterized case, but this time by relying on unbiased gradient estimators. We then provide two complementary theoretical analyses of the VR-IWAE bound and thus of the standard IWAE bound. Those analyses shed light on the benefits or lack thereof of these bounds. Lastly, we illustrate our theoretical claims on toy and real-data examples.
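For orientation, with importance weights $w_i = p_\theta(x, z_i) / q_\phi(z_i \mid x)$ and $z_1, \dots, z_N \sim q_\phi(\cdot \mid x)$ i.i.d., the VR-IWAE bound takes the form (up to notational conventions; this is our restatement, not a quote from the paper)

$$ \mathcal{L}^{(\alpha)}_{N}(\theta, \phi; x) \;=\; \frac{1}{1-\alpha}\, \mathbb{E}\!\left[ \log \frac{1}{N} \sum_{i=1}^{N} w_i^{\,1-\alpha} \right], \qquad \alpha \in [0, 1), $$

which recovers the standard IWAE bound at $\alpha = 0$, and the ELBO at $\alpha = 0$ with $N = 1$.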

Poster

Reproducibility study of 'Proto2Proto: Can you recognise the car, the way I do?'

Gerson de Kleuver · David Bikker · Wenhua Hu · Bram Veenman
Dec 12, 8:45 AM - 10:45 AM Great Hall & Hall B1+B2 (level 1)

Scope of Reproducibility — This paper analyses the reproducibility of the study Proto2Proto: Can you recognise the car, the way I do? The main contributions and claims of the study are: 1) Using Proto2Proto, a shallower student model is more faithful to the teacher in terms of interpretability than a baseline student model while also showing the same or better accuracy; 2) Global Explanation loss forces student prototypes to be close to teacher prototypes; 3) Patch-Prototype Correspondence loss enforces the local representations of the student to be similar to those of the teacher; 4) The proposed evaluation metrics determine the faithfulness of the student to the teacher in terms of interpretability.

Methodology — A public code repository was available for the paper, which provided a working but incomplete and minimally documented codebase. With some modifications we were able to carry out the experiments that were best supported by the codebase. We spent a total of 60 GPU hours on the reproduction.

Results — The results we were able to produce support claim 1, albeit weakly. Further results are in line with the paper, but we found them to go against claim 3. In addition, we carried out a theoretical analysis which provides support for claim 4. Finally, we were unable to carry out our intended experiment to verify claim 2.

What was easy — The original paper was clearly structured and understandable. The experiments for which configurations were provided were simple to conduct.

What was difficult — The public codebase contained minimal documentation. Moreover, the variable names did not correspond between the code and the paper. Furthermore, the codebase lacked elements vital to reproducing some experiments. Another significant constraint was the computational requirements needed to reproduce the original experiments. Finally, the code required to reproduce one of the visualizations was not provided.

Communication with original authors — We contacted the authors to ask for trained model weights and missing hyperparameters for several experiments. We did not receive a response.

Poster

Reproducibility Study of “CartoonX: Cartoon Explanations of Image Classifiers”

Aditya Patra · Sina Taslimi · Luke Chin A Foeng · Pratik Kayal
Dec 12, 8:45 AM - 10:45 AM Great Hall & Hall B1+B2 (level 1)

In this reproducibility study, we verify the claims and contributions of Cartoon Explanations of Image Classifiers by Kolek et al. These include: (i) the proposed technique, CartoonX, used to extract visual explanations for the predictions of image classification networks; (ii) CartoonX being able to reveal piece-wise smooth regions of the image, unlike previous methods, which extract relevant pixel-sparse regions; and (iii) CartoonX achieving lower distortion values than these methods.

Poster

[Re] VAE Approximation Error: ELBO and Exponential Families

Vol Kyrylov · Navdeep Singh Bedi · Qianbo Zang
Dec 12, 8:45 AM - 10:45 AM Great Hall & Hall B1+B2 (level 1)

Scope of Reproducibility — Exponential family variational autoencoders struggle with reconstruction when encoders output limited information. We reproduce two experiments in which we first train the decoder and encoder separately. Then, we train both modules jointly using the ELBO and observe the degradation of reconstruction. We verify how the theoretical insight into the design of the approximate posterior and decoder distributions for a discrete VAE in a semantic hashing application influences the choice of input features to improve overall performance.

Methodology — We implement and train a synthetic experiment from scratch on a laptop. We use a mix of the authors' code and publicly available code to reproduce a GAN reinterpreted as a VAE. We consult the authors' code to reproduce the semantic hashing experiment and make our own implementation. We train models on the USI HPC cluster on machines with GTX 1080 Ti or A100 GPUs and 128 GiB of RAM. We spend under 100 GPU hours for all experiments.

Results — We observe the expected qualitative behavior predicted by the theory in all experiments. We improve the best semantic hashing model's test performance by 5 nats by using a modern method for gradient estimation of discrete random variables.

What was easy — Following the experiment recipes was easy once we had worked out the theory.

What was difficult — The paper enables verification of exponential-family VAE designs of arbitrary complexity, which requires probabilistic modeling skills. We contribute an elaboration on the implementation details of the synthetic experiment and provide code.

Communication with original authors — We are extremely grateful to the authors for discussing the paper, encouraging us to implement the experiments on our own, and suggesting directions for improving results over e-mail.

Poster

[Re] Hierarchical Shrinkage: Improving the Accuracy and Interpretability of Tree-Based Methods

Domen Mohorčič · David Ocepek
Dec 12, 8:45 AM - 10:45 AM Great Hall & Hall B1+B2 (level 1)

Scope of Reproducibility: The paper presents a novel post-hoc regularization technique for tree-based models, called Hierarchical Shrinkage (Agarwal et al., 2022). Our main goal is to confirm the claims that it substantially increases the predictive performance of both decision trees and random forests, that it is faster than other regularization techniques, and that it makes the interpretation of random forests simpler.

Methodology: In our reproduction, we used Hierarchical Shrinkage as provided by the authors in the Python package imodels. We also used their function for obtaining pre-cleaned data sets. While the algorithm code and clean datasets were provided, we re-implemented the experiments and added further experiments to test the validity of the claims. The results were obtained by applying Hierarchical Shrinkage to different tree models and comparing them to the authors' results.
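For reference, a hedged sketch of how Hierarchical Shrinkage might be applied post hoc via imodels; the class name HSTreeClassifierCV and the estimator_ constructor argument are assumptions based on the package documentation, so consult the imodels docs for the exact API:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# assumption: imodels exposes a cross-validated Hierarchical Shrinkage wrapper
from imodels import HSTreeClassifierCV

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# shrinks each tree node's prediction towards its parent after fitting,
# choosing the shrinkage strength by cross-validation
model = HSTreeClassifierCV(estimator_=DecisionTreeClassifier(max_leaf_nodes=20))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))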

Results: We managed to reproduce most of the results reported by the authors. The method works well and most of the claims are supported. The method does increase the predictive performance of tree-based models most of the time, but not always. When compared to other regularization techniques, Hierarchical Shrinkage outperforms them when used on decision trees, but not when used on random forests. Since the method is applied after learning, it is extremely fast. It also simplifies decision boundaries for random forests, making them easier to interpret.

What was easy: The use of the official code for Hierarchical Shrinkage was straightforward and used the same function naming convention as other machine learning libraries. The function for acquiring already clean data sets saved a lot of time.

What was difficult: The authors also provided the code for their experiments in a separate library, but the code did not run out of the box, and we had no success reproducing the results with it. The code was inconsistent with the paper's methodology. We had the most problems with hyperparameter tuning: the authors did not specify how they tuned the hyperparameters for the RF regularizers used.

Communication with original authors: We did not contact the authors of the original paper.

Poster

[Re] $\mathcal{G}$-Mixup: Graph Data Augmentation for Graph Classification

Ermin Omeragic · Vuk Đuranović
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)
Scope of Reproducibility — This paper presents a novel augmentation method that can be used for graph classification tasks: $\mathcal{G}$-Mixup. Our goal is to reproduce eight claims that the authors make in their paper. The first two claims relate to the properties of graphons estimated from graphs, which are the main components of the method. Claims three to eight relate to the superior performance of the method compared to other augmentation strategies.

Methodology — To reproduce the results, we use the open-source implementation of the method provided by the authors, with a few modifications. We write from scratch all the experiments and pipelines needed to defend the claims of the paper. Additionally, we implement three out of four baseline augmentation methods that are compared to the novel method. For one part of the experiments, we use a local computer and run the experiments on a CPU, for a total of 31.7 CPU hours, while for other, more demanding experiments, we use a GPU-accelerated machine for a total of 157.3 GPU hours.

Results — Due to many missing implementation details, we were not able to reproduce all of the original results. Some claims can be supported by our results, but most results are inconclusive. Even though the new method outperforms the baselines in certain scenarios, we find that the superiority of the method is not as strong as presented in the original paper.

What was easy — The novel augmentation method and its theoretical justification are presented intuitively in the paper, and it was easy to grasp its main ideas.

What was difficult — While the code with the method implementation was given, reproducing all the results required much more code and detail than what was provided in the paper. This means that we had to make a lot of educated guesses about the experimental settings, choices of hyperparameters, and details of the models used.

Communication with original authors — We contacted the authors on two occasions. The first time was very early in the reproduction process, when we asked about the details of the graphon estimation methods that they used, which were not explained in the paper, to which they responded swiftly. On the second occasion, we asked about the experimental settings and hyperparameter details, but we did not receive a reply.
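The core of the method being reproduced is mixup in graphon space (our summary of the original $\mathcal{G}$-Mixup paper): given graphons $W_1, W_2$ estimated from two classes, a mixed graphon is formed as

$$ W_\lambda \;=\; \lambda\, W_1 + (1-\lambda)\, W_2, \qquad \lambda \in [0, 1], $$

from which synthetic training graphs are sampled, with labels mixed by the same $\lambda$.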
Poster

[Re] Bandit Theory and Thompson Sampling-guided Directed Evolution for Sequence Optimization

Luka Žontar
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)
The paper presents a novel directed evolution (DE) approach using Thompson Sampling and bandit theory, TS-DE. Our reproducibility study aims to confirm the five main claims of the original paper: sublinear Bayesian regret, improved performance compared to basic DE, robustness to mutation rate changes, initial diversification followed by concentration of the population around the optimal value in later iterations, and an iterative distribution shift towards optimal population fitness. Finally, we provide a reproducible environment to support the main claims of the original paper, along with the source code of the proposed approach and all experiments, comprehensive documentation, and unit tests.

No code was available beforehand for this article, thus we re-implemented the proposed approach by meticulously following the comprehensive explanations of the process in the original article. The experiments were run on a personal computer.

We managed to reproduce all the experiments supporting the main claims of the original article. Additionally, we add uncertainty quantification to the results, as we believe this is crucial to confirming any of the claims. Finally, we present the exploration-exploitation trade-off experiment in a more robust manner, leveraging the nucleotide diversity metric to gain additional insight into how the proposed algorithm works.

With comprehensive explanations in the original article, it was relatively easy to rewrite the main concepts from pseudo-code to Python code. Experiments were clearly explained with all the necessary hyperparameters.

Since no source code was available, every detail missing from the original article resulted in additional research and trial-and-error experimentation. To conclude, we recommend several improvements to the authors of the studied paper that could further improve the quality of their outstanding contribution. These include better explanations of how the optimal solution is calculated, on which population the PCA is fitted, how $\theta^*$ and $\tilde{\theta}$ are used, and improved documentation of how the basic DE was implemented.

We contacted the original authors on multiple occasions during the development of our reproducibility study but got no response.
Poster

[Re] End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking

Sean McLeish · Long Tran-Thanh
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Scope of Reproducibility: In this report, we aim to validate the claims of Bansal et al. These are that the recurrent architecture presented, with skip connections and a progressive loss function, prevents the original problem from being forgotten or corrupted during processing, allowing the recurrent module to be applied indefinitely, and that this architecture avoids the overthinking trap. We use both code released by the authors and newly developed code to recreate many results presented in the paper. Additionally, we present an analysis of the newly introduced alpha hyperparameter and investigate interesting perturbation behaviour of the prefix-sums models. Further, we conduct a hyperparameter search and provide an analysis of the Asymptotic Alignment scores of the models presented.

Methodology: We use the PyTorch code released by the authors to replicate the accuracy experiments. We then independently develop our own PyTorchFI-based code to replicate the perturbation experiments presented by Bansal et al., overall providing a replication of all results shown in the main body of the paper. We then extend these results, providing an analysis of the alpha hyperparameter, an analysis of perturbation recovery, Asymptotic Alignment scores, and a hyperparameter search. We used both an NVIDIA RTX 2080 Ti GPU and sets of three NVIDIA Quadro RTX 6000 GPUs, taking a total of 982.2 GPU hours to create all results presented in the main body of this report.

Results: We verify the authors' claims by replicating the experiments presented by Bansal et al. All of our experiments show results identical to the ones presented in the paper, apart from perturbation testing, for which we provide an additional in-depth analysis. We also provide an analysis of the new alpha hyperparameter and a hyperparameter search.

What was easy: The code provided by Bansal et al. gave clear instructions on how to use it, and pretrained models were available for all problems.

What was difficult: Chess models required a considerable amount of time to train, putting a drain on resources. Also, code for reproducing the perturbation results was not available, so this had to be developed from scratch.

Communication with original authors: We had good communication with the original authors, both emailing and meeting online.

Poster

[Re] Fairness Guarantees under Demographic Shift

Valentin Buchner · Philip Schutte · Yassin Ben Allal · Hamed Ahadi
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Scope of Reproducibility: The original authors' main contribution is the family of Shifty algorithms, which can guarantee that certain fairness constraints will hold with high confidence even after a demographic shift in the deployment population occurs. They claim that Shifty provides these high-confidence fairness guarantees without a loss in model performance, given enough training data.

Methodology: The code provided by the original paper was used, and only some small adjustments needed to be made in order to reproduce the experiments. All model specifications and hyperparameters from the original implementation were used. Extending beyond reproducing the original paper, we investigated the sensitivity of Shifty to the size of the bounding intervals limiting the possible demographic shift, and ran Shifty with an additional optimization method.

Results: Our results approached the results reported in the original paper. They supported the claim that Shifty reliably guarantees fairness under demographic shift, but could not verify that Shifty performs at no loss of accuracy.

What was easy: The theoretical framework laid out in the original paper was well explained and supported by additional formulas and proofs in the appendix. Further, the authors provided clear instructions on how to run the experiments and provided the necessary hyperparameters.

What was difficult: While an open-source implementation of Shifty was provided and could be debugged with relatively low time investment, the code did not contain extensive documentation and was complex to understand. It was therefore difficult to verify that each part of the code functions as expected and to expand upon the existing experiments. Further, certain hyperparameter and model specifications deviated between the provided code and the original paper, which made it challenging to know which specifications to apply when reproducing.

Communication with original authors: The first author of the original paper was contacted, but we have yet to receive a reply.

Poster

[Re] On the Reproducibility of CartoonX

Robin Sasse · Aniek Eijpe · Jona Ruthardt · Elias Dubbeldam
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Scope of Reproducibility — CartoonX [1] is a novel explanation method for image classifiers. In this reproducibility study, we examine the claims of the original authors of CartoonX that it: (i) extracts relevant piece-wise smooth parts of the image, resulting in explanations which are more straightforward to interpret for humans; (ii) achieves lower distortion in the model output, using fewer coefficients than other state-of-the-art methods; (iii) is model-agnostic. Finally, we examine how to reduce the runtime.

Methodology — The original authors' open-sourced implementation has been used to examine (i). We implemented the code to examine (ii), as there was no public code available for this. We tested claim (iii) by performing the same experiments with a Vision Transformer instead of a CNN. To reduce the runtime, we extended the existing implementation with multiple enhanced initialization techniques. All experiments took approximately 38.4 hours on a single NVIDIA Titan RTX.

Results — Our results support the claims made by the original authors. (i) We observe that CartoonX produces piece-wise smooth explanations, and most of the explanations give valuable insights. (ii) Most experiments showing that CartoonX achieves lower-distortion outputs than other methods have been reproduced, and in the cases where exact reproducibility has not been achieved, the authors' claim (ii) still holds. (iii) The model-agnosticism claim also holds, as the overall quality of the ViT-based explanations almost matches that of the CNN-based explanations. Finally, simple heuristic initializations did not improve the runtime.

What was easy — The mathematical background and intuition of CartoonX were clearly explained by the original authors. Moreover, the original authors' code was well structured and documented, which made it straightforward to run and extend.

What was difficult — Some hyperparameter settings and implementation details needed to reproduce the experiments were not clear or transparent from the original paper or code. This made it difficult to implement and reproduce these experiments.

Communication with original authors — We have been in brief communication with the original authors. They were able to address most of our points, providing us with some additional clarifications about the exact implementation and hyperparameter settings.

Poster

Reproducibility study of the Fairness-enhanced Node Representation Learning

Gijs Moens · Job De Witte · Tobias Gobel · Meggie Van den Oever
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

"CrossWalk: Fairness-Enhanced Node Representation Learning" is set to be reproduced and reviewed. It presents an extension to existing graph algorithms that incorporate the idea of biased random walks for obtaining node embeddings. CrossWalk incorporates fairness by up-weighting edges of nodes located near group boundaries. The authors claim that their approach outperforms baseline algorithms, such as DeepWalk and FairWalk, in terms of reducing the disparity between different classes within a graph network. The authors accompanied their paper with the publication of an open GitHub page, which includes the source code and relevant data sets. The limited size of the data sets in combination with the efficient algorithms enables the experiments to be conducted without significant difficulties and is computable on standard CPUs without the need for additional resources.In this reproducibility report, the outcomes of the experiments are in agreement with the results presented in the original paper. However, the inherent randomness of the random walks makes it difficult to quantify the extent of similarity between the reproduced results and the results as stated in the original paper. However, it can be concluded that CrossWalk results in a decreased disparity between groups in graph networks.The authors effectively conveyed the underlying concept of their proposed method, rendering it both intriguing and straightforward to comprehend the key ideas. Furthermore, the authors successfully incorporated a range of methods and baseline algorithms into the paper.In contrast, the source code may not have been optimally constructed with reproducibility in mind. Certain sections of the code appear to be unfinished or inadequately executed. Additionally, the authors neglected to specify key hyperparameters, resulting in the unidentifiability of certain results. This presents challenges in drawing conclusions based on the available sources.The authors were unable to respond in time for elaborating on certain implementation details. However, we did receive additional data which was crucial to obtaining certain results.

Show more
View full details
Poster

Easy Bayesian Transfer Learning with Informative Priors

Martin Špendl · Klementina Pirc
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

REPRODUCIBILITY SUMMARY

Scope of Reproducibility — In this work, we study the reproducibility of the paper Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors. The paper proposes a three-step pipeline that replaces standard transfer learning with a pre-trained prior: the first step is training a prior, the second is re-scaling it, and the third is inference. The authors claim that increasing the rank and the scaling factor improves performance on the downstream task. They also argue that Bayesian learning with an informative prior is more data-efficient and performs better than standard SGD transfer learning or a non-informative prior. We reproduce the main claims on one of the four datasets in the paper.

Methodology — We used a combination of the authors' code and our own. The authors provided a training pipeline but not the code to fully reproduce the paper. We modified the training pipeline to suit our needs and created a testing pipeline to evaluate the models. We reproduced the results for the Oxford-102-Flowers dataset on an NVIDIA RTX 3070 GPU, using approximately 310 GPU hours for the main results.

Results — Our results confirm most of the claims tested, although we could not achieve exactly the same accuracy due to missing hyperparameters. We reproduced the trend in how scaling the prior impacts performance and how a learned prior outperforms a non-learned one. On the contrary, we could not reproduce the effect of the rank of the low-rank covariance approximation on model performance, nor the beneficial boost of Bayesian learning over standard SGD.

What was easy — The authors' implementation provides various training and logging parameters. It was also helpful that the authors provided both the learned priors and scripts for downloading, splitting, and pre-processing the datasets used in the study.

What was difficult — Setting up the environment so that the required packages worked correctly was difficult. Although many parameters are available for running the pipeline, their descriptions are misleading, so a lot of time went into clarifying each parameter's function and debugging different settings. Training also took a long time, especially when training five models per data point.

Communication with original authors — We contacted the authors via e-mail about their pipeline and their use of hyperparameters but did not hear back.
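To illustrate the re-scaling step, here is a minimal sketch of adding a tempered Gaussian prior to the training loss, assuming a diagonal covariance for brevity; the paper's pipeline learns a low-rank-plus-diagonal (SWAG-style) covariance, and all names and signatures here are ours.

```python
# A minimal sketch of MAP training with a re-scaled Gaussian prior,
# assuming a diagonal covariance (the paper uses low-rank + diagonal).
import torch

def log_prior(params, mu, var, scale=1.0):
    # log N(params; mu, scale * var), up to an additive constant;
    # `scale` is the re-scaling factor studied in the paper's second step
    return -0.5 * (((params - mu) ** 2) / (scale * var)).sum()

def map_loss(nll, params, mu, var, scale, n_data):
    # data negative log-likelihood minus the per-datum log prior
    return nll - log_prior(params, mu, var, scale) / n_data

# usage with a flattened parameter vector from a downstream model:
# loss = map_loss(criterion(out, y), flat_params, prior_mu, prior_var,
#                 scale=10.0, n_data=len(train_set))
```

Increasing `scale` flattens the prior, which is one way to read the claim that the scaling factor trades off prior informativeness against downstream fit.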

Show more
View full details
Poster

[Re] Variational Neural Cellular Automata

Albert Sund Aillet · Simon Sondén
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)
The main claim of the paper being reproduced is that the proposed Variational Neural Cellular Automata (VNCA) architecture, composed of a convolutional encoder and a Neural Cellular Automata (NCA)-based decoder, is able to generate high-quality samples. The paper presents two variants of the VNCA decoder: the doubling variant, claimed to have a simple latent space, and the non-doubling variant, claimed to be optimized for damage recovery and stability over many steps.

To reproduce the results, we re-implemented all of the VNCA models and a fully-convolutional baseline in JAX, using the descriptions given in the paper. We then followed the same experimental setup and hyperparameter choices as in the original paper. All of the models were trained on a TPU v3-8 provided by Kaggle, with a total budget of around 4 TPU hours, not counting unreported experiments.

All but one of the figures and results from the original study could be reproduced. The obtained Evidence Lower Bound (ELBO) of the doubling VNCA was within $0.3\%$ of the stated value; for the non-doubling VNCA, the ELBO was within $1.8\%$ and the observed damage recovery was similar. We were, however, not able to reproduce the t-SNE reduction experiment for the baseline and were therefore unable to show that the VNCA decoder has a cleaner t-SNE separation than the baseline.

The convolutional baseline and most parts of the NCA-based decoder were straightforward to re-implement from the description provided in the paper. One of the difficulties faced in the reproduction study was obtaining the labels of the binarized MNIST dataset used in the original paper, since the dataset is officially provided without labels; this made it unclear how the original paper obtained the labels for the latent-space visualization. Additionally, implementing the pool for the non-doubling version of the VNCA model in a distributed training setup was challenging, as it required operations between TPU cores.

No communication was made with the original authors.
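For orientation, here is a minimal PyTorch sketch of the doubling decoder's core loop (the study's actual re-implementation is in JAX); the residual update rule, channel counts, and step counts are illustrative assumptions rather than the paper's exact choices.

```python
# A minimal sketch of a doubling NCA decoder: grow a 1x1 latent "seed"
# by alternating spatial doubling with a few cellular-automaton steps.
import torch
import torch.nn as nn

class NCAStep(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.update = nn.Sequential(
            nn.Conv2d(channels, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, channels, 1),
        )

    def forward(self, state):
        return state + self.update(state)  # residual local cell update

def doubling_decode(z, nca, n_doublings=5, steps_per_stage=2):
    state = z[:, :, None, None]            # latent vector as a 1x1 seed
    for _ in range(n_doublings):
        # double height and width by repeating each cell
        state = state.repeat_interleave(2, -1).repeat_interleave(2, -2)
        for _ in range(steps_per_stage):
            state = nca(state)
    return state                           # e.g. 32x32 grid after 5 doublings

nca = NCAStep(32)
grid = doubling_decode(torch.randn(4, 32), nca)  # (4, 32, 32, 32)
```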
Show more
View full details
Poster

[Re] Exploring the Role of Grammar and Word Choice in Bias Toward African American English (AAE) in Hate Speech Classification

Priyanka Bose · Chandra Shekhar Pandey · Fraida Fund
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Reproducibility Summary

Scope of Reproducibility — We aim to reproduce a result from the paper "Exploring the Role of Grammar and Word Choice in Bias Toward African American English (AAE) in Hate Speech Classification" [1]. Our study is restricted specifically to the claim that the use of swear words impacts hate speech classification of AAE text. We were able to broadly validate the claim of the paper; however, the magnitude of the effect depended on the word-replacement strategy, which was somewhat ambiguous in the original paper.

Methodology — The authors did not publish source code, so we reproduce the experiments by following the methodology described in the paper. We train BERT models from TensorFlow Hub [2] to classify hate speech using the DWMW17 [3] and FDCL18 [4] Twitter datasets. We then compile a dictionary of swear words and replacement words with comparable meaning, and use it to create "censored" versions of samples in Blodgett et al.'s [5] AAE Twitter dataset. Using the BERT models, we evaluate the hate speech classification of the original data and the censored data. Our experiments are conducted on an open-access research testbed, Chameleon [6], and we make available both our code and instructions for reproducing the result on the shared facility.

Results — Our results are consistent with the claim that the censored text (without swear words) is less often classified as hate speech, offensive, or abusive than the same text with swear words. However, we find the classification is very sensitive to the word-replacement dictionary being used.

What was easy — The authors used three well-known datasets, which were easy to obtain. They also used well-known, widely available models, BERT and Word2Vec.

What was difficult — Some of the details of model training were not fully specified. Also, following their methodology, we were not able to exactly re-create a comparable dictionary of swear words and replacement terms of similar meaning.

Communication with original authors — We reached out to the authors but were not able to communicate with them before the submission of this report.
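As an illustration of the censoring step, here is a minimal sketch assuming a hand-built replacement dictionary; the entries shown are illustrative only, since we could not exactly re-create the paper's dictionary.

```python
# A minimal sketch of dictionary-based censoring: swap each swear word
# for a replacement of comparable meaning, preserving the rest of the tweet.
import re

REPLACEMENTS = {"damn": "darn", "hell": "heck"}  # illustrative entries only

def censor(tweet, table=REPLACEMENTS):
    def swap(match):
        word = match.group(0)
        return table.get(word.lower(), word)
    pattern = r"\b(" + "|".join(map(re.escape, table)) + r")\b"
    return re.sub(pattern, swap, tweet, flags=re.IGNORECASE)

# censor("what the hell") -> "what the heck"
```

The hate-speech classifier is then run on both the original and the censored tweet, and the difference in predicted labels is attributed to the swear words.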

Show more
View full details
Poster

Reproducibility Study of “Label-Free Explainability for Unsupervised Models”

Julius Wagenbach · Gergely Papp · Niklas Mather · Laurens de Vries
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

In this work, we present our reproducibility study of "Label-Free Explainability for Unsupervised Models", a paper that introduces two post-hoc explanation techniques for neural networks: (1) label-free feature importance and (2) label-free example importance. Our study focuses on the reproducibility of the authors' most important claims: (i) perturbing features with the highest importance scores causes a higher latent shift than perturbing random pixels, (ii) label-free example importance scores help to identify training examples that are highly related to a given test example, (iii) unsupervised models trained on different tasks show moderate correlation among the highest-scored features, (iv) the same models show low correlation in example scores measured on a fixed set of data points, and (v) increasing the disentanglement with β in a β-VAE does not imply that latent units will focus on more distinct features. We reviewed the authors' code, checked that the implementation of the experiments matched the paper, and ran all experiments. The results are shown to be reproducible. Moreover, we extended the codebase to run the experiments on more datasets and to test the claims with further experiments.
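To make the feature-importance wrapper concrete, here is a minimal sketch that instantiates it with plain gradient × input on the scalar proxy ⟨f(u), f(x)⟩; the encoder interface and shapes are our assumptions, and the paper's wrapper accepts other attribution methods as well.

```python
# A minimal sketch of label-free feature importance: attribute the scalar
# g(u) = <f(u), f(x)> instead of a class logit, so no label is needed.
import torch

def label_free_importance(encoder, x):
    x = x.clone().requires_grad_(True)
    target = encoder(x).detach()           # fixed latent direction f(x)
    scalar = (encoder(x) * target).sum()   # g(x) = <f(x), f(x)>
    grad, = torch.autograd.grad(scalar, x)
    return (grad * x).abs()                # gradient x input attribution
```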

Show more
View full details
Poster

[Re] CrossWalk: Fairness-enhanced Node Representation Learning

Luca Pantea · Andrei-Eusebiu Blahovici
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Scope of Reproducibility — This work aims to reproduce the findings of the paper "CrossWalk: Fairness-enhanced Node Representation Learning" by investigating the two main claims the authors make about CrossWalk: (i) CrossWalk enhances fairness in three graph algorithms while only suffering small decreases in performance, and (ii) CrossWalk preserves the necessary structural properties of the graph while reducing disparity.

Methodology — The authors made the CrossWalk repository available, containing most of the datasets used in their experimentation and the scripts needed to run the experiments. However, the codebase lacked documentation and was missing the logic for running all experiments and visualizing the results. We therefore re-implemented their code from scratch and deployed it as a Python package which can be run to obtain all the showcased results.

Results — Our work suggests that the first claim of the paper, namely that CrossWalk minimizes disparity and thus enhances fairness, is only partially reproducible, and only for the tasks of node classification and influence maximization, as the parameters specified in the paper do not always yield similar results. The second claim, that CrossWalk preserves the necessary structural properties of the graph, is fully reproducible through our experiments.

What was easy — The original paper contained the necessary information about hyperparameters, which, coupled with the publicly available repository, made it straightforward to refactor the code and understand the idea of the proposed method.

What was difficult — The difficulty stems from the lack of structure and documentation in the provided code, which made the original experiments hard to reproduce. Furthermore, there were missing files in the provided datasets, and some experiments were not reproducible at all through the provided code. The experiments are also CPU-intensive, which made reproduction even harder.

Communication with original authors — Albeit rather late, the authors provided meaningful feedback on our questions about implementation details and initial results.
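Since both claims are stated in terms of disparity, the following is a minimal sketch of the disparity measure as we understand it, namely the variance of a per-group quality score (such as node-classification accuracy or influenced fraction); the exact normalization used in the paper may differ.

```python
# A minimal sketch of group disparity: the variance of a quality score
# across protected groups; lower disparity means fairer outcomes.
def disparity(scores_by_group):
    vals = list(scores_by_group.values())
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

# disparity({"A": 0.81, "B": 0.64}) -> variance across the two groups
```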

Show more
View full details
Poster

Reproducibility Study of “Quantifying Societal Bias Amplification in Image Captioning”

Farrukh Baratov · Goksenin Yuksel · Darie Petcu · Jan Bakker
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Scope of reproducibility - We study the reproducibility of the paper "Quantifying Societal Bias Amplification in Image Captioning" by Hirota et al. In this paper, the authors propose a new metric to measure bias amplification, called LIC, and evaluate it on multiple image captioning models. Based on this evaluation, they make the following main claims, which we aim to verify: (1) all models amplify gender bias, (2) all models amplify racial bias, (3) LIC is robust against encoders, and (4) the NIC+Equalizer model increases gender bias with respect to the baseline. We also extend the original work by evaluating LIC for age bias.

Methodology - For our reproduction, we were able to run the code provided by the authors without any modifications. For our extension, we automatically labelled the images in the dataset with age annotations and adjusted the code to work with this dataset. In total, 38 GPU hours were needed to perform all experiments.

Results - The reproduced results are close to the original results and support all four main claims. Furthermore, our additional results show that only a subset of the models amplifies age bias, while they strengthen the claim that LIC is robust against encoders. However, we acknowledge that our extension to age bias has its limitations.

What was easy - The authors' code and the data needed to run it are publicly available. The code required no modification to run, and the scripts were provided with an extensive argument parser, allowing us to quickly set up our experiments. Moreover, the details of the original experiments were clearly stated in the appendix.

What was difficult - We found it difficult to interpret the authors' code, as the provided documentation left room for improvement. Also, the scripts contained repetitive code. While the authors retrained all image captioning models, they did not share the model weights, making it difficult to extend their work.

Communication with original authors - No communication with the original authors was attempted.
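For intuition, here is a minimal sketch of the leakage comparison underlying LIC, with attribute-revealing words assumed to be masked beforehand; the vectorizer and classifier are our stand-ins (the paper trains encoders such as LSTM or BERT), and the paper's LIC score additionally incorporates classifier confidence rather than plain accuracy.

```python
# A minimal sketch of the LIC comparison: train the same "attacker" on
# human and on model captions and report the leakage gap. A positive gap
# indicates that the captioning model amplifies the bias.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def leakage(captions, attributes):
    # how well can the protected attribute be predicted from captions alone?
    X = TfidfVectorizer().fit_transform(captions)
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, attributes, cv=5).mean()

def lic_amplification(human_caps, model_caps, attrs):
    return leakage(model_caps, attrs) - leakage(human_caps, attrs)
```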

Show more
View full details
Poster

[Re] Pure Noise to the Rescue of Insufficient Data

Ryan Lee · Seungmin Lee
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Scope of Reproducibility — We examine the main claims of the original paper [1], which states that in an image classification task with imbalanced training data, (i) using pure noise to augment minority-class images encourages generalization by improving minority-class accuracy. This method is paired with (ii) a new batch normalization layer that normalizes noise images using affine parameters learned from natural images, which improves the model's performance. Moreover, (iii) this improvement is robust to varying levels of data augmentation. Finally, the authors propose that (iv) adding pure noise images can improve classification even on balanced training data.

Methodology — We implemented the training pipeline from the description of the paper using PyTorch and integrated the authors' code snippets for sampling pure noise images and batch-normalizing noise and natural images separately. All of our experiments were run on a machine from a cloud computing service with one NVIDIA RTX A5000 graphics card and had a total computational time of approximately 432 GPU hours.

Results — We reproduced the main claims that (i) oversampling with pure noise improves generalization by improving minority-class accuracy, (ii) the proposed batch normalization (BN) method outperforms baselines, and (iii) this improvement is robust across data augmentations. Our results also support that (iv) adding pure noise images can improve classification on balanced training data. However, additional experiments suggest that the performance improvement from OPeN may be more orthogonal to the improvement caused by a bigger network or more complex data augmentation.

What was easy — The code snippet in the original paper was thoroughly documented and easy to use. The authors also clearly documented most of the hyperparameters used in the main experiments.

What was difficult — The repository linked in the original paper was not yet populated. As a result, we had to retrieve the CIFAR-10-LT dataset from previous works [2, 3] and re-implement WideResNet [4] and the overall training pipeline.

Communication with original authors — We contacted the authors for clarifications on the implementation details of the algorithm. Prior works included many important implementation details, such as linear learning-rate warmup and deferred oversampling, so we confirmed with the authors whether these methods were used.
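For intuition, here is a minimal sketch of the two ingredients as we understand them; shapes, names, and the use of per-channel dataset statistics are our assumptions, and the authors' official snippets should be preferred where available.

```python
# A minimal sketch of (i) pure-noise oversampling and (ii) a DAR-BN-style
# layer that normalizes noise activations with statistics computed from
# natural images only, reusing the affine parameters learned on them.
import torch

def sample_pure_noise(n, mean, std):
    # mean/std: per-channel dataset statistics, shape (3,)
    return torch.randn(n, 3, 32, 32) * std[:, None, None] + mean[:, None, None]

def dar_bn(bn_layer, x, is_noise):
    out = torch.empty_like(x)
    out[~is_noise] = bn_layer(x[~is_noise])          # natural: update stats
    with torch.no_grad():                            # noise: reuse natural stats
        mu = x[~is_noise].mean((0, 2, 3), keepdim=True)
        var = x[~is_noise].var((0, 2, 3), keepdim=True)
    x_noise = (x[is_noise] - mu) / (var + 1e-5).sqrt()
    # affine parameters were learned from natural images only
    out[is_noise] = (x_noise * bn_layer.weight[:, None, None]
                     + bn_layer.bias[:, None, None])
    return out
```

Keeping noise images out of the running statistics is the point: the natural-image distribution seen by downstream layers stays uncontaminated.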

Show more
View full details
Poster

[Re] On Explainability of Graph Neural Networks via Subgraph Explorations

Yannik Mahlau · Lukas Berg · Leonie Kayser
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Yuan et al. claim that their proposed method SubgraphX achieves (i) higher fidelity in explaining models for graph- and node-classification tasks compared to other explanation techniques, namely GNNExplainer. Additionally, (ii) the computational effort of SubgraphX is at a "reasonable level", which is not further specified by the original authors; we define this as at most ten times slower than GNNExplainer.

We re-implemented the proposed algorithm in PyTorch and replicated the experiments performed by the authors on a smaller scale due to resource constraints. Additionally, we checked the performance on a new dataset and investigated the influence of hyperparameters. Lastly, we improved SubgraphX using greedy initialization and by utilizing fidelity as a score function.

We were able to reproduce the main claims on the MUTAG dataset, where SubgraphX performs better than GNNExplainer. Furthermore, SubgraphX has a reasonable runtime of about seven times that of GNNExplainer. We also successfully employed SubgraphX on the Karate Club dataset, where it outperforms GNNExplainer as well. The hyperparameter study revealed that the number of Monte Carlo tree search iterations and Monte Carlo sampling steps are the most important hyperparameters and directly trade performance for runtime. Lastly, we show that our proposed improvements to SubgraphX significantly enhance fidelity and runtime.

The authors' description of the algorithm was clear and concise, and their original implementation is available in the DIG library as a reference for ours. The authors performed extensive experiments, which we could not replicate at full scale due to resource constraints; however, we were able to achieve similar results on a subset of the datasets used. Another issue was that, despite the original code and datasets being publicly available, there were many compatibility issues. The original authors briefly reviewed our work and agreed with our findings.
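Since fidelity is central to both the claims and our proposed score function, here is a minimal occlusion-style sketch; the model signature (PyG-style node features plus edge index, returning graph-level logits) is an assumption on our part.

```python
# A minimal sketch of a fidelity score: the drop in the predicted class
# probability when the explanation subgraph's nodes are occluded.
import torch

def fidelity(model, x, edge_index, subgraph_nodes, target_class):
    with torch.no_grad():
        full = model(x, edge_index).softmax(-1)[0, target_class]
        x_masked = x.clone()
        x_masked[subgraph_nodes] = 0.0   # zero out the explanation nodes
        masked = model(x_masked, edge_index).softmax(-1)[0, target_class]
    return (full - masked).item()        # high fidelity: subgraph matters
```

Using this directly as the Monte Carlo tree search scoring function avoids the Shapley-value sampling of the original algorithm, which is where our runtime improvement comes from.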

Show more
View full details
Poster

[Re] Numerical influence of ReLU'(0) on backpropagation

Tommaso Martorella · Hector Manuel Ramirez Contreras · Daniel Garcia
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Neural networks have become very common in machine learning, and new problems and trends arise as the trade-off between theory, computational tools, and real-world problems becomes narrower and more complex. We decided to revisit the influence of ReLU'(0) on backpropagation, as it has become more common to use lower floating-point precision on GPUs so that more tasks can run in parallel, making training and inference more efficient. Contrary to what theory suggests, the original authors showed that when using 16- and 32-bit precision, the value of ReLU'(0) may influence the result. In this work, we extended some of the experiments to see how the training and test losses are affected in simple and more complex models.
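Because the effect hinges on which subgradient an implementation assigns at exactly zero, a custom autograd function makes it easy to toggle; below is a minimal PyTorch sketch (our construction, not the authors' code).

```python
# A minimal sketch of a ReLU whose derivative at exactly zero is a free
# parameter alpha, so the studied effect can be switched on and off.
import torch

class ReLUAlpha(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.save_for_backward(x)
        ctx.alpha = alpha
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        grad = torch.where(x > 0, torch.ones_like(x), torch.zeros_like(x))
        grad = torch.where(x == 0, torch.full_like(x, ctx.alpha), grad)
        return grad_out * grad, None  # no gradient w.r.t. alpha

relu0 = lambda x: ReLUAlpha.apply(x, 0.0)  # ReLU'(0) = 0
relu1 = lambda x: ReLUAlpha.apply(x, 1.0)  # ReLU'(0) = 1
```

At 64-bit precision, activations rarely hit exactly zero, so the choice is theoretically irrelevant; at 16- and 32-bit precision, exact zeros occur often enough for the choice of alpha to leave a measurable trace.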

Show more
View full details
Poster

Inference for Gaussian Processes with Matern Covariogram on Compact Riemannian Manifolds

Didong Li · Wenpin Tang · Sudipto Banerjee
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Gaussian processes are widely employed as versatile modelling and predictive tools in spatial statistics, functional data analysis, computer modelling and diverse applications of machine learning. They have been widely studied over Euclidean spaces, where they are specified using covariance functions or covariograms for modelling complex dependencies. There is a growing literature on Gaussian processes over Riemannian manifolds in order to develop richer and more flexible inferential frameworks for non-Euclidean data. While numerical approximations through graph representations have been well studied for the Matern covariogram and heat kernel, the behaviour of asymptotic inference on the parameters of the covariogram has received relatively scant attention. We focus on asymptotic behaviour for Gaussian processes constructed over compact Riemannian manifolds. Building upon a recently introduced Matern covariogram on a compact Riemannian manifold, we employ formal notions and conditions for the equivalence of two Matern Gaussian random measures on compact manifolds to derive the parameter that is identifiable, also known as the microergodic parameter, and formally establish the consistency of the maximum likelihood estimate and the asymptotic optimality of the best linear unbiased predictor. The circle is studied as a specific example of compact Riemannian manifolds with numerical experiments to illustrate and corroborate the theory.
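For readers less familiar with the object under study, the Matern covariogram on a distance $d$ can be written, in one common parametrization (our choice, not necessarily the paper's), as below; in the classical Euclidean fixed-smoothness theory, the ratio $\sigma^2/\rho^{2\nu}$ is the microergodic parameter, and the paper derives the analogous identifiable quantity on compact manifolds.

```latex
% Matern covariogram (one common parametrization); sigma^2: variance,
% rho: length scale, nu: smoothness, K_nu: modified Bessel function
% of the second kind.
C_{\sigma^2,\rho,\nu}(d)
  = \sigma^2 \, \frac{2^{1-\nu}}{\Gamma(\nu)}
    \left(\frac{d}{\rho}\right)^{\!\nu} K_{\nu}\!\left(\frac{d}{\rho}\right)
```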

Show more
View full details
Poster

RELIC: Reproducibility and Extension on LIC metric in quantifying bias in captioning models

Martijn van Raaphorst · Egoitz Gonzalez · Marta Grasa · Paula Antequera Hernández
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Scope of Reproducibility — In this work we reproduce and extend the results presented in "Quantifying Societal Bias Amplification in Image Captioning" by Hirota et al. This paper introduces LIC, a metric that quantifies bias amplification by image captioning models, which is tested for gender and racial bias amplification. The original paper claims that this metric is robust and that all models amplify both gender and racial bias. It also claims that gender bias is more apparent than racial bias and that the Equalizer variation of the NIC+ model increases gender but not racial bias. We repeat the measurements to confirm these claims. We also extend the analysis to whether the method can be generalized to other attributes, such as age bias.

Methodology — The authors of the paper provided a repository containing the necessary code. We had to modify it and add several scripts to be able to run all the experiments. The results were reproduced using the same subset of COCO [3] as in the original paper. Additionally, we manually labeled images according to age for our specific experiments. All experiments were run on GPUs for a total of approximately 100 hours.

Results — All claims made by the paper seem to hold, as the results we obtained follow the same trends as those presented in the original paper, even if they do not match exactly. However, the same cannot always be said of the additional experiments.

What was easy — The paper was clear and matched the implementation. The code was well organized and easy to run using the command-line interface provided by the authors. This also made it easy to replicate and expand upon by adding our own new features. The data was also readily available and could be downloaded easily, with no need for preprocessing.

What was difficult — We had to run several iterations of the same code, using different seeds and models, to obtain results under the same conditions as in the original paper, which took considerable time and resources. Our own experiments required additional time to hand-annotate data, due to the lack of existing annotations for the new attribute.

Communication with original authors — There was no contact with the authors, since the code and the experiments were clear and did not need any additional explanation.
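One detail worth showing is the masking that precedes the training of the attacker classifier: protected-attribute words are replaced by a placeholder so the classifier cannot read the attribute off the caption directly. Below is a minimal sketch; the word list is a tiny illustrative subset, not the paper's.

```python
# A minimal sketch of pre-masking captions before measuring leakage:
# attribute-revealing words become a neutral placeholder token.
GENDER_WORDS = {"man", "woman", "boy", "girl", "he", "she", "his", "her"}

def mask_caption(caption, words=GENDER_WORDS, token="<mask>"):
    return " ".join(token if w.lower() in words else w
                    for w in caption.split())

# mask_caption("a man rides his bike") -> "a <mask> rides <mask> bike"
```

Extending LIC to a new attribute, as we do for age, amounts to supplying a new word list plus per-image attribute annotations.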

Show more
View full details
Poster

[Re] On the Reproducibility of “FairCal: Fairness Calibration for Face Verification”

Marga Don · Satchit Chatterji · Milena Kapralova · Ryan Amaudruz
Dec 14, 3:00 PM - 5:00 PM Great Hall & Hall B1+B2 (level 1)

Scope of Reproducibility — This paper aims to reproduce the study FairCal: Fairness Calibration for Face Verification by Salvador et al., focusing on verifying three main claims: FairCal (introduced by the authors) achieves state-of-the-art (i) global accuracy, (ii) fairness-calibrated probabilities, and (iii) equality in false positive rates across sensitive attributes (i.e. predictive equality). The sensitive attribute considered is ethnicity.

Methodology — Salvador et al. provide partial code via a GitHub repository. Additional code to generate image embeddings from three pretrained neural network models was based on existing repositories. All code was refactored to fit our needs, keeping extendability and readability in mind. Two datasets were used, namely Balanced Faces in the Wild (BFW) and Racial Faces in the Wild (RFW). Additional experiments using Gaussian mixture models instead of K-means clustering for FairCal validate the use of unsupervised clustering methods. The code was run on an AMD Ryzen 7 2700X CPU and an NVIDIA GeForce GTX 1080 Ti GPU, with a total runtime of around 3 hours for all experiments.

Results — In most cases, we were able to reproduce results from the original paper to within 1 standard deviation, and we observe similar trends. However, due to missing information about image pre-processing, we were unable to reproduce the results exactly.

What was easy — The original paper is clear and understandable. Furthermore, the authors provided a mostly working version of the code. Though the datasets are not freely available to the public, their authors supplied them to us swiftly after we contacted them.

What was difficult — While most of the code worked with slight changes, it assumed the existence of files containing image embeddings for both datasets, which the authors neither provided nor gave details about. We therefore pre-processed the data and generated embeddings independently of the authors, which makes it more difficult to judge the overall reproducibility of their method. Additionally, we encountered difficulties while improving the efficiency and extendability of the code.

Communication with original authors — We emailed the first author of the paper twice. At the beginning of our undertaking, they were enthusiastic about our attempt and clarified a few initial doubts about their implementation, the embeddings, and the missing files. As of the writing of this paper, they have not responded to the second email.
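To make the pipeline concrete, here is a minimal sketch of the FairCal idea as we understand it: cluster the embeddings with K-means and calibrate each verification score with the calibration maps of the pair's clusters. We substitute isotonic regression for the paper's calibration method, and all names and signatures are ours.

```python
# A minimal sketch of per-cluster score calibration for face verification.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.isotonic import IsotonicRegression

def fit_faircal(embs, pair_idx, scores, labels, k=100):
    """embs: (n, d) embeddings; pair_idx: (m, 2) indices into embs;
    scores: (m,) raw similarity scores; labels: (m,) 0/1 same-identity."""
    km = KMeans(n_clusters=k, n_init=10).fit(embs)
    cal, cl = {}, km.labels_
    for c in range(k):
        # pairs with at least one image in cluster c train that cluster's map
        in_c = np.isin(pair_idx, np.where(cl == c)[0]).any(axis=1)
        if in_c.sum() > 1:
            cal[c] = IsotonicRegression(out_of_bounds="clip").fit(
                scores[in_c], labels[in_c])
    return km, cal

def calibrate(km, cal, emb_a, emb_b, score):
    ca, cb = km.predict(np.stack([emb_a, emb_b]))
    preds = [cal[c].predict([score])[0] for c in (ca, cb) if c in cal]
    return float(np.mean(preds)) if preds else score
```

The appeal of the method is that the clusters are found without any ethnicity labels, yet calibrating within them evens out the per-group false positive rates.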

Show more
View full details
Poster

Reproducibility Study of "Label-Free Explainability for Unsupervised Models"

Valentinos Pariza · Avik Pal · Madhura Pawar · Quim Serra Faber

Scope of Reproducibility — In this work, we evaluate the reproducibility of the paper Label-Free Explainability for Unsupervised Models by Crabbe and van der Schaar. Our goal is to reproduce the paper's four main claims in a label-free setting: (1) feature importance scores determine salient features of a model's input, (2) example importance scores determine salient training examples to explain a test example, (3) interpretability of saliency maps is hard for disentangled VAEs, and (4) distinct pretext tasks don't have interchangeable representations.

Methodology — The authors of the paper provide a PyTorch implementation of their proposed techniques and experiments. We reuse and extend their code for our additional experiments. Our reproducibility study comes at a total computational cost of 110 GPU hours, using an NVIDIA Titan RTX.

Results — We reproduced the original paper's work through our experiments and find that its main claims largely hold. We assessed the robustness and generalizability of some of the claims through additional experiments; in doing so, we found that one claim is not generalizable and another is not reproducible for the graph dataset.

What was easy — The original paper is well structured. The code implementation is well organized, with clear instructions on how to get started. This was helpful for understanding the paper's work and beginning to experiment with the proposed methods.

What was difficult — We found it difficult to extrapolate some of the authors' proposed techniques to datasets other than those they used. Also, we were not able to reproduce the results for one of the experiments, and due to time and resource constraints we could not determine the exact reason through explorative experiments.

Communication with original authors — We reached out to the authors once about our queries regarding one experimental setup and to understand the assumptions and contexts of some sub-claims in the paper. We received a prompt response which satisfied most of our questions.
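As an example of how claim (1) can be checked, here is a minimal sketch of the latent-shift comparison; zero-masking as the perturbation, the encoder interface, and the tensor shapes are our simplifying assumptions.

```python
# A minimal sketch of the latent-shift check: perturbing the top-k most
# important features should move the representation further than
# perturbing k random features.
import torch

def latent_shift(encoder, x, importance, k):
    flat = importance.flatten()
    top = flat.topk(k).indices
    rand = torch.randperm(flat.numel())[:k]
    def shift(idx):
        xp = x.clone().flatten()
        xp[idx] = 0.0                      # occlude the selected features
        with torch.no_grad():
            return (encoder(xp.view_as(x)) - encoder(x)).norm().item()
    return shift(top), shift(rand)         # expected: top-k shift > random
```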

Show more
View full details