Spotlight
Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing
Meyer Scetbon · Gael Varoquaux
Are two sets of observations drawn from the same distribution? This
problem is a two-sample test.
Kernel methods lead to many appealing properties. Indeed, state-of-the-art
approaches use the $L^2$ distance between kernel-based
distribution representatives to derive their test statistics. Here, we show that
$L^p$ distances (with $p\geq 1$) between these
distribution representatives give metrics on the space of distributions that are
well suited to detecting differences between distributions, as they
metrize weak convergence. Moreover, for analytic kernels,
we show that the $L^1$ geometry gives improved testing power for
scalable computational procedures. Specifically, we derive a finite
dimensional approximation of the metric given as the $\ell_1$ norm of a vector
which captures differences of expectations of analytic functions evaluated at
spatial locations or frequencies (i.e., features). The features can be chosen to
maximize the differences of the distributions and give interpretable
indications of how they differ. Using an $\ell_1$ norm gives better detection
because differences between representatives are dense
as we use analytic kernels (non-zero almost everywhere). The tests are consistent, while
much faster than state-of-the-art quadratic-time kernel-based tests. Experiments
on artificial and real-world problems demonstrate
a better power/time tradeoff than the state of the art, based on
$\ell_2$ norms, and in some cases, better outright power than even the most
expensive quadratic-time tests. This performance gain is retained even in high dimensions.
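The finite-dimensional statistic described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a Gaussian kernel (which is analytic), takes the test locations as given rather than optimizing them, skips any normalization of the feature differences, and calibrates the test with a plain permutation scheme. The function names, the `bandwidth` parameter, and the permutation count are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_features(X, locations, bandwidth=1.0):
    """Evaluate the Gaussian kernel k(x, t_j) for every sample x and location t_j.

    X has shape (n, d), locations has shape (J, d); the result has shape (n, J).
    """
    sq_dists = ((X[:, None, :] - locations[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def l1_statistic(X, Y, locations, bandwidth=1.0):
    """l1 norm of the difference of empirical mean embeddings at the locations."""
    mean_x = gaussian_features(X, locations, bandwidth).mean(axis=0)
    mean_y = gaussian_features(Y, locations, bandwidth).mean(axis=0)
    return np.abs(mean_x - mean_y).sum()


def permutation_pvalue(X, Y, locations, bandwidth=1.0, n_perm=200, seed=0):
    """Permutation p-value for the unnormalized l1 statistic (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Features are computed once; each permutation only reshuffles rows,
    # so the cost per permutation is linear in the total sample size.
    feats = gaussian_features(np.vstack([X, Y]), locations, bandwidth)
    n = len(X)
    observed = np.abs(feats[:n].mean(axis=0) - feats[n:].mean(axis=0)).sum()
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(feats))
        stat = np.abs(feats[perm[:n]].mean(axis=0) - feats[perm[n:]].mean(axis=0)).sum()
        exceed += stat >= observed
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(300, 2))
    Y = rng.normal(loc=1.0, size=(300, 2))  # mean-shifted second sample
    locations = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]])
    print("statistic:", l1_statistic(X, Y, locations))
    print("p-value:", permutation_pvalue(X, Y, locations))
```

Each entry of the difference vector corresponds to one location, so inspecting the largest entries gives a rough indication of where the two samples differ.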
Author Information
Meyer Scetbon (CREST-ENSAE)
Gael Varoquaux (Parietal Team, INRIA)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Poster: Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing »
Fri. Dec 13th 01:00 -- 03:00 AM Room East Exhibition Hall B + C #6
More from the Same Authors
-
2021 Spotlight: What’s a good imputation to predict with missing values? »
Marine Le Morvan · Julie Josse · Erwan Scornet · Gael Varoquaux -
2021 : AI as statistical methods for imperfect theories »
Gael Varoquaux -
2021 : Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs »
Meyer Scetbon · Gabriel Peyré · Marco Cuturi -
2022 Poster: Why do tree-based models still outperform deep learning on typical tabular data? »
Leo Grinsztajn · Edouard Oyallon · Gael Varoquaux -
2022 Poster: Low-rank Optimal Transport: Approximation, Statistics and Debiasing »
Meyer Scetbon · Marco Cuturi -
2021 Poster: What’s a good imputation to predict with missing values? »
Marine Le Morvan · Julie Josse · Erwan Scornet · Gael Varoquaux -
2020 Poster: Linear Time Sinkhorn Divergences using Positive Features »
Meyer Scetbon · Marco Cuturi -
2020 Poster: NeuMiss networks: differentiable programming for supervised learning with missing values. »
Marine Le Morvan · Julie Josse · Thomas Moreau · Erwan Scornet · Gael Varoquaux -
2020 Oral: NeuMiss networks: differentiable programming for supervised learning with missing values. »
Marine Le Morvan · Julie Josse · Thomas Moreau · Erwan Scornet · Gael Varoquaux -
2019 Poster: Manifold-regression to predict from MEG/EEG brain signals without source modeling »
David Sabbagh · Pierre Ablin · Gael Varoquaux · Alexandre Gramfort · Denis A. Engemann -
2017 : Scikit-learn & nilearn: Democratisation of machine learning for brain imaging (INRIA) »
Gael Varoquaux -
2017 : Invited Talk: "Tales from fMRI: Learning from limited labeled data" »
Gael Varoquaux -
2017 Poster: Learning Neural Representations of Human Cognition across Many fMRI Studies »
Arthur Mensch · Julien Mairal · Danilo Bzdok · Bertrand Thirion · Gael Varoquaux -
2016 Poster: Learning brain regions via large-scale online structured sparse dictionary learning »
Elvis DOHMATOB · Arthur Mensch · Gael Varoquaux · Bertrand Thirion -
2015 Poster: Semi-Supervised Factored Logistic Regression for High-Dimensional Neuroimaging Data »
Danilo Bzdok · Michael Eickenberg · Olivier Grisel · Bertrand Thirion · Gael Varoquaux -
2013 Poster: Mapping paradigm ontologies to and from the brain »
Yannick Schwartz · Bertrand Thirion · Gael Varoquaux