Spotlight
Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing
Meyer Scetbon · Gael Varoquaux
Are two sets of observations drawn from the same distribution? This
problem is a two-sample test.
Kernel methods lead to many appealing properties. Indeed, state-of-the-art
approaches use the $L^2$ distance between kernel-based
distribution representatives to derive their test statistics. Here, we show that
$L^p$ distances (with $p\geq 1$) between these
distribution representatives give metrics on the space of distributions that are
well suited to detecting differences between distributions, as they
metrize weak convergence. Moreover, for analytic kernels,
we show that the $L^1$ geometry gives improved testing power for
scalable computational procedures. Specifically, we derive a
finite-dimensional approximation of the metric, given as the $\ell_1$ norm of a vector which captures differences of expectations of analytic functions evaluated at spatial locations or frequencies (i.e., features). The features can be chosen to
maximize the differences between the distributions and give interpretable
indications of how they differ. Using an $\ell_1$ norm gives better detection
because differences between representatives are dense
when we use analytic kernels (nonzero almost everywhere). The tests are consistent, while
much faster than state-of-the-art quadratic-time kernel-based tests. Experiments
on artificial
and real-world problems demonstrate
a better power/time tradeoff than the state of the art, based on
$\ell_2$ norms, and in some cases, better outright power than even the most
expensive quadratic-time tests. This performance gain is retained even in high dimensions.
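To make the finite-dimensional approximation concrete, the following is a minimal sketch, not the authors' implementation. It evaluates a Gaussian kernel (one example of an analytic kernel) at a few spatial locations, takes the difference of the two empirical means at each location, and sums the absolute values to get an $\ell_1$ statistic. The function names, the bandwidth, the choice of locations, and the permutation-based calibration are all assumptions made for illustration; the abstract does not specify how the statistic is normalized or calibrated.

```python
import numpy as np


def gaussian_features(X, locations, bandwidth=1.0):
    """Gaussian kernel k(x, t_j) for every sample x and location t_j.

    Returns an array of shape (n_samples, n_locations).
    """
    sq_dists = ((X[:, None, :] - locations[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def embedding_difference(X, Y, locations, bandwidth=1.0):
    """Difference of empirical mean embeddings at each location."""
    mean_x = gaussian_features(X, locations, bandwidth).mean(axis=0)
    mean_y = gaussian_features(Y, locations, bandwidth).mean(axis=0)
    return mean_x - mean_y


def l1_statistic(X, Y, locations, bandwidth=1.0):
    """Unnormalized l1 norm of the vector of feature differences."""
    return np.abs(embedding_difference(X, Y, locations, bandwidth)).sum()


def permutation_p_value(X, Y, locations, bandwidth=1.0, n_perm=200, seed=0):
    """Calibrate the statistic by permuting sample labels.

    Assumption: a permutation test is used here only because it is
    simple; it is not claimed to be the calibration used in the paper.
    """
    rng = np.random.default_rng(seed)
    observed = l1_statistic(X, Y, locations, bandwidth)
    pooled = np.vstack([X, Y])
    n = len(X)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stat = l1_statistic(pooled[perm[:n]], pooled[perm[n:]],
                            locations, bandwidth)
        exceed += stat >= observed
    return (exceed + 1) / (n_perm + 1)
```

Each evaluation of the statistic costs time linear in the number of samples, which is the source of the speed advantage over quadratic-time tests. In this sketch the locations are passed in by the caller; the abstract notes that they can instead be chosen to maximize the differences between the distributions.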
Author Information
Meyer Scetbon (CREST-ENSAE)
Gael Varoquaux (Parietal Team, INRIA)
Related Events (a corresponding poster, oral, or spotlight)

2019 Poster: Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing »
Fri. Dec 13th 01:00 - 03:00 AM Room East Exhibition Hall B + C #6
More from the Same Authors

2021 Spotlight: What’s a good imputation to predict with missing values? »
Marine Le Morvan · Julie Josse · Erwan Scornet · Gael Varoquaux 
2021 : AI as statistical methods for imperfect theories »
Gael Varoquaux 
2021 : Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs »
Meyer Scetbon · Gabriel Peyré · Marco Cuturi 
2022 Poster: Why do tree-based models still outperform deep learning on typical tabular data? »
Leo Grinsztajn · Edouard Oyallon · Gael Varoquaux 
2022 Poster: Low-rank Optimal Transport: Approximation, Statistics and Debiasing »
Meyer Scetbon · Marco Cuturi 
2021 Poster: What’s a good imputation to predict with missing values? »
Marine Le Morvan · Julie Josse · Erwan Scornet · Gael Varoquaux 
2020 Poster: Linear Time Sinkhorn Divergences using Positive Features »
Meyer Scetbon · Marco Cuturi 
2020 Poster: NeuMiss networks: differentiable programming for supervised learning with missing values. »
Marine Le Morvan · Julie Josse · Thomas Moreau · Erwan Scornet · Gael Varoquaux 
2020 Oral: NeuMiss networks: differentiable programming for supervised learning with missing values. »
Marine Le Morvan · Julie Josse · Thomas Moreau · Erwan Scornet · Gael Varoquaux 
2019 Poster: Manifold-regression to predict from MEG/EEG brain signals without source modeling »
David Sabbagh · Pierre Ablin · Gael Varoquaux · Alexandre Gramfort · Denis A. Engemann 
2017 : Scikit-learn & nilearn: Democratisation of machine learning for brain imaging (INRIA) »
Gael Varoquaux 
2017 : Invited Talk: "Tales from fMRI: Learning from limited labeled data" »
Gael Varoquaux 
2017 Poster: Learning Neural Representations of Human Cognition across Many fMRI Studies »
Arthur Mensch · Julien Mairal · Danilo Bzdok · Bertrand Thirion · Gael Varoquaux 
2016 Poster: Learning brain regions via large-scale online structured sparse dictionary learning »
Elvis DOHMATOB · Arthur Mensch · Gael Varoquaux · Bertrand Thirion 
2015 Poster: Semi-Supervised Factored Logistic Regression for High-Dimensional Neuroimaging Data »
Danilo Bzdok · Michael Eickenberg · Olivier Grisel · Bertrand Thirion · Gael Varoquaux 
2013 Poster: Mapping paradigm ontologies to and from the brain »
Yannick Schwartz · Bertrand Thirion · Gael Varoquaux