A Fast, Consistent Kernel Two-Sample Test
Arthur Gretton · Kenji Fukumizu · Zaid Harchaoui · Bharath Sriperumbudur

Mon Dec 07 06:41 PM -- 06:42 PM (PST)
Recent work has proposed embedding probability distributions into reproducing kernel Hilbert spaces (RKHSs), so that two probability measures $P$ and $Q$ can be compared through the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if $P$ and $Q$ coincide. When this distance is used as the statistic for a test of whether two samples come from different distributions, the main difficulty is computing the significance threshold, since the asymptotic null distribution of the empirical statistic (where $P=Q$) is an infinite weighted sum of $\chi^2$ random variables. The main result of the present work is a novel, consistent estimate of this null distribution, computed from the eigenspectrum of the Gram matrix on the aggregate sample from $P$ and $Q$. This estimate can be computed faster than a previous consistent estimate based on the bootstrap. Another prior approach fitted a parametric family to the null distribution using the low-order moments of the test statistic; unlike the present work, this heuristic has no guarantee of being accurate or consistent. We verify the performance of our null distribution estimate on both an artificial example and high-dimensional multivariate data.
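The spectral null estimate described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it assumes equal sample sizes $m$ for $P$ and $Q$, a Gaussian kernel with a user-supplied bandwidth `sigma`, and the biased (V-statistic) estimate of the squared RKHS distance, commonly called the maximum mean discrepancy $\mathrm{MMD}_b^2$. The function names `gaussian_gram`, `mmd2_biased`, and `spectral_null_threshold` are hypothetical. Under these assumptions, $m\,\mathrm{MMD}_b^2$ is asymptotically distributed under $P=Q$ as $\sum_l \lambda_l z_l^2$ with $z_l \sim \mathcal{N}(0,2)$, and the sketch estimates each $\lambda_l$ by an eigenvalue of the centred Gram matrix on the pooled sample divided by the pooled size $N = 2m$.

```python
import numpy as np


def gaussian_gram(A, B, sigma):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) (assumed kernel choice)."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    # Clip tiny negative squared distances caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def mmd2_biased(X, Y, sigma):
    """Biased (V-statistic) estimate of the squared distance between the two embeddings."""
    return (gaussian_gram(X, X, sigma).mean()
            + gaussian_gram(Y, Y, sigma).mean()
            - 2.0 * gaussian_gram(X, Y, sigma).mean())


def spectral_null_threshold(X, Y, sigma, alpha=0.05, n_draws=5000, rng=None):
    """(1 - alpha) quantile of the estimated null of m * MMD_b^2.

    Hypothetical helper; assumes len(X) == len(Y) == m.
    """
    rng = np.random.default_rng(rng)
    Z = np.vstack([X, Y])                      # aggregate sample, N = 2m points
    N = Z.shape[0]
    K = gaussian_gram(Z, Z, sigma)
    H = np.eye(N) - np.full((N, N), 1.0 / N)   # centring matrix
    nu = np.linalg.eigvalsh(H @ K @ H)         # eigenvalues of the centred Gram matrix
    lam = np.clip(nu, 0.0, None) / N           # estimated weights lambda_l
    # Monte Carlo draws of sum_l lambda_l * z_l^2 with z_l ~ N(0, 2).
    z = rng.normal(0.0, np.sqrt(2.0), size=(n_draws, N))
    null_draws = (z**2) @ lam
    return np.quantile(null_draws, 1.0 - alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    Y = rng.normal(size=(100, 2)) + 0.5        # mean-shifted alternative
    m = X.shape[0]
    stat = m * mmd2_biased(X, Y, sigma=1.0)
    thr = spectral_null_threshold(X, Y, sigma=1.0, rng=1)
    print(f"m*MMD_b^2 = {stat:.3f}, threshold = {thr:.3f}, reject H0: {stat > thr}")
```

In this sketch the eigendecomposition of the $N \times N$ centred Gram matrix is computed once, after which drawing from the estimated null is a cheap matrix-vector product. A bootstrap instead recomputes the statistic for every resample. Keeping only the leading eigenvalues would be a further, assumed speed-up that is not shown here.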

Author Information

Arthur Gretton (Google DeepMind / UCL)

Arthur Gretton is a Professor with the Gatsby Computational Neuroscience Unit at UCL. He received degrees in Physics and Systems Engineering from the Australian National University, and a PhD with Microsoft Research and the Signal Processing and Communications Laboratory at the University of Cambridge. He previously worked at the MPI for Biological Cybernetics and at the Machine Learning Department, Carnegie Mellon University. Arthur's recent research interests in machine learning include the design and training of generative models, both implicit (e.g. GANs) and explicit (high/infinite dimensional exponential family models), nonparametric hypothesis testing, and kernel methods. He was an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence from 2009 to 2013, has been an Action Editor for JMLR since April 2013, and has served as an Area Chair for NeurIPS in 2008 and 2009, a Senior Area Chair for NeurIPS in 2018, an Area Chair for ICML in 2011 and 2012, and a member of the COLT Program Committee in 2013. Arthur was program chair for AISTATS in 2016 (with Christian Robert), tutorials chair for ICML 2018 (with Ruslan Salakhutdinov), workshops chair for ICML 2019 (with Honglak Lee), program chair for the DALI workshop in 2019 (with Krikamol Muandet and Shakir Mohamed), and co-organiser of the Machine Learning Summer School 2019 in London (with Marc Deisenroth).

Kenji Fukumizu (Institute of Statistical Mathematics / Preferred Networks / RIKEN AIP)
Zaid Harchaoui (University of Washington)
Bharath Sriperumbudur (The Pennsylvania State University)
