
Workshop: Distribution shifts: connecting methods and applications (DistShift)

Kernel Landmarks: An Empirical Statistical Approach to Detect Covariate Shift

Yuksel Karahan · Bilal Riaz · Austin J Brockmeier


Training a predictive model with empirical risk minimization assumes that the distribution of the training inputs matches that of the test inputs. Covariate shift can occur when the test cases are not class-balanced but the training cases are. To detect class imbalance in an unlabeled test sample, we propose a statistical divergence based on the Wasserstein distance and optimal transport. Recently, slicing techniques have been proposed that provide computational and statistical advantages in high-dimensional spaces. In this work, we present a computationally simple approach to generalized slicing via a kernel-based Wasserstein distance and apply it as a two-sample test. The proposed landmark-based slicing chooses a single point in the samples as the sole support vector representing the witness function. We run pseudo-real experiments on the MNIST dataset and compare our method with maximum mean discrepancy (MMD). On these synthetic simulations of covariate shift, the proposed methods perform better than MMD.
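The landmark idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several details are assumptions that the abstract does not specify: a Gaussian kernel with a hand-picked `bandwidth`, landmarks drawn from the pooled sample, selection of the landmark that maximizes the one-dimensional Wasserstein-1 distance between the projected samples, and calibration by a permutation test. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth):
    """Gaussian kernel matrix between rows of A and rows of B (assumed kernel choice)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1D empirical distributions,
    computed as the integral of the absolute difference of their CDFs."""
    u, v = np.sort(u), np.sort(v)
    all_vals = np.sort(np.concatenate([u, v]))
    deltas = np.diff(all_vals)
    u_cdf = np.searchsorted(u, all_vals[:-1], side="right") / u.size
    v_cdf = np.searchsorted(v, all_vals[:-1], side="right") / v.size
    return float(np.sum(np.abs(u_cdf - v_cdf) * deltas))


def _statistic_from_kernel(K, idx_x, idx_y):
    """Each column j of K is the slice x -> k(x, z_j) for landmark z_j.
    Return the largest sliced distance and the index of its landmark."""
    dists = np.array(
        [wasserstein_1d(K[idx_x, j], K[idx_y, j]) for j in range(K.shape[1])]
    )
    best = int(np.argmax(dists))
    return float(dists[best]), best


def landmark_statistic(X, Y, bandwidth):
    """Max over single-landmark kernel slices; landmarks are the pooled points."""
    Z = np.vstack([X, Y])
    K = gaussian_kernel(Z, Z, bandwidth)
    n = X.shape[0]
    return _statistic_from_kernel(K, np.arange(n), np.arange(n, Z.shape[0]))


def permutation_test(X, Y, bandwidth, n_perm=200, seed=0):
    """Two-sample test: compare the observed statistic with its distribution
    under random reassignment of pooled points to the two samples."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    K = gaussian_kernel(Z, Z, bandwidth)  # computed once, reused per permutation
    n, total = X.shape[0], Z.shape[0]
    observed, _ = _statistic_from_kernel(K, np.arange(n), np.arange(n, total))
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(total)
        stat, _ = _statistic_from_kernel(K, perm[:n], perm[n:])
        exceed += stat >= observed
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

Calling `permutation_test(X_train, X_test, bandwidth=...)` on two unlabeled feature arrays returns the statistic and a permutation p-value; a small p-value suggests the two input distributions differ. Because the maximization over landmarks is repeated inside every permutation, the selection step does not invalidate the test level under this sketch's assumptions.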
