The squared 2-Wasserstein distance is a natural loss for comparing probability distributions in generative modeling or density fitting tasks thanks to its "informative" gradient, but it suffers from poor sample and computational complexity compared to alternative losses such as kernel MMD. Adding an entropic regularization and debiasing the resulting quantity (yielding the Sinkhorn divergence) mitigates these downsides, but also degrades the discriminative power of the loss and the quality of its gradients. To understand the trade-offs at play, we propose to study entropic regularization as one typically studies regularization in Machine Learning: by discussing the optimization, estimation and approximation errors, and their trade-offs, covering in passing a variety of recent works in the field. The analysis, complemented with numerical experiments, suggests that entropic regularization actually improves the quality and efficiency of the estimation of the squared 2-Wasserstein distance, compared to the plug-in (i.e., unregularized) estimator.
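To make the quantity under discussion concrete, the following is a minimal sketch (not the speaker's code) of the debiased Sinkhorn divergence between two empirical measures, using log-domain Sinkhorn iterations in NumPy/SciPy; the function names, the regularization strength eps, and the iteration count are illustrative assumptions rather than choices made in the talk.

# Minimal sketch of the (debiased) Sinkhorn divergence between empirical measures.
# Illustrative only: eps, n_iter and the squared-Euclidean ground cost are assumptions.
import numpy as np
from scipy.special import logsumexp

def sinkhorn_cost(x, y, a, b, eps=0.1, n_iter=200):
    """Entropic OT cost OT_eps(a, b) with squared Euclidean ground cost."""
    # Cost matrix C_ij = ||x_i - y_j||^2
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    f = np.zeros(len(a))  # dual potentials
    g = np.zeros(len(b))
    for _ in range(n_iter):
        # Log-domain Sinkhorn updates (numerically stable soft-min)
        f = -eps * logsumexp((g[None, :] - C) / eps + np.log(b)[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + np.log(a)[:, None], axis=0)
    # At convergence the entropic OT value equals the dual objective <f, a> + <g, b>
    return f @ a + g @ b

def sinkhorn_divergence(x, y, a, b, eps=0.1):
    """Debiased quantity S_eps(a, b) = OT_eps(a, b) - (OT_eps(a, a) + OT_eps(b, b)) / 2."""
    return (sinkhorn_cost(x, y, a, b, eps)
            - 0.5 * sinkhorn_cost(x, x, a, a, eps)
            - 0.5 * sinkhorn_cost(y, y, b, b, eps))

# Example usage on two small Gaussian samples (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2)); y = rng.normal(loc=0.5, size=(60, 2))
a = np.full(50, 1 / 50); b = np.full(60, 1 / 60)
print(sinkhorn_divergence(x, y, a, b, eps=0.5))

As eps goes to zero this value approaches the plug-in estimate of the squared 2-Wasserstein distance, while larger eps speeds up the Sinkhorn iterations at the price of a smoother, biased quantity; this is the trade-off between optimization, estimation and approximation errors discussed in the abstract.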
Author Information
Lénaïc Chizat (CNRS)
More from the Same Authors
- 2020 Poster: Statistical and Topological Properties of Sliced Probability Divergences
  Kimia Nadjahi · Alain Durmus · Lénaïc Chizat · Soheil Kolouri · Shahin Shahrampour · Umut Simsekli
- 2020 Poster: Faster Wasserstein Distance Estimation with the Sinkhorn Divergence
  Lénaïc Chizat · Pierre Roussillon · Flavien Léger · François-Xavier Vialard · Gabriel Peyré
- 2020 Spotlight: Statistical and Topological Properties of Sliced Probability Divergences
  Kimia Nadjahi · Alain Durmus · Lénaïc Chizat · Soheil Kolouri · Shahin Shahrampour · Umut Simsekli
- 2019 Poster: On Lazy Training in Differentiable Programming
  Lénaïc Chizat · Edouard Oyallon · Francis Bach
- 2018 Poster: On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
  Lénaïc Chizat · Francis Bach