The notion of task similarity is at the core of various machine learning paradigms, such as domain adaptation and meta-learning. Current methods to quantify it are often heuristic, make strong assumptions about the label sets across tasks, and are in many cases architecture-dependent, relying on task-specific optimal parameters (e.g., they require training a model on each dataset). In this work we propose an alternative notion of distance between datasets that (i) is model-agnostic, (ii) does not involve training, (iii) can compare datasets even if their label sets are completely disjoint, and (iv) has solid theoretical footing. This distance relies on optimal transport, which provides it with rich geometry awareness, interpretable correspondences, and well-understood properties. Our results show that this novel distance provides meaningful comparisons of datasets and correlates well with transfer learning hardness across various experimental settings and datasets.
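To make the idea of a training-free, optimal-transport distance between labeled datasets concrete, here is a minimal sketch. It is not the authors' implementation. It assumes two equally sized datasets with uniform weights, for which exact optimal transport reduces to finding the best one-to-one matching, and it solves that by brute force, so it is only practical for a handful of points. The function names (`sq_dist`, `class_means`, `dataset_ot_distance`) are hypothetical, and the label cost used here (squared distance between class mean feature vectors) is a crude stand-in chosen for illustration, not the label distance defined in the paper.

```python
from itertools import permutations
from math import sqrt


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def class_means(dataset):
    """Map each label to the mean feature vector of the points carrying it."""
    groups = {}
    for x, y in dataset:
        groups.setdefault(y, []).append(x)
    return {
        y: tuple(sum(col) / len(xs) for col in zip(*xs))
        for y, xs in groups.items()
    }


def dataset_ot_distance(ds_a, ds_b):
    """Exact OT distance between two small, equally sized labeled datasets.

    Each dataset is a list of (feature_tuple, label) pairs. Labels are never
    compared by name, only through the features of their classes, so the two
    label sets may be completely disjoint. No model is trained.
    """
    if len(ds_a) != len(ds_b):
        raise ValueError("this sketch assumes equally sized datasets")
    means_a, means_b = class_means(ds_a), class_means(ds_b)

    def cost(p, q):
        (xa, ya), (xb, yb) = p, q
        # Assumed ground cost: feature term plus a class-mean label term.
        return sq_dist(xa, xb) + sq_dist(means_a[ya], means_b[yb])

    n = len(ds_a)
    # With uniform weights and equal sizes, an optimal coupling is a
    # permutation, so enumerate all matchings and keep the cheapest.
    best = min(
        sum(cost(ds_a[i], ds_b[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return sqrt(best / n)


# Three toy datasets with pairwise disjoint label sets.
pets = [((0.0, 0.0), "cat"), ((1.0, 0.0), "dog")]
digits = [((0.0, 0.0), "0"), ((1.0, 0.0), "1")]
shifted = [((3.0, 4.0), "x"), ((4.0, 4.0), "y")]

print(dataset_ot_distance(pets, digits))
print(dataset_ot_distance(pets, shifted))
```

In this toy setup the first pair of datasets has identical geometry despite sharing no label names, while the third dataset is a translated copy, so its distance to the first is strictly larger. For realistic dataset sizes one would replace the brute-force search with a dedicated optimal transport solver.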
Author Information
David Alvarez-Melis (Microsoft Research)
Nicolo Fusi (Microsoft Research)
More from the Same Authors
- 2021: Optimizing Functionals on the Space of Probabilities with Input Convex Neural Network
  David Alvarez-Melis · Yair Schiff · Youssef Mroueh
- 2019: Poster session
  Jindong Gu · Alice Xiang · Atoosa Kasirzadeh · Zhiwei Han · Omar U. Florez · Frederik Harder · An-phi Nguyen · Amir Hossein Akhavan Rahnama · Michele Donini · Dylan Slack · Junaid Ali · Paramita Koley · Michiel Bakker · Anna Hilgard · Hailey James-Sorenson · Gonzalo Ramos · Jialin Lu · Jingying Yang · Margarita Boyarskaya · Martin Pawelczyk · Kacper Sokol · Mimansa Jaiswal · Umang Bhatt · David Alvarez-Melis · Aditya Grover · Charles Marx · Mengjiao Yang · Jingyan Wang · Gökhan Çapan · Hanchen Wang · Steffen Grünewälder · Moein Khajehnejad · Gourab Patro · Russell Kunes · Samuel Deng · Yuanting Liu · Luca Oneto · Mengze Li · Thomas Weber · Stefan Matthes · Duy Patrick Tu
- 2018: Poster spotlight #2
  Nicolo Fusi · Chidubem Arachie · Joao Monteiro · Steffen Wolf
- 2018 Poster: Gaussian Process Prior Variational Autoencoders
  Francesco Paolo Casale · Adrian Dalca · Luca Saglietti · Jennifer Listgarten · Nicolo Fusi
- 2018 Poster: Probabilistic Matrix Factorization for Automated Machine Learning
  Nicolo Fusi · Rishit Sheth · Melih Elibol
- 2018 Poster: Towards Robust Interpretability with Self-Explaining Neural Networks
  David Alvarez-Melis · Tommi Jaakkola
- 2017 Workshop: Machine Learning in Computational Biology
  James Zou · Anshul Kundaje · Gerald Quon · Nicolo Fusi · Sara Mostafavi
- 2017: Structured Optimal Transport (with T. Jaakkola, S. Jegelka)
  David Alvarez-Melis
- 2016 Workshop: Machine Learning in Computational Biology
  Gerald Quon · Sara Mostafavi · James Y Zou · Barbara Engelhardt · Oliver Stegle · Nicolo Fusi
- 2015 Workshop: Machine Learning in Computational Biology
  Nicolo Fusi · Anna Goldenberg · Sara Mostafavi · Gerald Quon · Oliver Stegle