Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small, diverse subset out of a large collection of items, with applications in summarization, recommendation, stochastic optimization, experimental design, and more. Given a kernel function and a subset size k, the goal is to sample k out of n items with probability proportional to the determinant of the kernel matrix induced by the subset (a.k.a. k-DPP). Existing k-DPP sampling algorithms require an expensive preprocessing step involving multiple passes over all n items, making them infeasible for large datasets. A naïve heuristic addressing this problem is to uniformly subsample a fraction of the data and perform k-DPP sampling only on those items; however, this method offers no guarantee that the produced sample will even approximately resemble the target distribution over the original dataset. In this paper, we develop α-DPP, an algorithm which adaptively builds a sufficiently large uniform sample of the data that is then used to efficiently generate a smaller set of k items, while ensuring that this set is drawn exactly from the target distribution defined on all n items. We show empirically that our algorithm produces a k-DPP sample after observing only a small fraction of all elements, leading to several orders of magnitude faster performance compared to the state of the art. Our implementation of α-DPP is provided at https://github.com/guilgautier/DPPy/.
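The target distribution above is simple to state directly. As a minimal illustration of what a k-DPP is (a brute-force enumeration over all size-k subsets, not the paper's α-DPP algorithm, which avoids looking at all items), a sampler can be sketched in Python assuming a small positive semidefinite kernel matrix L:

```python
import itertools
import numpy as np

def kdpp_sample(L, k, rng=None):
    """Draw one size-k subset S with P(S) proportional to det(L_S),
    by enumerating all subsets. Only feasible for tiny n; shown purely
    to illustrate the target distribution of a k-DPP."""
    rng = np.random.default_rng(rng)
    n = L.shape[0]
    subsets = list(itertools.combinations(range(n), k))
    # Determinant of the principal submatrix indexed by each subset;
    # nonnegative whenever L is positive semidefinite.
    weights = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
    probs = weights / weights.sum()
    return subsets[rng.choice(len(subsets), p=probs)]

# Example: a small PSD kernel over n = 5 items (hypothetical data).
X = np.random.default_rng(0).normal(size=(5, 3))
L = X @ X.T + 1e-6 * np.eye(5)
S = kdpp_sample(L, k=2, rng=0)
```

Because the number of subsets grows as n choose k, this enumeration is exponentially expensive; the point of α-DPP is to produce an exact sample from this same distribution while observing only a small fraction of the n items.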
Author Information
Daniele Calandriello (DeepMind)
Michal Derezinski (UC Berkeley)
Michal Valko (DeepMind)
Michal is a machine learning scientist at DeepMind Paris, a member of the SequeL team at Inria, and the lecturer of the master's course Graphs in Machine Learning at ENS Paris-Saclay. Michal is primarily interested in designing algorithms that require as little human supervision as possible. This means 1) reducing the "intelligence" that humans need to input into the system and 2) minimizing the data that humans need to spend inspecting, classifying, or "tuning" the algorithms. Another important feature of machine learning algorithms should be the ability to adapt to changing environments. That is why he works in domains that can deal with minimal feedback, such as online learning, bandit algorithms, semi-supervised learning, and anomaly detection. Most recently he has worked on sequential algorithms with structured decisions, where exploiting the structure leads to provably faster learning. Structured learning requires more time and space resources, and therefore Michal's most recent work includes efficient approximations such as graph and matrix sketching with learning guarantees. In the past, the common thread of Michal's work has been adaptive graph-based learning and its use in real-world applications such as recommender systems, medical error detection, and face recognition. His industrial collaborators include Adobe, Intel, Technicolor, and Microsoft Research. He received his Ph.D. in 2011 from the University of Pittsburgh under the supervision of Miloš Hauskrecht and was then a postdoc of Rémi Munos before taking a permanent position at Inria in 2012.
Related Events (a corresponding poster, oral, or spotlight)

2020 Poster: Sampling from a k-DPP without looking at all items »
Thu Dec 10th 05:00 – 07:00 PM Room Poster Session 5
More from the Same Authors

2021 Spotlight: Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update »
Michal Derezinski · Jonathan Lacotte · Mert Pilanci · Michael Mahoney 
2021 : One Pass ImageNet »
Clara Huiyi Hu · Ang Li · Daniele Calandriello · Dilan Gorur 
2021 Poster: ParK: Sound and Efficient Kernel Ridge Regression by Feature Space Partitions »
Luigi Carratino · Stefano Vigogna · Daniele Calandriello · Lorenzo Rosasco 
2021 Poster: Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update »
Michal Derezinski · Jonathan Lacotte · Mert Pilanci · Michael Mahoney 
2020 Poster: Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization »
Michal Derezinski · Burak Bartan · Mert Pilanci · Michael Mahoney 
2020 Poster: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning »
Jean-Bastien Grill · Florian Strub · Florent Altché · Corentin Tallec · Pierre Richemond · Elena Buchatskaya · Carl Doersch · Bernardo Avila Pires · Zhaohan Guo · Mohammad Gheshlaghi Azar · Bilal Piot · koray kavukcuoglu · Remi Munos · Michal Valko 
2020 Oral: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning »
Jean-Bastien Grill · Florian Strub · Florent Altché · Corentin Tallec · Pierre Richemond · Elena Buchatskaya · Carl Doersch · Bernardo Avila Pires · Zhaohan Guo · Mohammad Gheshlaghi Azar · Bilal Piot · koray kavukcuoglu · Remi Munos · Michal Valko 
2020 Poster: Exact expressions for double descent and implicit regularization via surrogate random design »
Michal Derezinski · Feynman Liang · Michael Mahoney 
2020 Poster: Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method »
Michal Derezinski · Rajiv Khanna · Michael Mahoney 
2020 Poster: Precise expressions for random projections: Low-rank approximation and randomized Newton »
Michal Derezinski · Feynman Liang · Zhenyu Liao · Michael Mahoney 
2020 Oral: Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method »
Michal Derezinski · Rajiv Khanna · Michael Mahoney 
2020 Poster: Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits »
Pierre Perrault · Etienne Boursier · Michal Valko · Vianney Perchet 
2020 Poster: Planning in Markov Decision Processes with Gap-Dependent Sample Complexity »
Anders Jonsson · Emilie Kaufmann · Pierre Menard · Omar Darwiche Domingues · Edouard Leurent · Michal Valko 
2019 Poster: Distributed estimation of the inverse Hessian by determinantal averaging »
Michal Derezinski · Michael Mahoney 
2019 Poster: Exact sampling of determinantal point processes with sublinear time preprocessing »
Michal Derezinski · Daniele Calandriello · Michal Valko 
2018 Poster: On Fast Leverage Score Sampling and Optimal Learning »
Alessandro Rudi · Daniele Calandriello · Luigi Carratino · Lorenzo Rosasco 
2018 Poster: Statistical and Computational Trade-Offs in Kernel K-Means »
Daniele Calandriello · Lorenzo Rosasco 
2018 Spotlight: Statistical and Computational Trade-Offs in Kernel K-Means »
Daniele Calandriello · Lorenzo Rosasco 
2018 Poster: Leveraged volume sampling for linear regression »
Michal Derezinski · Manfred K. Warmuth · Daniel Hsu 
2018 Spotlight: Leveraged volume sampling for linear regression »
Michal Derezinski · Manfred K. Warmuth · Daniel Hsu 
2017 Poster: Efficient Second-Order Online Kernel Learning with Adaptive Embedding »
Daniele Calandriello · Alessandro Lazaric · Michal Valko 
2017 Poster: Unbiased estimates for linear regression via volume sampling »
Michal Derezinski · Manfred K. Warmuth 
2017 Spotlight: Unbiased estimates for linear regression via volume sampling »
Michal Derezinski · Manfred K. Warmuth 
2014 Workshop: Second Workshop on Transfer and Multi-Task Learning: Theory meets Practice »
Urun Dogan · Tatiana Tommasi · Yoshua Bengio · Francesco Orabona · Marius Kloft · Andres Munoz · Gunnar Rätsch · Hal Daumé III · Mehryar Mohri · Xuezhi Wang · Daniel Hernández-Lobato · Song Liu · Thomas Unterthiner · Pascal Germain · Vinay P Namboodiri · Michael Goetz · Christopher Berlind · Sigurd Spieckermann · Marta Soare · Yujia Li · Vitaly Kuznetsov · Wenzhao Lian · Daniele Calandriello · Emilie Morvant 
2014 Poster: The limits of squared Euclidean distance regularization »
Michal Derezinski · Manfred K. Warmuth 
2014 Spotlight: The limits of squared Euclidean distance regularization »
Michal Derezinski · Manfred K. Warmuth 
2014 Poster: Sparse Multi-Task Reinforcement Learning »
Daniele Calandriello · Alessandro Lazaric · Marcello Restelli