Timezone: »
Poster
Approximate Cross-Validation with Low-Rank Data in High Dimensions
Will Stephenson · Madeleine Udell · Tamara Broderick
Many recent advances in machine learning are driven by a challenging trifecta:
large data size $N$, high dimensions, and expensive algorithms. In this setting, cross-validation (CV) serves as an important tool for model assessment. Recent advances in approximate cross validation (ACV) provide accurate approximations to CV with only a single model fit, avoiding traditional CV's requirement for repeated runs of expensive algorithms. Unfortunately, these ACV methods can lose both speed and accuracy in high dimensions --- unless sparsity structure is present in the data. Fortunately, there is an alternative type of simplifying structure that is present in most data: approximate low rank (ALR).
Guided by this observation, we develop a new algorithm for ACV that is fast and accurate in the presence of ALR data.
Our first key insight is that the Hessian matrix --- whose inverse forms the computational bottleneck of existing ACV methods --- is ALR. We show that, despite our use of the \emph{inverse} Hessian, a low-rank approximation using the largest (rather than the smallest) matrix eigenvalues enables fast, reliable ACV.
Our second key insight is that, in the presence of ALR data, error in existing ACV methods roughly grows with the (approximate, low) rank rather than with the (full, high) dimension. These insights allow us to prove theoretical guarantees on the quality of our proposed algorithm --- along with fast-to-compute upper bounds on its error. We demonstrate the speed and accuracy of our method, as well as the usefulness of our bounds, on a range of real and simulated data sets.
Author Information
Will Stephenson (MIT)
Madeleine Udell (Cornell University)
Tamara Broderick (MIT)
More from the Same Authors
-
2021 : Measuring the sensitivity of Gaussian processes to kernel choice »
Will Stephenson · Soumya Ghosh · Tin Nguyen · Mikhail Yurochkin · Sameer Deshpande · Tamara Broderick -
2022 : Low Rank Approximation for Faster Convex Optimization »
Madeleine Udell -
2022 Poster: TabNAS: Rejection Sampling for Neural Architecture Search on Tabular Datasets »
Chengrun Yang · Gabriel Bender · Hanxiao Liu · Pieter-Jan Kindermans · Madeleine Udell · Yifeng Lu · Quoc V Le · Da Huang -
2022 Poster: Probabilistic Missing Value Imputation for Mixed Categorical and Ordered Data »
Yuxuan Zhao · Alex Townsend · Madeleine Udell -
2021 Workshop: Your Model is Wrong: Robustness and misspecification in probabilistic modeling »
Diana Cai · Sameer Deshpande · Michael Hughes · Tamara Broderick · Trevor Campbell · Nick Foti · Barbara Engelhardt · Sinead Williamson -
2021 Workshop: Learning Meaningful Representations of Life (LMRL) »
Elizabeth Wood · Adji Bousso Dieng · Aleksandrina Goeva · Anshul Kundaje · Barbara Engelhardt · Chang Liu · David Van Valen · Debora Marks · Edward Boyden · Eli N Weinstein · Lorin Crawford · Mor Nitzan · Romain Lopez · Tamara Broderick · Ray Jones · Wouter Boomsma · Yixin Wang -
2021 Poster: Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression »
Will Stephenson · Zachary Frangella · Madeleine Udell · Tamara Broderick -
2021 Poster: For high-dimensional hierarchical models, consider exchangeability of effects across covariates instead of across datasets »
Brian Trippe · Hilary Finucane · Tamara Broderick -
2020 : Panel & Closing »
Tamara Broderick · Laurent Dinh · Neil Lawrence · Kristian Lum · Hanna Wallach · Sinead Williamson -
2020 : Tamara Broderick »
Tamara Broderick -
2020 Poster: Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula »
Yuxuan Zhao · Madeleine Udell -
2020 Poster: Approximate Cross-Validation for Structured Models »
Soumya Ghosh · Will Stephenson · Tin Nguyen · Sameer Deshpande · Tamara Broderick -
2019 : Break / Poster Session 1 »
Antonia Marcu · Yao-Yuan Yang · Pascale Gourdeau · Chen Zhu · Thodoris Lykouris · Jianfeng Chi · Mark Kozdoba · Arjun Nitin Bhagoji · Xiaoxia Wu · Jay Nandy · Michael T Smith · Bingyang Wen · Yuege Xie · Konstantinos Pitas · Suprosanna Shit · Maksym Andriushchenko · Dingli Yu · GaĆ«l Letarte · Misha Khodak · Hussein Mozannar · Chara Podimata · James Foulds · Yizhen Wang · Huishuai Zhang · Ondrej Kuzelka · Alexander Levine · Nan Lu · Zakaria Mhammedi · Paul Viallard · Diana Cai · Lovedeep Gondara · James Lucas · Yasaman Mahdaviyeh · Aristide Baratin · Rishi Bommasani · Alessandro Barp · Andrew Ilyas · Kaiwen Wu · Jens Behrmann · Omar Rivasplata · Amir Nazemi · Aditi Raghunathan · Will Stephenson · Sahil Singla · Akhil Gupta · YooJung Choi · Yannic Kilcher · Clare Lyle · Edoardo Manino · Andrew Bennett · Zhi Xu · Niladri Chatterji · Emre Barut · Flavien Prost · Rodrigo Toro Icarte · Arno Blaas · Chulhee Yun · Sahin Lale · YiDing Jiang · Tharun Kumar Reddy Medini · Ashkan Rezaei · Alexander Meinke · Stephen Mell · Gary Kazantsev · Shivam Garg · Aradhana Sinha · Vishnu Lokhande · Geovani Rizk · Han Zhao · Aditya Kumar Akash · Jikai Hou · Ali Ghodsi · Matthias Hein · Tyler Sypherd · Yichen Yang · Anastasia Pentina · Pierre Gillot · Antoine Ledent · Guy Gur-Ari · Noah MacAulay · Tianzong Zhang -
2019 Poster: Factor Group-Sparse Regularization for Efficient Low-Rank Matrix Recovery »
Jicong Fan · Lijun Ding · Yudong Chen · Madeleine Udell -
2018 : Panel: Explainability, Fairness and Human Aspects in Financial Services »
Madeleine Udell · Jiahao Chen · Nitzan Mekel-Bobrov · Manuela Veloso · Jon Kleinberg · Andrea Freeman · Samik Chandarana · Jacob Sisk · Michael McBurnett -
2018 : Invited Talk 1: Fairness and Causality with Missing Data »
Madeleine Udell -
2018 Workshop: All of Bayesian Nonparametrics (Especially the Useful Bits) »
Diana Cai · Trevor Campbell · Michael Hughes · Tamara Broderick · Nick Foti · Sinead Williamson -
2018 Poster: Limited Memory Kelley's Method Converges for Composite Convex and Submodular Objectives »
Song Zhou · Swati Gupta · Madeleine Udell -
2018 Spotlight: Limited Memory Kelley's Method Converges for Composite Convex and Submodular Objectives »
Song Zhou · Swati Gupta · Madeleine Udell -
2018 Poster: Causal Inference with Noisy and Missing Covariates via Matrix Factorization »
Nathan Kallus · Xiaojie Mao · Madeleine Udell -
2017 Workshop: Advances in Approximate Bayesian Inference »
Francisco Ruiz · Stephan Mandt · Cheng Zhang · James McInerney · James McInerney · Dustin Tran · Dustin Tran · David Blei · Max Welling · Tamara Broderick · Michalis Titsias -
2017 Poster: PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference »
Jonathan Huggins · Ryan Adams · Tamara Broderick -
2017 Spotlight: PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference »
Jonathan Huggins · Ryan Adams · Tamara Broderick -
2017 Poster: Fixed-Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data »
Joel A Tropp · Alp Yurtsever · Madeleine Udell · Volkan Cevher -
2016 : Tamara Broderick: Foundations Talk »
Tamara Broderick -
2016 Workshop: Advances in Approximate Bayesian Inference »
Tamara Broderick · Stephan Mandt · James McInerney · Dustin Tran · David Blei · Kevin Murphy · Andrew Gelman · Michael I Jordan -
2016 Workshop: Practical Bayesian Nonparametrics »
Nick Foti · Tamara Broderick · Trevor Campbell · Michael Hughes · Jeffrey Miller · Aaron Schein · Sinead Williamson · Yanxun Xu -
2016 Poster: The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM »
Damek Davis · Brent Edmunds · Madeleine Udell -
2016 Poster: Coresets for Scalable Bayesian Logistic Regression »
Jonathan Huggins · Trevor Campbell · Tamara Broderick -
2016 Poster: Edge-exchangeable graphs and sparsity »
Diana Cai · Trevor Campbell · Tamara Broderick -
2015 Workshop: Bayesian Nonparametrics: The Next Generation »
Tamara Broderick · Nick Foti · Aaron Schein · Alex Tank · Hanna Wallach · Sinead Williamson -
2015 Workshop: Advances in Approximate Bayesian Inference »
Dustin Tran · Tamara Broderick · Stephan Mandt · James McInerney · Shakir Mohamed · Alp Kucukelbir · Matthew D. Hoffman · Neil Lawrence · David Blei -
2015 Poster: Linear Response Methods for Accurate Covariance Estimates from Mean Field Variational Bayes »
Ryan Giordano · Tamara Broderick · Michael Jordan -
2015 Spotlight: Linear Response Methods for Accurate Covariance Estimates from Mean Field Variational Bayes »
Ryan Giordano · Tamara Broderick · Michael Jordan -
2014 Workshop: Advances in Variational Inference »
David Blei · Shakir Mohamed · Michael Jordan · Charles Blundell · Tamara Broderick · Matthew D. Hoffman -
2013 Poster: Optimistic Concurrency Control for Distributed Unsupervised Learning »
Xinghao Pan · Joseph Gonzalez · Stefanie Jegelka · Tamara Broderick · Michael Jordan -
2013 Poster: Streaming Variational Bayes »
Tamara Broderick · Nicholas Boyd · Andre Wibisono · Ashia C Wilson · Michael Jordan