Evaluating the inherent difficulty of a given data-driven classification problem is important for establishing absolute benchmarks and evaluating progress in the field. To this end, a natural quantity to consider is the Bayes error, which measures the optimal classification error theoretically achievable for a given data distribution. While the Bayes error is generally intractable, we show that it can be computed exactly for generative models learned using normalizing flows. Our technique relies on a fundamental result: the Bayes error is invariant under invertible transformations. Therefore, we can compute the exact Bayes error of a learned flow model by computing it for the Gaussian base distribution, which can be done efficiently using Holmes-Diaconis-Ross integration. Moreover, we show that by varying the temperature of the learned flow models, we can generate synthetic datasets that closely resemble standard benchmark datasets, but with almost any desired Bayes error. We use our approach to conduct a thorough investigation of state-of-the-art classification models, and find that in some, but not all, cases these models are capable of achieving accuracy very near optimal. Finally, we use our method to evaluate the intrinsic "hardness" of standard benchmark datasets.
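As a concrete illustration of the invariance argument, the sketch below estimates the Bayes error of a hypothetical two-class flow model directly in its Gaussian base space. The dimension, class priors, means, and covariances are illustrative assumptions, and plain Monte Carlo stands in for the Holmes-Diaconis-Ross integration used in the paper:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Because the Bayes error is invariant under invertible maps, the Bayes
# error of a class-conditional flow model equals that of its Gaussian base
# mixture, so we can estimate it entirely in the base space.
rng = np.random.default_rng(0)

d = 2                                    # base-space dimension (illustrative)
priors = np.array([0.5, 0.5])            # class priors pi_k (assumed)
means = [np.zeros(d), 1.5 * np.ones(d)]  # hypothetical base-space class means
covs = [np.eye(d), np.eye(d)]            # hypothetical class covariances

def bayes_error_mc(n_samples: int = 200_000) -> float:
    """Monte Carlo estimate of the Bayes error E_x[1 - max_k p(k | x)]."""
    # Draw labels from the priors, then x from the matching Gaussian component.
    ks = rng.choice(len(priors), size=n_samples, p=priors)
    xs = np.empty((n_samples, d))
    for k in range(len(priors)):
        idx = ks == k
        xs[idx] = rng.multivariate_normal(means[k], covs[k], size=idx.sum())
    # Joint densities pi_k * N(x; mu_k, Sigma_k), then posteriors p(k | x).
    joint = np.stack([p * multivariate_normal(m, c).pdf(xs)
                      for p, m, c in zip(priors, means, covs)], axis=1)
    post = joint / joint.sum(axis=1, keepdims=True)
    # The Bayes classifier errs with probability 1 - max_k p(k | x).
    return float(np.mean(1.0 - post.max(axis=1)))

print(f"Estimated Bayes error: {bayes_error_mc():.4f}")
# Sanity check: for equal priors and identity covariances the closed form is
# Phi(-||mu_1 - mu_0|| / 2) ~= 0.144 here, matching the estimate.
```

In this toy picture, varying the flow's temperature corresponds to scaling the base covariances, which smoothly increases or decreases the overlap between the class conditionals and hence sweeps the Bayes error toward a desired target.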
Author Information
Ryan Theisen (University of California Berkeley)
Huan Wang (Salesforce Research)
Huan Wang is a senior research scientist at Salesforce Research. His research interests include machine learning, big data analytics, computer vision, and NLP. He was previously a research scientist at Microsoft AI Research and Yahoo's New York Labs, and an adjunct professor at the engineering school of New York University. He received his Ph.D. in Computer Science from Yale University in 2013. Before that, he received an M.Phil. from The Chinese University of Hong Kong and a B.Eng. from Zhejiang University, both in information engineering.
Lav Varshney (Salesforce Research)
Caiming Xiong (State University of New York at Buffalo)
Richard Socher (MetaMind)
More from the Same Authors
- 2021 Spotlight: Understanding the Under-Coverage Bias in Uncertainty Estimation
  Yu Bai · Song Mei · Huan Wang · Caiming Xiong
- 2021 Spotlight: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
  Junnan Li · Ramprasaath Selvaraju · Akhilesh Gotmare · Shafiq Joty · Caiming Xiong · Steven Chu Hong Hoi
- 2021 Spotlight: A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning
  Pan Zhou · Caiming Xiong · Xiaotong Yuan · Steven Chu Hong Hoi
- 2021: Advanced Methods for Connectome-Based Predictive Modeling of Human Intelligence: A Novel Approach Based on Individual Differences in Cortical Topography
  Evan Anderson · Anuj Nayak · Pablo Robles-Granda · Lav Varshney · Been Kim · Aron K Barbey
- 2022: Controllable Generation for Climate Modeling
  Moulik Choraria · Daniela Szwarcman · Bianca Zadrozny · Campbell Watson · Lav Varshney
- 2021: Balancing Robustness and Fairness via Partial Invariance
  Moulik Choraria · Ibtihal Ferwana · Ankur Mani · Lav Varshney
- 2021 Poster: Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games
  Yu Bai · Chi Jin · Huan Wang · Caiming Xiong
- 2021 Poster: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
  Junnan Li · Ramprasaath Selvaraju · Akhilesh Gotmare · Shafiq Joty · Caiming Xiong · Steven Chu Hong Hoi
- 2021 Poster: Understanding the Under-Coverage Bias in Uncertainty Estimation
  Yu Bai · Song Mei · Huan Wang · Caiming Xiong
- 2021 Poster: Taxonomizing local versus global structure in neural network loss landscapes
  Yaoqing Yang · Liam Hodgkinson · Ryan Theisen · Joe Zou · Joseph Gonzalez · Kannan Ramchandran · Michael Mahoney
- 2021 Poster: A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning
  Pan Zhou · Caiming Xiong · Xiaotong Yuan · Steven Chu Hong Hoi
- 2021 Poster: Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
  Tengyang Xie · Nan Jiang · Huan Wang · Caiming Xiong · Yu Bai
- 2020: Contributed Talk - ProGen: Language Modeling for Protein Generation
  Ali Madani · Bryan McCann · Nikhil Naik · · Possu Huang · Richard Socher
- 2020 Poster: Towards Understanding Hierarchical Learning: Benefits of Neural Representations
  Minshuo Chen · Yu Bai · Jason Lee · Tuo Zhao · Huan Wang · Caiming Xiong · Richard Socher
- 2017 Poster: Learned in Translation: Contextualized Word Vectors
  Bryan McCann · James Bradbury · Caiming Xiong · Richard Socher
- 2017 Poster: Probabilistic Rule Realization and Selection
  Haizi Yu · Tianxi Li · Lav Varshney
- 2016: Richard Socher - Tackling the Limits of Deep Learning for NLP
  Richard Socher
- 2014 Poster: Global Belief Recursive Neural Networks
  Romain Paulus · Richard Socher · Christopher Manning
- 2013 Demonstration: Easy Text Classification with Machine Learning
  Richard Socher · Romain Paulus · Bryan McCann · Andrew Y Ng
- 2013 Poster: Reasoning With Neural Tensor Networks for Knowledge Base Completion
  Richard Socher · Danqi Chen · Christopher D Manning · Andrew Y Ng
- 2013 Poster: Zero-Shot Learning Through Cross-Modal Transfer
  Richard Socher · Milind Ganjoo · Christopher D Manning · Andrew Y Ng
- 2012 Poster: Recursive Deep Learning on 3D Point Clouds
  Richard Socher · Bharath Bath · Brody Huval · Christopher D Manning · Andrew Y Ng
- 2011 Poster: Unfolding Recursive Autoencoders for Paraphrase Detection
  Richard Socher · Eric H Huang · Jeffrey Pennin · Andrew Y Ng · Christopher D Manning
- 2009 Poster: A Bayesian Analysis of Dynamics in Free Recall
  Richard Socher · Samuel J Gershman · Adler Perotte · Per Sederberg · David Blei · Kenneth Norman