Designing networks that achieve better performance when given a larger inference budget is important for generalizing to harder problem instances. Recent efforts have shown promising results in this direction using depth-wise recurrent networks. In this work, we reproduce the performance of the prior art using a broader class of architectures called equilibrium models, and find that stronger generalization on harder examples (which require more iterations of inference to get correct) strongly correlates with the path independence of the system, i.e., its ability to converge to the same attractor (or limit cycle) regardless of initialization, given enough computation. Experimental interventions that promote path independence improve generalization on harder (and thus more compute-hungry) problem instances, while interventions that penalize it degrade this ability. Path independence analyses are also useful on a per-example basis: for equilibrium models with good in-distribution performance, path independence on out-of-distribution samples strongly correlates with accuracy. Thus, considering equilibrium models and path independence jointly offers a valuable new viewpoint for studying the generalization performance of these networks on hard problem instances.
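Path independence can be made concrete with a small numerical check: solve for the equilibrium of the same input from several initial states and compare where the iterations end up. The sketch below is a minimal illustration under stated assumptions, not the authors' method or metric. The toy layer `z <- tanh(W z + U x)`, the helper names (`equilibrium_step`, `solve_fixed_point`, `path_independence_score`) and the use of cosine similarity against the zero-initialized solution are all hypothetical choices made for illustration. A contractive `W` (spectral norm below 1) guarantees a unique fixed point, so every initialization should land in the same place. A strongly self-exciting `W` creates several stable fixed points, so the outcome depends on where iteration starts.

```python
import numpy as np


def equilibrium_step(z, x, W, U):
    # Hypothetical toy equilibrium layer: one application of z <- tanh(W z + U x).
    return np.tanh(W @ z + U @ x)


def solve_fixed_point(x, W, U, z0, max_iter=500, tol=1e-8):
    # Plain fixed-point iteration; stops when successive states stop changing.
    z = z0
    for _ in range(max_iter):
        z_next = equilibrium_step(z, x, W, U)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z  # may not have converged (e.g. a limit cycle); returned as-is


def path_independence_score(x, W, U, num_inits=5, init_scale=3.0, seed=0):
    # Illustrative proxy (an assumption, not the paper's metric): mean cosine
    # similarity between the state reached from z0 = 0 and the states
    # reached from random initial states, all for the same input x.
    rng = np.random.default_rng(seed)
    d = W.shape[0]
    z_ref = solve_fixed_point(x, W, U, np.zeros(d))
    sims = []
    for _ in range(num_inits):
        z0 = rng.normal(scale=init_scale, size=d)
        z = solve_fixed_point(x, W, U, z0)
        cos = z @ z_ref / (np.linalg.norm(z) * np.linalg.norm(z_ref) + 1e-12)
        sims.append(cos)
    return float(np.mean(sims))


rng = np.random.default_rng(0)
d = 64
x = rng.normal(size=d)
U = 0.1 * rng.normal(size=(d, d)) / np.sqrt(d)  # small input injection

# Contractive: spectral norm 0.5 and tanh is 1-Lipschitz, so the fixed point is unique.
W = rng.normal(size=(d, d))
W_contractive = 0.5 * W / np.linalg.norm(W, 2)

# Bistable: each coordinate has two stable fixed points, near +1 and near -1.
W_bistable = 5.0 * np.eye(d)

score_contractive = path_independence_score(x, W_contractive, U)
score_bistable = path_independence_score(x, W_bistable, U)
print(f"contractive: {score_contractive:.4f}, bistable: {score_bistable:.4f}")
```

With the contractive weights the score should be essentially 1. The bistable weights (`5 * I`) send each coordinate to whichever of its two stable fixed points the initialization favors, so the score drops well below 1. Plain fixed-point iteration is used here only for simplicity; equilibrium models are typically solved with faster root finders such as Broyden's method or Anderson acceleration.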
Author Information
Cem Anil (University of Toronto)
I'm a first-year PhD student at the University of Toronto and Vector Institute, supervised by Roger Grosse and Geoffrey Hinton.
Ashwini Pokle (Carnegie Mellon University)
Kaiqu Liang (Princeton University)
Johannes Treutlein (University of Toronto)
Yuhuai Wu (Google)
Shaojie Bai (Carnegie Mellon University)
J. Zico Kolter (Carnegie Mellon University / Bosch Center for AI)
Zico Kolter is an Assistant Professor in the School of Computer Science at Carnegie Mellon University, and also serves as Chief Scientist of AI Research for the Bosch Center for Artificial Intelligence. His work focuses on the intersection of machine learning and optimization, with a large focus on developing more robust, explainable, and rigorous methods in deep learning. In addition, he has worked on a number of application areas, highlighted by work on sustainability and smart energy systems. He is the recipient of the DARPA Young Faculty Award, and best paper awards at KDD, IJCAI, and PESGM.
Roger Grosse (University of Toronto)
More from the Same Authors
2020: An adversarially robust approach to security-constrained optimal power flow
Neeraj Vijay Bedmutha · Priya Donti · J. Zico Kolter
2021: Normative disagreement as a challenge for Cooperative AI
Julian Stastny · Maxime Riché · Aleksandr Lyzhov · Johannes Treutlein · Allan Dafoe · Jesse Clifton
2022: Generative Posterior Networks for Approximately Bayesian Epistemic Uncertainty Estimation
Melrose Roderick · Felix Berkenkamp · Fatemeh Sheikholeslami · J. Zico Kolter
2022: Denoised Smoothing with Sample Rejection for Robustifying Pretrained Classifiers
Fatemeh Sheikholeslami · Wan-Yi Lin · Jan Hendrik Metzen · Huan Zhang · J. Zico Kolter
2022: A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games
Samuel Sokota · Ryan D'Orazio · J. Zico Kolter · Nicolas Loizou · Marc Lanctot · Ioannis Mitliagkas · Noam Brown · Christian Kroer
2022: Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search
Michał Zawalski · Michał Tyrolski · Konrad Czechowski · Damian Stachura · Piotr Piękos · Tomasz Odrzygóźdź · Yuhuai Wu · Łukasz Kuciński · Piotr Miłoś
2022: Uncertainty-Driven Exploration for Generalization in Reinforcement Learning
Yiding Jiang · J. Zico Kolter · Roberta Raileanu
2022: Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes
Sina Baharlouei · Fatemeh Sheikholeslami · Meisam Razaviyayn · J. Zico Kolter
2023 Poster: On the Importance of Exploration for Generalization in Reinforcement Learning
Yiding Jiang · J. Zico Kolter · Roberta Raileanu
2023 Poster: Deep Equilibrium Based Neural Operators for Steady-State PDEs
Tanya Marwah · Ashwini Pokle · J. Zico Kolter · Zachary Lipton · Jianfeng Lu · Andrej Risteski
2023 Poster: Learning with Explanation Constraints
Rattana Pukdee · Dylan Sam · J. Zico Kolter · Maria-Florina Balcan · Pradeep Ravikumar
2023 Poster: Similarity-based cooperative equilibrium
Caspar Oesterheld · Johannes Treutlein · Roger Grosse · Vincent Conitzer · Jakob Foerster
2023 Poster: Permutation Equivariant Neural Functionals
Allan Zhou · Kaien Yang · Kaylee Burns · Adriano Cardace · Yiding Jiang · Samuel Sokota · J. Zico Kolter · Chelsea Finn
2023 Poster: One-Step Diffusion Distillation via Deep Equilibrium Models
Zhengyang Geng · Ashwini Pokle · J. Zico Kolter
2023 Poster: Neural Functional Transformers
Allan Zhou · Kaien Yang · Yiding Jiang · Kaylee Burns · Winnie Xu · Samuel Sokota · J. Zico Kolter · Chelsea Finn
2023 Poster: Provably Bounding Neural Network Preimages
Christopher Brix · Suhas Kotha · Huan Zhang · J. Zico Kolter · Krishnamurthy Dvijotham
2023 Poster: Language Models are Weak Learners
Hariharan Manikandan · Yiding Jiang · J. Zico Kolter
2023 Workshop: XAI in Action: Past, Present, and Future Applications
Chhavi Yadav · Michal Moshkovitz · Nave Frost · Suraj Srinivas · Bingqing Chen · Valentyn Boreiko · Himabindu Lakkaraju · J. Zico Kolter · Dotan Di Castro · Kamalika Chaudhuri
2022 Workshop: Trustworthy and Socially Responsible Machine Learning
Huan Zhang · Linyi Li · Chaowei Xiao · J. Zico Kolter · Anima Anandkumar · Bo Li
2022 Panel: Panel 2B-4: Extreme Compression for… & Exploring Length Generalization…
Cem Anil · Minjia Zhang
2022: Zico Kolter, Adapt like you train: How optimization at training time affects model finetuning and adaptation
J. Zico Kolter
2022 Workshop: MATH-AI: Toward Human-Level Mathematical Reasoning
Pan Lu · Swaroop Mishra · Sean Welleck · Yuhuai Wu · Hannaneh Hajishirzi · Percy Liang
2022 Poster: Characterizing Datapoints via Second-Split Forgetting
Pratyush Maini · Saurabh Garg · Zachary Lipton · J. Zico Kolter
2022 Poster: Autoformalization with Large Language Models
Yuhuai Wu · Albert Q. Jiang · Wenda Li · Markus Rabe · Charles Staats · Mateja Jamnik · Christian Szegedy
2022 Poster: Amortized Proximal Optimization
Juhan Bae · Paul Vicol · Jeff Z. HaoChen · Roger Grosse
2022 Poster: Learning Options via Compression
Yiding Jiang · Evan Liu · Benjamin Eysenbach · J. Zico Kolter · Chelsea Finn
2022 Poster: Insights into Pre-training via Simpler Synthetic Tasks
Yuhuai Wu · Felix Li · Percy Liang
2022 Poster: Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
Albert Q. Jiang · Wenda Li · Szymon Tworkowski · Konrad Czechowski · Tomasz Odrzygóźdź · Piotr Miłoś · Yuhuai Wu · Mateja Jamnik
2022 Poster: Proximal Learning With Opponent-Learning Awareness
Stephen Zhao · Chris Lu · Roger Grosse · Jakob Foerster
2022 Poster: If Influence Functions are the Answer, Then What is the Question?
Juhan Bae · Nathan Ng · Alston Lo · Marzyeh Ghassemi · Roger Grosse
2022 Poster: Efficiently Computing Local Lipschitz Constants of Neural Networks via Bound Propagation
Zhouxing Shi · Yihan Wang · Huan Zhang · J. Zico Kolter · Cho-Jui Hsieh
2022 Poster: STaR: Bootstrapping Reasoning With Reasoning
Eric Zelikman · Yuhuai Wu · Jesse Mu · Noah Goodman
2022 Poster: Test Time Adaptation via Conjugate Pseudo-labels
Sachin Goyal · Mingjie Sun · Aditi Raghunathan · J. Zico Kolter
2022 Poster: Exploring Length Generalization in Large Language Models
Cem Anil · Yuhuai Wu · Anders Andreassen · Aitor Lewkowycz · Vedant Misra · Vinay Ramasesh · Ambrose Slone · Guy Gur-Ari · Ethan Dyer · Behnam Neyshabur
2022 Poster: Deep Equilibrium Approaches to Diffusion Models
Ashwini Pokle · Zhengyang Geng · J. Zico Kolter
2022 Poster: Agreement-on-the-line: Predicting the Performance of Neural Networks under Distribution Shift
Christina Baek · Yiding Jiang · Aditi Raghunathan · J. Zico Kolter
2022 Poster: Solving Quantitative Reasoning Problems with Language Models
Aitor Lewkowycz · Anders Andreassen · David Dohan · Ethan Dyer · Henryk Michalewski · Vinay Ramasesh · Ambrose Slone · Cem Anil · Imanol Schlag · Theo Gutman-Solo · Yuhuai Wu · Behnam Neyshabur · Guy Gur-Ari · Vedant Misra
2022 Poster: General Cutting Planes for Bound-Propagation-Based Neural Network Verification
Huan Zhang · Shiqi Wang · Kaidi Xu · Linyi Li · Bo Li · Suman Jana · Cho-Jui Hsieh · J. Zico Kolter
2022 Poster: Block-Recurrent Transformers
DeLesley Hutchins · Imanol Schlag · Yuhuai Wu · Ethan Dyer · Behnam Neyshabur
2022 Poster: The Pitfalls of Regularization in Off-Policy TD Learning
Gaurav Manek · J. Zico Kolter
2021: Panel B: Safe Learning and Decision Making in Uncertain and Unstructured Environments
Yisong Yue · J. Zico Kolter · Ivan Dario D Jimenez Rodriguez · Dragos Margineantu · Animesh Garg · Melissa Greeff
2021: Enforcing Robustness for Neural Network Policies
J. Zico Kolter
2021 Poster: On Training Implicit Models
Zhengyang Geng · Xin-Yu Zhang · Shaojie Bai · Yisen Wang · Zhouchen Lin
2021 Poster: Beta-CROWN: Efficient Bound Propagation with Per-neuron Split Constraints for Neural Network Robustness Verification
Shiqi Wang · Huan Zhang · Kaidi Xu · Xue Lin · Suman Jana · Cho-Jui Hsieh · J. Zico Kolter
2021 Poster: Joint inference and input optimization in equilibrium networks
Swaminathan Gurumurthy · Shaojie Bai · Zachary Manchester · J. Zico Kolter
2021 Poster: $(\textrm{Implicit})^2$: Implicit Layers for Implicit Representations
Zhichun Huang · Shaojie Bai · J. Zico Kolter
2021 Poster: Boosted CVaR Classification
Runtian Zhai · Chen Dan · Arun Suggala · J. Zico Kolter · Pradeep Ravikumar
2021 Poster: Training Certifiably Robust Neural Networks with Efficient Local Lipschitz Bounds
Yujia Huang · Huan Zhang · Yuanyuan Shi · J. Zico Kolter · Anima Anandkumar
2021 Poster: Learning to Elect
Cem Anil · Xuchan Bao
2021 Poster: Adversarially robust learning for security-constrained optimal power flow
Priya Donti · Aayushya Agarwal · Neeraj Vijay Bedmutha · Larry Pileggi · J. Zico Kolter
2021 Poster: Robustness between the worst and average case
Leslie Rice · Anna Bair · Huan Zhang · J. Zico Kolter
2021 Poster: Monte Carlo Tree Search With Iteratively Refining State Abstractions
Samuel Sokota · Caleb Y Ho · Zaheen Ahmad · J. Zico Kolter
2021 Poster: Differentiable Annealed Importance Sampling and the Perils of Gradient Noise
Guodong Zhang · Kyle Hsu · Jianing Li · Chelsea Finn · Roger Grosse
2020: Invited Talk (Zico Kolter)
J. Zico Kolter
2020: Invited Talk: Roger Grosse - Why Isn’t Everyone Using Second-Order Optimization?
Roger Grosse
2020 Workshop: Machine Learning for Engineering Modeling, Simulation and Design
Alex Beatson · Priya Donti · Amira Abdel-Rahman · Stephan Hoyer · Rose Yu · J. Zico Kolter · Ryan Adams
2020: Keynote by Zico Kolter
J. Zico Kolter
2020 Poster: Community detection using fast low-cardinality semidefinite programming
Po-Wei Wang · J. Zico Kolter
2020 Poster: Deep Archimedean Copulas
Chun Kai Ling · Fei Fang · J. Zico Kolter
2020 Poster: Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians
Juhan Bae · Roger Grosse
2020 Tutorial: (Track3) Deep Implicit Layers: Neural ODEs, Equilibrium Models, and Differentiable Optimization Q&A
David Duvenaud · J. Zico Kolter · Matthew Johnson
2020 Poster: Regularized linear autoencoders recover the principal components, eventually
Xuchan Bao · James Lucas · Sushant Sachdeva · Roger Grosse
2020 Poster: Efficient semidefinite-programming-based inference for binary and multi-class MRFs
Chirag Pabbaraju · Po-Wei Wang · J. Zico Kolter
2020 Spotlight: Efficient semidefinite-programming-based inference for binary and multi-class MRFs
Chirag Pabbaraju · Po-Wei Wang · J. Zico Kolter
2020 Poster: Multiscale Deep Equilibrium Models
Shaojie Bai · Vladlen Koltun · J. Zico Kolter
2020 Poster: Denoised Smoothing: A Provable Defense for Pretrained Classifiers
Hadi Salman · Mingjie Sun · Greg Yang · Ashish Kapoor · J. Zico Kolter
2020 Poster: Monotone operator equilibrium networks
Ezra Winston · J. Zico Kolter
2020 Spotlight: Monotone operator equilibrium networks
Ezra Winston · J. Zico Kolter
2020 Oral: Multiscale Deep Equilibrium Models
Shaojie Bai · Vladlen Koltun · J. Zico Kolter
2020 Tutorial: (Track3) Deep Implicit Layers: Neural ODEs, Equilibrium Models, and Differentiable Optimization
David Duvenaud · J. Zico Kolter · Matthew Johnson
2019 Poster: Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks
Guodong Zhang · James Martens · Roger Grosse
2019 Poster: Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Guodong Zhang · Lala Li · Zachary Nado · James Martens · Sushant Sachdeva · George Dahl · Chris Shallue · Roger Grosse
2019 Poster: Learning Stable Deep Dynamics Models
J. Zico Kolter · Gaurav Manek
2019 Poster: Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks
Qiyang Li · Saminul Haque · Cem Anil · James Lucas · Roger Grosse · Joern-Henrik Jacobsen
2019 Poster: Adversarial Music: Real world Audio Adversary against Wake-word Detection System
Juncheng Li · Shuhui Qu · Xinjian Li · Joseph Szurley · J. Zico Kolter · Florian Metze
2019 Spotlight: Adversarial Music: Real world Audio Adversary against Wake-word Detection System
Juncheng Li · Shuhui Qu · Xinjian Li · Joseph Szurley · J. Zico Kolter · Florian Metze
2019 Poster: Differentiable Convex Optimization Layers
Akshay Agrawal · Brandon Amos · Shane Barratt · Stephen Boyd · Steven Diamond · J. Zico Kolter
2019 Poster: Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse
James Lucas · George Tucker · Roger Grosse · Mohammad Norouzi
2019 Poster: Uniform convergence may be unable to explain generalization in deep learning
Vaishnavh Nagarajan · J. Zico Kolter
2019 Poster: Deep Equilibrium Models
Shaojie Bai · J. Zico Kolter · Vladlen Koltun
2019 Spotlight: Deep Equilibrium Models
Shaojie Bai · J. Zico Kolter · Vladlen Koltun
2019 Oral: Uniform convergence may be unable to explain generalization in deep learning
Vaishnavh Nagarajan · J. Zico Kolter
2018: TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer
Sicong (Sheldon) Huang · Cem Anil · Xuchan Bao
2018: Talk 1: Zico Kolter - Differentiable Physics and Control
J. Zico Kolter
2018 Poster: Differentiable MPC for End-to-end Planning and Control
Brandon Amos · Ivan Jimenez · Jacob I Sacks · Byron Boots · J. Zico Kolter
2018 Poster: Isolating Sources of Disentanglement in Variational Autoencoders
Tian Qi Chen · Xuechen (Chen) Li · Roger Grosse · David Duvenaud
2018 Oral: Isolating Sources of Disentanglement in Variational Autoencoders
Tian Qi Chen · Xuechen (Chen) Li · Roger Grosse · David Duvenaud
2018 Poster: End-to-End Differentiable Physics for Learning and Control
Filipe de Avila Belbute Peres · Kevin Smith · Kelsey Allen · Josh Tenenbaum · J. Zico Kolter
2018 Spotlight: End-to-End Differentiable Physics for Learning and Control
Filipe de Avila Belbute Peres · Kevin Smith · Kelsey Allen · Josh Tenenbaum · J. Zico Kolter
2018 Poster: Scaling provable adversarial defenses
Eric Wong · Frank Schmidt · Jan Hendrik Metzen · J. Zico Kolter
2018 Poster: Reversible Recurrent Neural Networks
Matthew MacKay · Paul Vicol · Jimmy Ba · Roger Grosse
2018 Tutorial: Adversarial Robustness: Theory and Practice
J. Zico Kolter · Aleksander Madry
2017: Provable defenses against adversarial examples via the convex outer adversarial polytope
J. Zico Kolter
2017 Poster: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
Yuhuai Wu · Elman Mansimov · Roger Grosse · Shun Liao · Jimmy Ba
2017 Poster: Gradient descent GAN optimization is locally stable
Vaishnavh Nagarajan · J. Zico Kolter
2017 Spotlight: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
Yuhuai Wu · Elman Mansimov · Roger Grosse · Shun Liao · Jimmy Ba
2017 Oral: Gradient descent GAN optimization is locally stable
Vaishnavh Nagarajan · J. Zico Kolter
2017 Poster: The Reversible Residual Network: Backpropagation Without Storing Activations
Aidan Gomez · Mengye Ren · Raquel Urtasun · Roger Grosse
2017 Poster: Task-based End-to-end Model Learning in Stochastic Optimization
Priya Donti · J. Zico Kolter · Brandon Amos
2016 Symposium: Deep Learning Symposium
Yoshua Bengio · Yann LeCun · Navdeep Jaitly · Roger Grosse
2016 Poster: The Multiple Quantile Graphical Model
Alnur Ali · J. Zico Kolter · Ryan Tibshirani
2016 Poster: Measuring the reliability of MCMC inference with bidirectional Monte Carlo
Roger Grosse · Siddharth Ancha · Daniel Roy
2015 Poster: Learning Wake-Sleep Recurrent Attention Models
Jimmy Ba · Russ Salakhutdinov · Roger Grosse · Brendan J Frey
2015 Spotlight: Learning Wake-Sleep Recurrent Attention Models
Jimmy Ba · Russ Salakhutdinov · Roger Grosse · Brendan J Frey
2013 Workshop: Machine Learning for Sustainability
Edwin Bonilla · Thomas Dietterich · Theodoros Damoulas · Andreas Krause · Daniel Sheldon · Iadine Chades · J. Zico Kolter · Bistra Dilkina · Carla Gomes · Hugo P Simao
2013 Poster: Annealing between distributions by averaging moments
Roger Grosse · Chris Maddison · Russ Salakhutdinov
2013 Oral: Annealing between distributions by averaging moments
Roger Grosse · Chris Maddison · Russ Salakhutdinov
2011 Workshop: Machine Learning for Sustainability
Thomas Dietterich · J. Zico Kolter · Matthew A Brown
2011 Poster: The Fixed Points of Off-Policy TD
J. Zico Kolter
2011 Spotlight: The Fixed Points of Off-Policy TD
J. Zico Kolter
2010 Poster: Energy Disaggregation via Discriminative Sparse Coding
J. Zico Kolter · Siddarth Batra · Andrew Y Ng
2009 Mini Symposium: Machine Learning for Sustainability
J. Zico Kolter · Thomas Dietterich · Andrew Y Ng
2007 Spotlight: Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion
J. Zico Kolter · Pieter Abbeel · Andrew Y Ng
2007 Poster: Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion
J. Zico Kolter · Pieter Abbeel · Andrew Y Ng