Timezone: »
Modern neural network architectures can leverage large amounts of data to generalize well within the training distribution. However, they are less capable of systematic generalization to data drawn from unseen but related distributions, a feat that is hypothesized to require compositional reasoning and reuse of knowledge. In this work, we present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules, which we call functions. Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. The proposed architecture can flexibly compose computation along width and depth, and lends itself well to capacity extension after training. To demonstrate the versatility of Neural Interpreters, we evaluate it in two distinct settings: image classification and visual abstract reasoning on Raven Progressive Matrices. In the former, we show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner. In the latter, we find that Neural Interpreters are competitive with respect to the state-of-the-art in terms of systematic generalization.
Author Information
Nasim Rahaman (Max Planck Institute for Intelligent Systems, Max-Planck Institute)
Muhammad Waleed Gondal (Max Planck Institute for Intelligent Systems)
Shruti Joshi (Mila, ETS Montreal)
Peter Gehler (Amazon)
Yoshua Bengio (Mila / U. Montreal)
Yoshua Bengio (PhD'1991 in Computer Science, McGill University). After two post-doctoral years, one at MIT with Michael Jordan and one at AT&T Bell Laboratories with Yann LeCun, he became professor at the department of computer science and operations research at Université de Montréal. Author of two books (a third is in preparation) and more than 200 publications, he is among the most cited Canadian computer scientists and is or has been associate editor of the top journals in machine learning and neural networks. Since '2000 he holds a Canada Research Chair in Statistical Learning Algorithms, since '2006 an NSERC Chair, since '2005 his is a Senior Fellow of the Canadian Institute for Advanced Research and since 2014 he co-directs its program focused on deep learning. He is on the board of the NIPS foundation and has been program chair and general chair for NIPS. He has co-organized the Learning Workshop for 14 years and co-created the International Conference on Learning Representations. His interests are centered around a quest for AI through machine learning, and include fundamental questions on deep learning, representation learning, the geometry of generalization in high-dimensional spaces, manifold learning and biologically inspired learning algorithms.
Francesco Locatello (Amazon)
Bernhard Schölkopf (MPI for Intelligent Systems, Tübingen)
More from the Same Authors
-
2021 Spotlight: Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization »
Kartik Ahuja · Ethan Caballero · Dinghuai Zhang · Jean-Christophe Gagnon-Audet · Yoshua Bengio · Ioannis Mitliagkas · Irina Rish -
2021 Spotlight: Iterative Teaching by Label Synthesis »
Weiyang Liu · Zhen Liu · Hanchen Wang · Liam Paull · Bernhard Schölkopf · Adrian Weller -
2021 Spotlight: DiBS: Differentiable Bayesian Structure Learning »
Lars Lorch · Jonas Rothfuss · Bernhard Schölkopf · Andreas Krause -
2021 : Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning »
Nan Rosemary Ke · Aniket Didolkar · Sarthak Mittal · Anirudh Goyal · Guillaume Lajoie · Stefan Bauer · Danilo Jimenez Rezende · Yoshua Bengio · Chris Pal · Michael Mozer -
2021 : Learning Robust Dynamics through Variational Sparse Gating »
Arnav Kumar Jain · Shivakanth Sujit · Shruti Joshi · Vincent Michalski · Danijar Hafner · Samira Ebrahimi Kahou -
2021 : Long-Term Credit Assignment via Model-based Temporal Shortcuts »
Michel Ma · Pierluca D'Oro · Yoshua Bengio · Pierre-Luc Bacon -
2021 : A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning »
Mingde Zhao · Zhen Liu · Sitao Luan · Shuyuan Zhang · Doina Precup · Yoshua Bengio -
2021 : Effect of diversity in Meta-Learning »
Ramnath Kumar · Tristan Deleu · Yoshua Bengio -
2022 : A Causal Framework to Quantify Robustness of Mathematical Reasoning with Language Models »
Alessandro Stolfo · Zhijing Jin · Kumar Shridhar · Bernhard Schölkopf · Mrinmaya Sachan -
2022 : A General-Purpose Neural Architecture for Geospatial Systems »
Martin Weiss · Nasim Rahaman · Frederik Träuble · Francesco Locatello · Alexandre Lacoste · Yoshua Bengio · Erran Li Li · Chris Pal · Bernhard Schölkopf -
2022 Poster: Neural Attentive Circuits »
Martin Weiss · Nasim Rahaman · Francesco Locatello · Chris Pal · Yoshua Bengio · Bernhard Schölkopf · Erran Li Li · Nicolas Ballas -
2021 : Boxhead: A Dataset for Learning Hierarchical Representations »
Yukun Chen · Andrea Dittadi · Frederik Träuble · Stefan Bauer · Bernhard Schölkopf -
2021 Poster: Gradient Starvation: A Learning Proclivity in Neural Networks »
Mohammad Pezeshki · Oumar Kaba · Yoshua Bengio · Aaron Courville · Doina Precup · Guillaume Lajoie -
2021 Poster: Causal Influence Detection for Improving Efficiency in Reinforcement Learning »
Maximilian Seitzer · Bernhard Schölkopf · Georg Martius -
2021 Poster: Independent mechanism analysis, a new concept? »
Luigi Gresele · Julius von Kügelgen · Vincent Stimper · Bernhard Schölkopf · Michel Besserve -
2021 Poster: A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning »
Mingde Zhao · Zhen Liu · Sitao Luan · Shuyuan Zhang · Doina Precup · Yoshua Bengio -
2021 Poster: Neural Production Systems »
Anirudh Goyal · Aniket Didolkar · Nan Rosemary Ke · Charles Blundell · Philippe Beaudoin · Nicolas Heess · Michael Mozer · Yoshua Bengio -
2021 Poster: Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation »
Emmanuel Bengio · Moksh Jain · Maksym Korablyov · Doina Precup · Yoshua Bengio -
2021 Poster: Iterative Teaching by Label Synthesis »
Weiyang Liu · Zhen Liu · Hanchen Wang · Liam Paull · Bernhard Schölkopf · Adrian Weller -
2021 Poster: The Causal-Neural Connection: Expressiveness, Learnability, and Inference »
Kevin Xia · Kai-Zhan Lee · Yoshua Bengio · Elias Bareinboim -
2021 Poster: Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization »
Kartik Ahuja · Ethan Caballero · Dinghuai Zhang · Jean-Christophe Gagnon-Audet · Yoshua Bengio · Ioannis Mitliagkas · Irina Rish -
2021 Poster: Discrete-Valued Neural Communication »
Dianbo Liu · Alex Lamb · Kenji Kawaguchi · Anirudh Goyal · Chen Sun · Michael Mozer · Yoshua Bengio -
2021 Poster: The Inductive Bias of Quantum Kernels »
Jonas Kübler · Simon Buchholz · Bernhard Schölkopf -
2021 Poster: Backward-Compatible Prediction Updates: A Probabilistic Approach »
Frederik Träuble · Julius von Kügelgen · Matthäus Kleindessner · Francesco Locatello · Bernhard Schölkopf · Peter Gehler -
2021 Poster: Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style »
Julius von Kügelgen · Yash Sharma · Luigi Gresele · Wieland Brendel · Bernhard Schölkopf · Michel Besserve · Francesco Locatello -
2021 Poster: DiBS: Differentiable Bayesian Structure Learning »
Lars Lorch · Jonas Rothfuss · Bernhard Schölkopf · Andreas Krause -
2021 Poster: Regret Bounds for Gaussian-Process Optimization in Large Domains »
Manuel Wuethrich · Bernhard Schölkopf · Andreas Krause -
2019 : Bernhard Schölkopf »
Bernhard Schölkopf -
2019 Poster: On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset »
Muhammad Waleed Gondal · Manuel Wuethrich · Djordje Miladinovic · Francesco Locatello · Martin Breidt · Valentin Volchkov · Joel Akpo · Olivier Bachem · Bernhard Schölkopf · Stefan Bauer -
2018 : Learning Independent Mechanisms »
Bernhard Schölkopf -
2017 : From deep learning of disentangled representations to higher-level cognition »
Yoshua Bengio -
2015 Tutorial: Deep Learning »
Geoffrey E Hinton · Yoshua Bengio · Yann LeCun -
2014 Workshop: Second Workshop on Transfer and Multi-Task Learning: Theory meets Practice »
Urun Dogan · Tatiana Tommasi · Yoshua Bengio · Francesco Orabona · Marius Kloft · Andres Munoz · Gunnar Rätsch · Hal Daumé III · Mehryar Mohri · Xuezhi Wang · Daniel Hernández-lobato · Song Liu · Thomas Unterthiner · Pascal Germain · Vinay P Namboodiri · Michael Goetz · Christopher Berlind · Sigurd Spieckermann · Marta Soare · Yujia Li · Vitaly Kuznetsov · Wenzhao Lian · Daniele Calandriello · Emilie Morvant -
2014 Workshop: Deep Learning and Representation Learning »
Andrew Y Ng · Yoshua Bengio · Adam Coates · Roland Memisevic · Sharanyan Chetlur · Geoffrey E Hinton · Shamim Nemati · Bryan Catanzaro · Surya Ganguli · Herbert Jaeger · Phil Blunsom · Leon Bottou · Volodymyr Mnih · Chen-Yu Lee · Rich M Schwartz -
2014 Workshop: OPT2014: Optimization for Machine Learning »
Zaid Harchaoui · Suvrit Sra · Alekh Agarwal · Martin Jaggi · Miro Dudik · Aaditya Ramdas · Jean Lasserre · Yoshua Bengio · Amir Beck -
2014 Poster: How transferable are features in deep neural networks? »
Jason Yosinski · Jeff Clune · Yoshua Bengio · Hod Lipson -
2014 Poster: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization »
Yann N Dauphin · Razvan Pascanu · Caglar Gulcehre · Kyunghyun Cho · Surya Ganguli · Yoshua Bengio -
2014 Poster: Generative Adversarial Nets »
Ian Goodfellow · Jean Pouget-Abadie · Mehdi Mirza · Bing Xu · David Warde-Farley · Sherjil Ozair · Aaron Courville · Yoshua Bengio -
2014 Poster: On the Number of Linear Regions of Deep Neural Networks »
Guido F Montufar · Razvan Pascanu · Kyunghyun Cho · Yoshua Bengio -
2014 Demonstration: Neural Machine Translation »
Bart van Merriënboer · Kyunghyun Cho · Dzmitry Bahdanau · Yoshua Bengio -
2014 Oral: How transferable are features in deep neural networks? »
Jason Yosinski · Jeff Clune · Yoshua Bengio · Hod Lipson -
2014 Poster: Iterative Neural Autoregressive Distribution Estimator NADE-k »
Tapani Raiko · Yao Li · Kyunghyun Cho · Yoshua Bengio -
2013 Workshop: Deep Learning »
Yoshua Bengio · Hugo Larochelle · Russ Salakhutdinov · Tomas Mikolov · Matthew D Zeiler · David Mcallester · Nando de Freitas · Josh Tenenbaum · Jian Zhou · Volodymyr Mnih -
2013 Workshop: Output Representation Learning »
Yuhong Guo · Dale Schuurmans · Richard Zemel · Samy Bengio · Yoshua Bengio · Li Deng · Dan Roth · Kilian Q Weinberger · Jason Weston · Kihyuk Sohn · Florent Perronnin · Gabriel Synnaeve · Pablo R Strasser · julien audiffren · Carlo Ciliberto · Dan Goldwasser -
2013 Poster: Multi-Prediction Deep Boltzmann Machines »
Ian Goodfellow · Mehdi Mirza · Aaron Courville · Yoshua Bengio -
2013 Poster: Generalized Denoising Auto-Encoders as Generative Models »
Yoshua Bengio · Li Yao · Guillaume Alain · Pascal Vincent -
2013 Poster: Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs »
Yann Dauphin · Yoshua Bengio -
2012 Workshop: Deep Learning and Unsupervised Feature Learning »
Yoshua Bengio · James Bergstra · Quoc V. Le -
2011 Workshop: Big Learning: Algorithms, Systems, and Tools for Learning at Scale »
Joseph E Gonzalez · Sameer Singh · Graham Taylor · James Bergstra · Alice Zheng · Misha Bilenko · Yucheng Low · Yoshua Bengio · Michael Franklin · Carlos Guestrin · Andrew McCallum · Alexander Smola · Michael Jordan · Sugato Basu -
2011 Workshop: Deep Learning and Unsupervised Feature Learning »
Yoshua Bengio · Adam Coates · Yann LeCun · Nicolas Le Roux · Andrew Y Ng -
2011 Oral: The Manifold Tangent Classifier »
Salah Rifai · Yann N Dauphin · Pascal Vincent · Yoshua Bengio · Xavier Muller -
2011 Poster: Shallow vs. Deep Sum-Product Networks »
Olivier Delalleau · Yoshua Bengio -
2011 Poster: The Manifold Tangent Classifier »
Salah Rifai · Yann N Dauphin · Pascal Vincent · Yoshua Bengio · Xavier Muller -
2011 Poster: Algorithms for Hyper-Parameter Optimization »
James Bergstra · Rémi Bardenet · Yoshua Bengio · Balázs Kégl -
2011 Poster: On Tracking The Partition Function »
Guillaume Desjardins · Aaron Courville · Yoshua Bengio -
2010 Workshop: Deep Learning and Unsupervised Feature Learning »
Honglak Lee · Marc'Aurelio Ranzato · Yoshua Bengio · Geoffrey E Hinton · Yann LeCun · Andrew Y Ng -
2009 Poster: Slow, Decorrelated Features for Pretraining Complex Cell-like Networks »
James Bergstra · Yoshua Bengio -
2009 Poster: An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism »
Aaron Courville · Douglas Eck · Yoshua Bengio -
2009 Session: Debate on Future Publication Models for the NIPS Community »
Yoshua Bengio -
2007 Poster: Augmented Functional Time Series Representation and Forecasting with Gaussian Processes »
Nicolas Chapados · Yoshua Bengio -
2007 Poster: Learning the 2-D Topology of Images »
Nicolas Le Roux · Yoshua Bengio · Pascal Lamblin · Marc Joliveau · Balázs Kégl -
2007 Spotlight: Augmented Functional Time Series Representation and Forecasting with Gaussian Processes »
Nicolas Chapados · Yoshua Bengio -
2007 Poster: Topmoumoute Online Natural Gradient Algorithm »
Nicolas Le Roux · Pierre-Antoine Manzagol · Yoshua Bengio -
2006 Poster: Greedy Layer-Wise Training of Deep Networks »
Yoshua Bengio · Pascal Lamblin · Dan Popovici · Hugo Larochelle -
2006 Talk: Greedy Layer-Wise Training of Deep Networks »
Yoshua Bengio · Pascal Lamblin · Dan Popovici · Hugo Larochelle