Timezone: »
Modern machine learning requires system designers to specify aspects of the learning pipeline, such as losses, architectures, and optimizers. Meta-learning, or learning-to-learn, instead aims to learn those aspects, and promises to unlock greater capabilities with less manual effort. One particularly ambitious goal of meta-learning is to train general purpose learning algorithms from scratch, using only black box models with minimal inductive bias. A general purpose learning algorithm is one which takes in training data, and produces test-set predictions across a wide range of problems, without any explicit definition of an inference model, training loss, or optimization algorithm. In this paper we show that Transformers and other black-box models can be meta-trained to act as general purpose learning algorithms, and can generalize to learn on different datasets than used during meta-training. We characterize phase transitions between algorithms that generalize, algorithms that memorize, and algorithms that fail to meta-train at all, induced by changes in model size, number of tasks used during meta-training, and meta-optimization hyper-parameters. We further show that the capabilities of meta-trained algorithms are bottlenecked by the accessible state size (memory) determining the next prediction, unlike standard models which are thought to be bottlenecked by parameter count.
Author Information
Louis Kirsch (The Swiss AI Lab IDSIA & Google Brain)
Luke Metz (Google Brain)
James Harrison (Google)
Jascha Sohl-Dickstein (Google)
More from the Same Authors
-
2020 : Training more effective learned optimizers »
Luke Metz -
2021 : Fast Finite Width Neural Tangent Kernel »
Roman Novak · Jascha Sohl-Dickstein · Samuel Schoenholz -
2022 : Expanding the Deployment Envelope of Behavior Prediction via Adaptive Meta-Learning »
Boris Ivanovic · James Harrison · Marco Pavone -
2022 : Meta-Learning General-Purpose Learning Algorithms with Transformers »
Louis Kirsch · Luke Metz · James Harrison · Jascha Sohl-Dickstein -
2022 : The Benefits of Model-Based Generalization in Reinforcement Learning »
Kenny Young · Aditya Ramesh · Louis Kirsch · Jürgen Schmidhuber -
2022 Poster: Exploring through Random Curiosity with General Value Functions »
Aditya Ramesh · Louis Kirsch · Sjoerd van Steenkiste · Jürgen Schmidhuber -
2022 Poster: A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases »
James Harrison · Luke Metz · Jascha Sohl-Dickstein -
2022 Poster: Discovered Policy Optimisation »
Chris Lu · Jakub Kuba · Alistair Letcher · Luke Metz · Christian Schroeder de Witt · Jakob Foerster -
2021 : Luke Metz Q&A »
Luke Metz -
2021 : Luke Metz »
Luke Metz -
2021 Poster: Reverse engineering learned optimizers reveals known and novel mechanisms »
Niru Maheswaranathan · David Sussillo · Luke Metz · Ruoxi Sun · Jascha Sohl-Dickstein -
2020 : Reverse engineering learned optimizers reveals known and novel mechanisms »
Niru Maheswaranathan · David Sussillo · Luke Metz · Ruoxi Sun · Jascha Sohl-Dickstein -
2020 Poster: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian »
Jack Parker-Holder · Luke Metz · Cinjon Resnick · Hengyuan Hu · Adam Lerer · Alistair Letcher · Alexander Peysakhovich · Aldo Pacchiano · Jakob Foerster -
2020 Poster: Finite Versus Infinite Neural Networks: an Empirical Study »
Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha Sohl-Dickstein -
2020 Spotlight: Finite Versus Infinite Neural Networks: an Empirical Study »
Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha Sohl-Dickstein -
2020 Poster: Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling »
Tong Che · Ruixiang ZHANG · Jascha Sohl-Dickstein · Hugo Larochelle · Liam Paull · Yuan Cao · Yoshua Bengio -
2019 : Lunch Break and Posters »
Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Keun Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu -
2019 : Poster Session »
Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie -
2019 : Neural Reparameterization Improves Structural Optimization »
Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus -
2019 Poster: Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent »
Jaehoon Lee · Lechao Xiao · Samuel Schoenholz · Yasaman Bahri · Roman Novak · Jascha Sohl-Dickstein · Jeffrey Pennington -
2019 Poster: Learning to Predict Without Looking Ahead: World Models Without Forward Prediction »
Daniel Freeman · David Ha · Luke Metz -
2019 Poster: Invertible Convolutional Flow »
Mahdi Karami · Dale Schuurmans · Jascha Sohl-Dickstein · Laurent Dinh · Daniel Duckworth -
2019 Spotlight: Invertible Convolutional Flow »
Mahdi Karami · Dale Schuurmans · Jascha Sohl-Dickstein · Laurent Dinh · Daniel Duckworth -
2018 : Learned optimizers that outperform SGD on wall-clock and validation loss »
Luke Metz -
2018 Poster: PCA of high dimensional random walks with comparison to neural network training »
Joseph Antognini · Jascha Sohl-Dickstein -
2018 Poster: Adversarial Examples that Fool both Computer Vision and Time-Limited Humans »
Gamaleldin Elsayed · Shreya Shankar · Brian Cheung · Nicolas Papernot · Alexey Kurakin · Ian Goodfellow · Jascha Sohl-Dickstein -
2017 Poster: REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models »
George Tucker · Andriy Mnih · Chris J Maddison · John Lawson · Jascha Sohl-Dickstein -
2017 Oral: REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models »
George Tucker · Andriy Mnih · Chris J Maddison · John Lawson · Jascha Sohl-Dickstein -
2017 Poster: SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability »
Maithra Raghu · Justin Gilmer · Jason Yosinski · Jascha Sohl-Dickstein -
2016 : From Brains to Bits and Back Again »
Yoshua Bengio · Terrence Sejnowski · Christos H Papadimitriou · Jakob H Macke · Demis Hassabis · Alyson Fletcher · Andreas Tolias · Jascha Sohl-Dickstein · Konrad P Koerding -
2016 : Opening Remarks »
Jascha Sohl-Dickstein -
2016 Workshop: Brains and Bits: Neuroscience meets Machine Learning »
Alyson Fletcher · Eva Dyer · Jascha Sohl-Dickstein · Joshua T Vogelstein · Konrad Koerding · Jakob H Macke -
2016 Poster: Exponential expressivity in deep neural networks through transient chaos »
Ben Poole · Subhaneil Lahiri · Maithra Raghu · Jascha Sohl-Dickstein · Surya Ganguli -
2015 Workshop: Statistical Methods for Understanding Neural Systems »
Alyson Fletcher · Jakob H Macke · Ryan Adams · Jascha Sohl-Dickstein -
2015 Poster: Deep Knowledge Tracing »
Chris Piech · Jonathan Bassen · Jonathan Huang · Surya Ganguli · Mehran Sahami · Leonidas Guibas · Jascha Sohl-Dickstein -
2012 Poster: Training sparse natural image models with a fast Gibbs sampler of an extended state space »
Lucas Theis · Jascha Sohl-Dickstein · Matthias Bethge