We propose a novel reinforcement learning algorithm, AlphaNPI, that incorporates the strengths of Neural Programmer-Interpreters (NPI) and AlphaZero. NPI contributes structural biases in the form of modularity, hierarchy and recursion, which help to reduce sample complexity, improve generalization and increase interpretability. AlphaZero contributes powerful neural network guided search algorithms, which we augment with recursion. AlphaNPI only assumes a hierarchical program specification with sparse rewards: 1 when the program execution satisfies the specification, and 0 otherwise. This specification enables us to overcome the need for strong supervision in the form of execution traces and consequently train NPI models effectively with reinforcement learning. The experiments show that AlphaNPI can sort as well as previous strongly supervised NPI variants. The AlphaNPI agent is also trained on a Tower of Hanoi puzzle with two disks and is shown to generalize to puzzles with an arbitrary number of disks. The experiments also show that when deploying our neural network policies, it is advantageous to do planning with guided Monte Carlo tree search.
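The sparse reward described above can be illustrated with a minimal sketch for the sorting task. This is not the authors' code: the function names `sparse_reward` and `is_sorted_permutation`, and the representation of the environment state as a Python list, are assumptions made purely for illustration.

```python
from typing import Callable, List

# Hypothetical type: a specification compares the initial and final states.
Spec = Callable[[List[int], List[int]], bool]


def is_sorted_permutation(initial: List[int], final: List[int]) -> bool:
    """Sorting specification: the final list is the sorted initial list."""
    return sorted(initial) == list(final)


def sparse_reward(spec: Spec, initial: List[int], final: List[int]) -> float:
    """Return 1.0 if the execution result satisfies the specification, else 0.0.

    No execution trace is inspected: only the end state matters.
    """
    return 1.0 if spec(initial, final) else 0.0


# A correct execution earns reward 1, an incorrect one earns 0.
reward_ok = sparse_reward(is_sorted_permutation, [3, 1, 2], [1, 2, 3])
reward_bad = sparse_reward(is_sorted_permutation, [3, 1, 2], [1, 3, 2])
```

Because the reward depends only on the end state, such a check gives no signal about which intermediate steps were useful, which is one reason the abstract pairs it with guided tree search.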
Author Information
Thomas PIERROT (InstaDeep)
PhD candidate at InstaDeep and Paris Sorbonne University.
Guillaume Ligner (InstaDeep)
Scott Reed (Google DeepMind)
Olivier Sigaud (Sorbonne University)
Nicolas Perrin (CNRS, Sorbonne Université)
Alexandre Laterre (InstaDeep)
David Kas (Entrepreneur First)
Tech Entrepreneur / AI Research and Engineering
Karim Beguir (InstaDeep)
Nando de Freitas (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Poster: Learning Compositional Neural Programs with Recursive Tree Search and Planning »
Wed. Dec 11th 01:30 -- 03:30 AM Room East Exhibition Hall B + C #207
More from the Same Authors
-
2021 : One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning »
Clément Bonnet · Paul Caron · Thomas D Barrett · Ian Davies · Alexandre Laterre -
2022 : So ManyFolds, So Little Time: Efficient Protein Structure Prediction with pLMs and MSAs »
Thomas D Barrett · Amelia Villegas-Morcillo · Louis Robinson · Benoit Gaujac · David Admète · Elia Saquand · Karim Beguir · Arthur Flajolet -
2022 : So ManyFolds, So Little Time: Efficient Protein Structure Prediction With pLMs and MSAs »
Thomas D Barrett · Amelia Villegas-Morcillo · Louis Robinson · Benoit Gaujac · Karim Beguir · Arthur Flajolet -
2022 : Peptide-MHC Structure Prediction With Mixed Residue and Atom Graph Neural Network »
Antoine Delaunay · Yunguan Fu · Alberto Bégué · Robert McHardy · Bachir Djermani · Liviu Copoiu · Michael Rooney · Andrey Tovchigrechko · Marcin Skwark · Nicolas Lopez Carranza · Maren Lang · Karim Beguir · Ugur Sahin -
2022 : Multi-step Planning for Automated Hyperparameter Optimization with OptFormer »
Lucio M Dery · Abram Friesen · Nando de Freitas · Marc'Aurelio Ranzato · Yutian Chen -
2022 : Overcoming Referential Ambiguity in language-guided goal-conditioned Reinforcement Learning »
Hugo Caselles-Dupré · Olivier Sigaud · Mohamed CHETOUANI -
2022 : Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function »
Clément Bonnet · Laurence Midgley · Alexandre Laterre -
2022 Poster: EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL »
Thomas Carta · Pierre-Yves Oudeyer · Olivier Sigaud · Sylvain Lamprier -
2022 Poster: Pragmatically Learning from Pedagogical Demonstrations in Multi-Goal Environments »
Hugo Caselles-Dupré · Olivier Sigaud · Mohamed CHETOUANI -
2022 Poster: Towards Learning Universal Hyperparameter Optimizers with Transformers »
Yutian Chen · Xingyou Song · Chansoo Lee · Zi Wang · Richard Zhang · David Dohan · Kazuya Kawakami · Greg Kochanski · Arnaud Doucet · Marc'Aurelio Ranzato · Sagi Perel · Nando de Freitas -
2021 : Retrospective Panel »
Sergey Levine · Nando de Freitas · Emma Brunskill · Finale Doshi-Velez · Nan Jiang · Rishabh Agarwal -
2020 : Designing a Prospective COVID-19 Therapeutic with Reinforcement Learning »
Nicolas Lopez Carranza · Thomas PIERROT · Joe Phillips · Alexandre Laterre · Amine Kerkeni · Karim Beguir -
2020 : Panel »
Emma Brunskill · Nan Jiang · Nando de Freitas · Finale Doshi-Velez · Sergey Levine · John Langford · Lihong Li · George Tucker · Rishabh Agarwal · Aviral Kumar -
2020 : Offline RL »
Nando de Freitas -
2020 : Thomas Pierrot - Learning Compositional Neural Programs for Continuous Control »
Thomas PIERROT -
2020 Poster: A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning »
Arnu Pretorius · Scott Cameron · Elan van Biljon · Thomas Makkink · Shahil Mawjee · Jeremy du Plessis · Jonathan Shock · Alexandre Laterre · Karim Beguir -
2020 Poster: Critic Regularized Regression »
Ziyu Wang · Alexander Novikov · Konrad Zolna · Josh Merel · Jost Tobias Springenberg · Scott Reed · Bobak Shahriari · Noah Siegel · Caglar Gulcehre · Nicolas Heess · Nando de Freitas -
2020 Poster: Modular Meta-Learning with Shrinkage »
Yutian Chen · Abram Friesen · Feryal Behbahani · Arnaud Doucet · David Budden · Matthew Hoffman · Nando de Freitas -
2020 Spotlight: Modular Meta-Learning with Shrinkage »
Yutian Chen · Abram Friesen · Feryal Behbahani · Arnaud Doucet · David Budden · Matthew Hoffman · Nando de Freitas -
2020 Poster: RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning »
Caglar Gulcehre · Ziyu Wang · Alexander Novikov · Thomas Paine · Sergio Gómez · Konrad Zolna · Rishabh Agarwal · Josh Merel · Daniel Mankowitz · Cosmin Paduraru · Gabriel Dulac-Arnold · Jerry Li · Mohammad Norouzi · Matthew Hoffman · Nicolas Heess · Nando de Freitas -
2019 Workshop: Science meets Engineering of Deep Learning »
Levent Sagun · Caglar Gulcehre · Adriana Romero Soriano · Negar Rostamzadeh · Nando de Freitas -
2019 : Welcoming remarks and introduction »
Levent Sagun · Caglar Gulcehre · Adriana Romero Soriano · Negar Rostamzadeh · Nando de Freitas -
2018 : TBA 5 »
Nando de Freitas -
2018 : Invited Talk 5: Nando de Freitas »
Nando de Freitas -
2018 : Poster Session 1 + Coffee »
Tom Van de Wiele · Rui Zhao · J. Fernando Hernandez-Garcia · Fabio Pardo · Xian Yeow Lee · Xiaolin Andy Li · Marcin Andrychowicz · Jie Tang · Suraj Nair · Juhyeon Lee · Cédric Colas · S. M. Ali Eslami · Yen-Chen Wu · Stephen McAleer · Ryan Julian · Yang Xue · Matthia Sabatelli · Pranav Shyam · Alexandros Kalousis · Giovanni Montana · Emanuele Pesce · Felix Leibfried · Zhanpeng He · Chunxiao Liu · Yanjun Li · Yoshihide Sawada · Alexander Pashevich · Tejas Kulkarni · Keiran Paster · Luca Rigazio · Quan Vuong · Hyunggon Park · Minhae Kwon · Rivindu Weerasekera · Shamane Siriwardhanaa · Rui Wang · Ozsel Kilinc · Keith Ross · Yizhou Wang · Simon Schmitt · Thomas Anthony · Evan Cater · Forest Agostinelli · Tegg Sung · Shirou Maruyama · Alexander Shmakov · Devin Schwab · Mohammad Firouzi · Glen Berseth · Denis Osipychev · Jesse Farebrother · Jianlan Luo · William Agnew · Peter Vrancx · Jonathan Heek · Catalin Ionescu · Haiyan Yin · Megumi Miyashita · Nathan Jay · Noga H. Rotman · Sam Leroux · Shaileshh Bojja Venkatakrishnan · Henri Schmidt · Jack Terwilliger · Ishan Durugkar · Jonathan Sauder · David Kas · Arash Tavakoli · Alain-Sam Cohen · Philip Bontrager · Adam Lerer · Thomas Paine · Ahmed Khalifa · Ruben Rodriguez · Avi Singh · Yiming Zhang -
2018 Poster: Playing hard exploration games by watching YouTube »
Yusuf Aytar · Tobias Pfaff · David Budden · Thomas Paine · Ziyu Wang · Nando de Freitas -
2018 Spotlight: Playing hard exploration games by watching YouTube »
Yusuf Aytar · Tobias Pfaff · David Budden · Thomas Paine · Ziyu Wang · Nando de Freitas -
2018 Poster: Neural Arithmetic Logic Units »
Andrew Trask · Felix Hill · Scott Reed · Jack Rae · Chris Dyer · Phil Blunsom -
2017 Poster: Robust Imitation of Diverse Behaviors »
Ziyu Wang · Josh Merel · Scott Reed · Nando de Freitas · Gregory Wayne · Nicolas Heess -
2017 Tutorial: Deep Learning: Practice and Trends »
Nando de Freitas · Scott Reed · Oriol Vinyals -
2016 Workshop: Neural Abstract Machines & Program Induction »
Matko Bošnjak · Nando de Freitas · Tejas Kulkarni · Arvind Neelakantan · Scott E Reed · Sebastian Riedel · Tim Rocktäschel -
2016 : Nando De Freitas »
Nando de Freitas -
2016 : Learning To Optimize »
Nando de Freitas -
2016 Poster: Learning to learn by gradient descent by gradient descent »
Marcin Andrychowicz · Misha Denil · Sergio Gómez · Matthew Hoffman · David Pfau · Tom Schaul · Nando de Freitas -
2015 Workshop: Bayesian Optimization: Scalability and Flexibility »
Bobak Shahriari · Ryan Adams · Nando de Freitas · Amar Shah · Roberto Calandra