Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years). However, humans are often reminded of past memories or mental states which are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm which only back-propagates through a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention.
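The idea of associating the current state with only a few relevant past states can be illustrated with a small sketch. This is not the authors' implementation: the function name `sparse_attend`, the dot-product scoring, and the parameter `k_top` are assumptions chosen for illustration, and the scoring function in the paper is learned rather than fixed. The sketch only shows the forward selection step; in an automatic-differentiation framework, gradients would then reach just the selected past states, since the others do not enter the summary.

```python
import numpy as np

def sparse_attend(h_t, memory, k_top=3):
    """Summarize the k_top past states most relevant to the current state.

    h_t:    (d,) current hidden state
    memory: (T, d) stored past hidden states
    Returns (summary, selected_indices, weights).

    Assumption: relevance is a plain dot product; a learned scoring
    network could be substituted here.
    """
    scores = memory @ h_t                       # relevance of each past state, shape (T,)
    k = min(k_top, scores.shape[0])
    top = np.sort(np.argsort(scores)[-k:])      # indices of the k highest scores, in time order
    shifted = scores[top] - scores[top].max()   # subtract max for numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum()                    # softmax over the selected states only
    summary = weights @ memory[top]             # weighted sum of selected states, shape (d,)
    return summary, top, weights
```

Because only `k_top` past states contribute to `summary`, the cost of propagating credit backwards through these skip connections depends on `k_top` rather than on the full sequence length.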
Author Information
Nan Rosemary Ke (MILA, University of Montreal)
Anirudh Goyal (Université de Montréal)
Olexa Bilaniuk (University of Montreal)
Jonathan Binas (MILA, Montreal)
Michael Mozer (Google Brain / U. Colorado)
Chris Pal (MILA, Polytechnique Montréal, Element AI)
Yoshua Bengio (U. Montreal)
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Poster: Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding »
Tue. Dec 4th through Wed. the 5th, Room 210 #23
More from the Same Authors
-
2021 : Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning »
Nan Rosemary Ke · Aniket Didolkar · Sarthak Mittal · Anirudh Goyal · Guillaume Lajoie · Stefan Bauer · Danilo Jimenez Rezende · Yoshua Bengio · Chris Pal · Michael Mozer -
2021 : Prequential MDL for Causal Structure Learning with Neural Networks »
Jorg Bornschein · Silvia Chiappa · Alan Malek · Nan Rosemary Ke -
2021 : Learning Neural Causal Models with Active Interventions »
Nino Scherrer · Olexa Bilaniuk · Yashas Annadani · Anirudh Goyal · Patrick Schwab · Bernhard Schölkopf · Michael Mozer · Yoshua Bengio · Stefan Bauer · Nan Rosemary Ke -
2022 Poster: Discrete Compositional Representations as an Abstraction for Goal Conditioned Reinforcement Learning »
Riashat Islam · Hongyu Zang · Anirudh Goyal · Alex Lamb · Kenji Kawaguchi · Xin Li · Romain Laroche · Yoshua Bengio · Remi Tachet des Combes -
2022 : Test-time adaptation with slot-centric models »
Mihir Prabhudesai · Sujoy Paul · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Anirudh Goyal · Deepak Pathak · Katerina Fragkiadaki · Gaurav Aggarwal · Thomas Kipf -
2022 : Learning Neural Causal Models »
Nan Rosemary Ke -
2022 Poster: Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning »
Aniket Didolkar · Kshitij Gupta · Anirudh Goyal · Nitesh Bharadwaj Gundavarapu · Alex Lamb · Nan Rosemary Ke · Yoshua Bengio -
2021 : Rosemary Ke - From "What" to "Why": towards causal learning »
Nan Rosemary Ke -
2021 : Nan Rosemary Ke Q&A »
Nan Rosemary Ke -
2021 : Nan Rosemary Ke »
Nan Rosemary Ke -
2021 : Real Robot Challenge II + Q&A »
Stefan Bauer · Joel Akpo · Manuel Wuethrich · Nan Rosemary Ke · Anirudh Goyal · Thomas Steinbrenner · Felix Widmaier · Annika Buchholz · Bernhard Schölkopf · Dieter Büchler · Ludovic Righetti · Franziska Meier -
2021 Poster: Neural Production Systems »
Anirudh Goyal · Aniket Didolkar · Nan Rosemary Ke · Charles Blundell · Philippe Beaudoin · Nicolas Heess · Michael Mozer · Yoshua Bengio -
2021 Poster: Discrete-Valued Neural Communication »
Dianbo Liu · Alex Lamb · Kenji Kawaguchi · Anirudh Goyal · Chen Sun · Michael Mozer · Yoshua Bengio -
2020 Poster: Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples »
Samarth Sinha · Zhengli Zhao · Anirudh Goyal · Colin A Raffel · Augustus Odena -
2020 Poster: Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning »
Julien Roy · Paul Barde · Félix Harvey · Derek Nowrouzezahrai · Chris Pal -
2020 Poster: Untangling tradeoffs between recurrence and self-attention in artificial neural networks »
Giancarlo Kerg · Bhargav Kanuparthi · Anirudh Goyal · Kyle Goyette · Yoshua Bengio · Guillaume Lajoie -
2020 Poster: Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization »
Paul Barde · Julien Roy · Wonseok Jeon · Joelle Pineau · Chris Pal · Derek Nowrouzezahrai -
2020 Spotlight: Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization »
Paul Barde · Julien Roy · Wonseok Jeon · Joelle Pineau · Chris Pal · Derek Nowrouzezahrai -
2019 : Climate Change: A Grand Challenge for ML »
Yoshua Bengio · Carla Gomes · Andrew Ng · Jeff Dean · Lester Mackey -
2019 : Coffee break, posters, and 1-on-1 discussions »
Yangyi Lu · Daniel Chen · Hongseok Namkoong · Marie Charpignon · Maja Rudolph · Amanda Coston · Julius von Kügelgen · Niranjani Prasad · Paramveer Dhillon · Yunzong Xu · Yixin Wang · Alexander Markham · David Rohde · Rahul Singh · Zichen Zhang · Negar Hassanpour · Ankit Sharma · Ciarán Lee · Jean Pouget-Abadie · Jesse Krijthe · Divyat Mahajan · Nan Rosemary Ke · Peter Wirnsberger · Vira Semenova · Dmytro Mykhaylov · Dennis Shen · Kenta Takatsu · Liyang Sun · Jeremy Yang · Alexander Franks · Pak Kan Wong · Tauhid Zaman · Shira Mitchell · min kyoung kang · Qi Yang -
2019 : Poster Session »
Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie -
2019 Poster: How to Initialize your Network? Robust Initialization for WeightNorm & ResNets »
Devansh Arpit · Víctor Campos · Yoshua Bengio -
2019 Poster: Variational Temporal Abstraction »
Taesup Kim · Sungjin Ahn · Yoshua Bengio -
2019 Poster: Neural Multisensory Scene Inference »
Jae Hyun Lim · Pedro O. Pinheiro · Negar Rostamzadeh · Chris Pal · Sungjin Ahn -
2019 Poster: On Adversarial Mixup Resynthesis »
Christopher Beckham · Sina Honari · Alex Lamb · Vikas Verma · Farnoosh Ghadiri · R Devon Hjelm · Yoshua Bengio · Chris Pal -
2018 : Opening remarks »
Yoshua Bengio -
2018 Poster: Learning Deep Disentangled Embeddings With the F-Statistic Loss »
Karl Ridgeway · Michael Mozer -
2018 Poster: Image-to-image translation for cross-domain disentanglement »
Abel Gonzalez-Garcia · Joost van de Weijer · Yoshua Bengio -
2018 Poster: Towards Deep Conversational Recommendations »
Raymond Li · Samira Ebrahimi Kahou · Hannes Schulz · Vincent Michalski · Laurent Charlin · Chris Pal -
2018 Poster: MetaGAN: An Adversarial Approach to Few-Shot Learning »
Ruixiang ZHANG · Tong Che · Zoubin Ghahramani · Yoshua Bengio · Yangqiu Song -
2018 Poster: Unsupervised Depth Estimation, 3D Face Rotation and Replacement »
Joel Ruben Antony Moniz · Christopher Beckham · Simon Rajotte · Sina Honari · Chris Pal -
2018 Poster: Adapted Deep Embeddings: A Synthesis of Methods for k-Shot Inductive Transfer Learning »
Tyler Scott · Karl Ridgeway · Michael Mozer -
2018 Poster: Bayesian Model-Agnostic Meta-Learning »
Jaesik Yoon · Taesup Kim · Ousmane Dia · Sungwoong Kim · Yoshua Bengio · Sungjin Ahn -
2018 Spotlight: Bayesian Model-Agnostic Meta-Learning »
Jaesik Yoon · Taesup Kim · Ousmane Dia · Sungwoong Kim · Yoshua Bengio · Sungjin Ahn -
2018 Spotlight: Adapted Deep Embeddings: A Synthesis of Methods for k-Shot Inductive Transfer Learning »
Tyler Scott · Karl Ridgeway · Michael Mozer -
2018 Poster: Towards Text Generation with Adversarially Learned Neural Outlines »
Sandeep Subramanian · Sai Rajeswar Mudumba · Alessandro Sordoni · Adam Trischler · Aaron Courville · Chris Pal -
2018 Poster: Dendritic cortical microcircuits approximate the backpropagation algorithm »
João Sacramento · Rui Ponte Costa · Yoshua Bengio · Walter Senn -
2018 Oral: Dendritic cortical microcircuits approximate the backpropagation algorithm »
João Sacramento · Rui Ponte Costa · Yoshua Bengio · Walter Senn -
2017 : Yoshua Bengio »
Yoshua Bengio -
2017 : Access consciousness and the construction of actionable representations »
Michael C Mozer -
2017 : More Steps towards Biologically Plausible Backprop »
Yoshua Bengio -
2017 : Workshop overview »
Michael Mozer · Angela Yu · Brenden Lake -
2017 Workshop: Cognitively Informed Artificial Intelligence: Insights From Natural Intelligence »
Michael Mozer · Brenden Lake · Angela Yu -
2017 : A3T: Adversarially Augmented Adversarial Training »
Aristide Baratin · Simon Lacoste-Julien · Yoshua Bengio · Akram Erraqabi -
2017 : Competition III: The Conversational Intelligence Challenge »
Mikhail Burtsev · Ryan Lowe · Iulian Vlad Serban · Yoshua Bengio · Alexander Rudnicky · Alan W Black · Shrimai Prabhumoye · Artem Rodichev · Nikita Smetanin · Denis Fedorenko · CheongAn Lee · EUNMI HONG · Hwaran Lee · Geonmin Kim · Nicolas Gontier · Atsushi Saito · Andrey Gershfeld · Artem Burachenok -
2017 Poster: ExtremeWeather: A large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events »
Evan Racah · Christopher Beckham · Tegan Maharaj · Samira Ebrahimi Kahou · Mr. Prabhat · Chris Pal -
2017 Poster: Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net »
Anirudh Goyal · Nan Rosemary Ke · Surya Ganguli · Yoshua Bengio -
2017 Demonstration: A Deep Reinforcement Learning Chatbot »
Iulian Vlad Serban · Chinnadhurai Sankar · Mathieu Germain · Saizheng Zhang · Zhouhan Lin · Sandeep Subramanian · Taesup Kim · Michael Pieper · Sarath Chandar · Nan Rosemary Ke · Sai Rajeswar Mudumba · Alexandre de Brébisson · Jose Sotelo · Dendi A Suhubdy · Vincent Michalski · Joelle Pineau · Yoshua Bengio -
2017 Poster: GibbsNet: Iterative Adversarial Inference for Deep Graphical Models »
Alex Lamb · R Devon Hjelm · Yaroslav Ganin · Joseph Paul Cohen · Aaron Courville · Yoshua Bengio -
2017 Poster: Plan, Attend, Generate: Planning for Sequence-to-Sequence Models »
Caglar Gulcehre · Francis Dutil · Adam Trischler · Yoshua Bengio -
2017 Poster: Z-Forcing: Training Stochastic Recurrent Networks »
Anirudh Goyal · Alessandro Sordoni · Marc-Alexandre Côté · Nan Rosemary Ke · Yoshua Bengio -
2016 : Deep counter networks for asynchronous event-based processing »
Jonathan Binas -
2016 : Overcoming temptation: Incentive design for intertemporal choice »
Michael Mozer -
2016 : Opening Remarks, Invited Talk: Michael C. Mozer »
Michael Mozer -
2016 Poster: Professor Forcing: A New Algorithm for Training Recurrent Networks »
Alex M Lamb · Anirudh Goyal · Ying Zhang · Saizheng Zhang · Aaron Courville · Yoshua Bengio -
2014 Workshop: Human Propelled Machine Learning »
Richard Baraniuk · Michael Mozer · Divyanshu Vats · Christoph Studer · Andrew E Waters · Andrew Lan -
2014 Poster: Automatic Discovery of Cognitive Skills to Improve the Prediction of Student Learning »
Robert Lindsey · Mohammad Khajah · Michael Mozer -
2013 Poster: Optimizing Instructional Policies »
Robert Lindsey · Michael Mozer · William J Huggins · Harold Pashler -
2013 Oral: Optimizing Instructional Policies »
Robert Lindsey · Michael Mozer · William J Huggins · Harold Pashler -
2012 Workshop: Personalizing education with machine learning »
Michael Mozer · javier r movellan · Robert Lindsey · Jacob Whitehill -
2011 Poster: An Unsupervised Decontamination Procedure For Improving The Reliability Of Human Judgments »
Michael Mozer · Benjamin Link · Harold Pashler -
2010 Spotlight: Improving Human Judgments by Decontaminating Sequential Dependencies »
Michael Mozer · Harold Pashler · Matthew Wilder · Robert Lindsey · Matt Jones · Michael Jones -
2010 Poster: Improving Human Judgments by Decontaminating Sequential Dependencies »
Michael Mozer · Harold Pashler · Matthew Wilder · Robert Lindsey · Matt Jones · Michael Jones -
2009 Poster: Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory »
Michael Mozer · Harold Pashler · Nicholas Cepeda · Robert Lindsey · Edward Vul -
2009 Spotlight: Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory »
Michael Mozer · Harold Pashler · Nicholas Cepeda · Robert Lindsey · Edward Vul -
2009 Poster: Sequential effects reflect parallel learning of multiple environmental regularities »
Matthew Wilder · Matt Jones · Michael Mozer -
2008 Poster: Optimal Response Initiation: Why Recent Experience Matters »
Matt Jones · Michael Mozer · Sachiko Kinoshita -
2008 Spotlight: Optimal Response Initiation: Why Recent Experience Matters »
Matt Jones · Michael Mozer · Sachiko Kinoshita -
2008 Poster: Temporal Dynamics of Cognitive Control »
Jeremy Reynolds · Michael Mozer -
2007 Spotlight: Experience-Guided Search: A Theory of Attentional Control »
Michael Mozer · David Baldwin -
2007 Poster: Experience-Guided Search: A Theory of Attentional Control »
Michael Mozer · David Baldwin -
2006 Poster: Context Effects in Category Learning: An Investigation of Four Probabilistic Models »
Michael Mozer · Michael Jones · Michael Shettel