Timezone: »
We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimate. MAPO is applicable to deterministic environments with discrete actions, such as structured prediction and combinatorial optimization tasks. We express the expected return objective as a weighted sum of two terms: an expectation over the high-reward trajectories inside the memory buffer, and a separate expectation over trajectories outside the buffer. To make an efficient algorithm of MAPO, we propose: (1) memory weight clipping to accelerate and stabilize training; (2) systematic exploration to discover high-reward trajectories; (3) distributed sampling from inside and outside of the memory buffer to scale up training. MAPO improves the sample efficiency and robustness of policy gradient, especially on tasks with sparse rewards. We evaluate MAPO on weakly supervised program synthesis from natural language (semantic parsing). On the WikiTableQuestions benchmark, we improve the state-of-the-art by 2.6%, achieving an accuracy of 46.3%. On the WikiSQL benchmark, MAPO achieves an accuracy of 74.9% with only weak supervision, outperforming several strong baselines with full supervision. Our source code is available at https://goo.gl/TXBp4e
Author Information
Chen Liang (Google Brain)
Mohammad Norouzi (Google Brain)
Jonathan Berant (Tel Aviv University)
Quoc V Le (Google)
Ni Lao (Mosaix.ai)
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Poster: Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing »
Wed. Dec 5th through Thu the 6th Room Room 517 AB
More from the Same Authors
-
2021 : CommonsenseQA 2.0: Exposing the Limits of AI through Gamification »
Alon Talmor · Ori Yoran · Ronan Le Bras · Chandra Bhagavatula · Yoav Goldberg · Yejin Choi · Jonathan Berant -
2021 : Palette: Image-to-Image Diffusion Models »
Chitwan Saharia · William Chan · Huiwen Chang · Chris Lee · Jonathan Ho · Tim Salimans · David Fleet · Mohammad Norouzi -
2021 : Palette: Image-to-Image Diffusion Models »
Chitwan Saharia · William Chan · Huiwen Chang · Chris Lee · Jonathan Ho · Tim Salimans · David Fleet · Mohammad Norouzi -
2021 : NLP with Synthetic Text »
Mohammad Norouzi -
2021 Poster: Why Do Better Loss Functions Lead to Less Transferable Features? »
Simon Kornblith · Ting Chen · Honglak Lee · Mohammad Norouzi -
2021 Poster: CoAtNet: Marrying Convolution and Attention for All Data Sizes »
Zihang Dai · Hanxiao Liu · Quoc V Le · Mingxing Tan -
2021 Poster: Searching for Efficient Transformers for Language Modeling »
David So · Wojciech Mańke · Hanxiao Liu · Zihang Dai · Noam Shazeer · Quoc V Le -
2021 Poster: Pay Attention to MLPs »
Hanxiao Liu · Zihang Dai · David So · Quoc V Le -
2021 : CommonsenseQA 2.0: Exposing the Limits of AI through Gamification »
Alon Talmor · Ori Yoran · Ronan Le Bras · Chandra Bhagavatula · Yoav Goldberg · Yejin Choi · Jonathan Berant -
2020 : Panel Discussion & Closing »
Yejin Choi · Alexei Efros · Chelsea Finn · Kristen Grauman · Quoc V Le · Yann LeCun · Ruslan Salakhutdinov · Eric Xing -
2020 Poster: Evolving Normalization-Activation Layers »
Hanxiao Liu · Andy Brock · Karen Simonyan · Quoc V Le -
2020 Spotlight: Evolving Normalization-Activation Layers »
Hanxiao Liu · Andy Brock · Karen Simonyan · Quoc V Le -
2020 Poster: Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards »
Yijie Guo · Jongwook Choi · Marcin Moczulski · Shengyu Feng · Samy Bengio · Mohammad Norouzi · Honglak Lee -
2020 Poster: PyGlove: Symbolic Programming for Automated Machine Learning »
Daiyi Peng · Xuanyi Dong · Esteban Real · Mingxing Tan · Yifeng Lu · Gabriel Bender · Hanxiao Liu · Adam Kraft · Chen Liang · Quoc V Le -
2020 Poster: RandAugment: Practical Automated Data Augmentation with a Reduced Search Space »
Ekin Dogus Cubuk · Barret Zoph · Jonathon Shlens · Quoc V Le -
2020 Oral: PyGlove: Symbolic Programming for Automated Machine Learning »
Daiyi Peng · Xuanyi Dong · Esteban Real · Mingxing Tan · Yifeng Lu · Gabriel Bender · Hanxiao Liu · Adam Kraft · Chen Liang · Quoc V Le -
2020 Poster: Exemplar VAE: Linking Generative Models, Nearest Neighbor Retrieval, and Data Augmentation »
Sajad Norouzi · David Fleet · Mohammad Norouzi -
2020 Poster: Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge »
Alon Talmor · Oyvind Tafjord · Peter Clark · Yoav Goldberg · Jonathan Berant -
2020 Poster: RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning »
Caglar Gulcehre · Ziyu Wang · Alexander Novikov · Thomas Paine · Sergio Gómez · Konrad Zolna · Rishabh Agarwal · Josh Merel · Daniel Mankowitz · Cosmin Paduraru · Gabriel Dulac-Arnold · Jerry Li · Mohammad Norouzi · Matthew Hoffman · Nicolas Heess · Nando de Freitas -
2020 Poster: Big Self-Supervised Models are Strong Semi-Supervised Learners »
Ting Chen · Simon Kornblith · Kevin Swersky · Mohammad Norouzi · Geoffrey E Hinton -
2020 Spotlight: Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge »
Alon Talmor · Oyvind Tafjord · Peter Clark · Yoav Goldberg · Jonathan Berant -
2020 Poster: Rethinking Pre-training and Self-training »
Barret Zoph · Golnaz Ghiasi · Tsung-Yi Lin · Yin Cui · Hanxiao Liu · Ekin Dogus Cubuk · Quoc V Le -
2020 Oral: Rethinking Pre-training and Self-training »
Barret Zoph · Golnaz Ghiasi · Tsung-Yi Lin · Yin Cui · Hanxiao Liu · Ekin Dogus Cubuk · Quoc V Le -
2020 : Policy Panel »
Roya Pakzad · Dia Kayyali · Marzyeh Ghassemi · Shakir Mohamed · Mohammad Norouzi · Ted Pedersen · Anver Emon · Abubakar Abid · Darren Byler · Samhaa R. El-Beltagy · Nayel Shafei · Mona Diab -
2020 Affinity Workshop: Muslims in ML »
Marzyeh Ghassemi · Mohammad Norouzi · Shakir Mohamed · Aya Salama · Tasmie Sarker -
2020 Poster: Unsupervised Data Augmentation for Consistency Training »
Qizhe Xie · Zihang Dai · Eduard Hovy · Thang Luong · Quoc V Le -
2020 Poster: Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing »
Zihang Dai · Guokun Lai · Yiming Yang · Quoc V Le -
2019 Poster: XLNet: Generalized Autoregressive Pretraining for Language Understanding »
Zhilin Yang · Zihang Dai · Yiming Yang · Jaime Carbonell · Russ Salakhutdinov · Quoc V Le -
2019 Oral: XLNet: Generalized Autoregressive Pretraining for Language Understanding »
Zhilin Yang · Zihang Dai · Yiming Yang · Jaime Carbonell · Russ Salakhutdinov · Quoc V Le -
2019 Poster: CondConv: Conditionally Parameterized Convolutions for Efficient Inference »
Brandon Yang · Gabriel Bender · Quoc V Le · Jiquan Ngiam -
2019 Poster: Mixtape: Breaking the Softmax Bottleneck Efficiently »
Zhilin Yang · Thang Luong · Russ Salakhutdinov · Quoc V Le -
2019 Poster: Saccader: Improving Accuracy of Hard Attention Models for Vision »
Gamaleldin Elsayed · Simon Kornblith · Quoc V Le -
2019 Poster: Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse »
James Lucas · George Tucker · Roger Grosse · Mohammad Norouzi -
2019 Poster: GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism »
Yanping Huang · Youlong Cheng · Ankur Bapna · Orhan Firat · Dehao Chen · Mia Chen · HyoukJoong Lee · Jiquan Ngiam · Quoc V Le · Yonghui Wu · zhifeng Chen -
2019 Poster: High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks »
Ruben Villegas · Arkanath Pathak · Harini Kannan · Dumitru Erhan · Quoc V Le · Honglak Lee -
2018 Poster: Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning »
Supasorn Suwajanakorn · Noah Snavely · Jonathan Tompson · Mohammad Norouzi -
2018 Oral: Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning »
Supasorn Suwajanakorn · Noah Snavely · Jonathan Tompson · Mohammad Norouzi -
2018 Poster: Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction »
Roei Herzig · Moshiko Raboh · Gal Chechik · Jonathan Berant · Amir Globerson -
2018 Poster: DropBlock: A regularization method for convolutional networks »
Golnaz Ghiasi · Tsung-Yi Lin · Quoc V Le -
2017 Symposium: Metalearning »
Risto Miikkulainen · Quoc V Le · Kenneth Stanley · Chrisantha Fernando -
2017 Poster: Bridging the Gap Between Value and Policy Based Reinforcement Learning »
Ofir Nachum · Mohammad Norouzi · Kelvin Xu · Dale Schuurmans -
2017 Poster: Filtering Variational Objectives »
Chris Maddison · John Lawson · George Tucker · Nicolas Heess · Mohammad Norouzi · Andriy Mnih · Arnaud Doucet · Yee Teh -
2016 Poster: An Online Sequence-to-Sequence Model Using Partial Conditioning »
Navdeep Jaitly · Quoc V Le · Oriol Vinyals · Ilya Sutskever · David Sussillo · Samy Bengio -
2016 Poster: Reward Augmented Maximum Likelihood for Neural Structured Prediction »
Mohammad Norouzi · Samy Bengio · zhifeng Chen · Navdeep Jaitly · Mike Schuster · Yonghui Wu · Dale Schuurmans -
2015 Poster: Semi-supervised Sequence Learning »
Andrew Dai · Quoc V Le -
2015 Poster: Efficient Non-greedy Optimization of Decision Trees »
Mohammad Norouzi · Maxwell Collins · Matthew A Johnson · David Fleet · Pushmeet Kohli -
2014 Poster: Sequence to Sequence Learning with Neural Networks »
Ilya Sutskever · Oriol Vinyals · Quoc V Le -
2014 Oral: Sequence to Sequence Learning with Neural Networks »
Ilya Sutskever · Oriol Vinyals · Quoc V Le -
2013 Workshop: Randomized Methods for Machine Learning »
David Lopez-Paz · Quoc V Le · Alexander Smola -
2012 Poster: Hamming Distance Metric Learning »
Mohammad Norouzi · Russ Salakhutdinov · David Fleet