In online reinforcement learning (RL), efficient exploration remains particularly challenging in high-dimensional environments with sparse rewards. In low-dimensional environments, where tabular parameterization is possible, count-based upper confidence bound (UCB) exploration methods achieve minimax near-optimal rates. However, it remains unclear how to efficiently implement UCB in realistic RL tasks that involve non-linear function approximation. To address this, we propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions. We add this term as an adaptive regularizer to the standard RL objective to balance exploration vs. exploitation. We pair the new objective with a provably convergent algorithm, giving rise to a new intrinsic reward that adjusts existing bonuses. The proposed intrinsic reward is easy to implement and combine with other existing RL algorithms to conduct exploration. As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies. When tested on navigation and locomotion tasks from MiniGrid and DeepMind Control Suite benchmarks, our approach significantly improves sample efficiency over state-of-the-art methods.
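For context, the count-based bonus that the abstract uses as its baseline can be sketched in a few lines for the tabular case. This is a minimal illustration of the standard `beta / sqrt(N(s, a))` bonus only; it is not the proposed intrinsic reward, whose exact form is not stated in the abstract. The class name `CountBonus` and the scale `beta` are hypothetical names chosen for this example.

```python
import math
from collections import defaultdict


class CountBonus:
    """Tabular count-based exploration bonus: beta / sqrt(N(s, a)).

    Illustrative baseline only; `beta` is a hypothetical scale parameter.
    """

    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = defaultdict(int)  # visit counts N(s, a)

    def update(self, state, action):
        # Record one more visit to the (state, action) pair.
        self.counts[(state, action)] += 1

    def bonus(self, state, action):
        # Unvisited pairs are treated as having count 1 to avoid division by zero.
        n = max(self.counts.get((state, action), 0), 1)
        return self.beta / math.sqrt(n)

    def shaped_reward(self, state, action, extrinsic_reward):
        # Reward the agent would train on: task reward plus exploration bonus.
        return extrinsic_reward + self.bonus(state, action)
```

The bonus shrinks as a state-action pair is visited more often, so rarely visited pairs keep a larger shaped reward. An adjustment of the kind the abstract describes would modify this bonus term rather than the task reward.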
Author Information
Tianjun Zhang (University of California, Berkeley)
Paria Rashidinejad (University of California, Berkeley)
Jiantao Jiao (University of California, Berkeley)
Yuandong Tian (Facebook AI Research)
Joseph Gonzalez (UC Berkeley)
Stuart Russell (UC Berkeley)
More from the Same Authors
- 2021: TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers
  Lianmin Zheng · Ruochen Liu · Junru Shao · Tianqi Chen · Joseph Gonzalez · Ion Stoica · Ameer Haj-Ali
- 2021 Spotlight: Uncertain Decisions Facilitate Better Preference Learning
  Cassidy Laidlaw · Stuart Russell
- 2021: An Empirical Investigation of Representation Learning for Imitation
  Cynthia Chen · Sam Toyer · Cody Wild · Scott Emmons · Ian Fischer · Kuang-Huei Lee · Neel Alex · Steven Wang · Ping Luo · Stuart Russell · Pieter Abbeel · Rohin Shah
- 2021: Effect of Model Size on Worst-group Generalization
  Alan Pham · Eunice Chan · Vikranth Srivatsa · Dhruba Ghosh · Yaoqing Yang · Yaodong Yu · Ruiqi Zhong · Joseph Gonzalez · Jacob Steinhardt
- 2021: Cross-Domain Imitation Learning via Optimal Transport
  Arnaud Fickinger · Samuel Cohen · Stuart Russell · Brandon Amos
- 2021: C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks
  Tianjun Zhang · Ben Eysenbach · Russ Salakhutdinov · Sergey Levine · Joseph Gonzalez
- 2021: Graph Backup: Data Efficient Backup Exploiting Markovian Data
  Zhengyao Jiang · Tianjun Zhang · Robert Kirk · Tim Rocktäschel · Edward Grefenstette
- 2022: Efficient Planning in a Compact Latent Action Space
  Zhengyao Jiang · Tianjun Zhang · Michael Janner · Yueying (Lisa) Li · Tim Rocktäschel · Edward Grefenstette · Yuandong Tian
- 2022: Adversarial Policies Beat Professional-Level Go AIs
  Tony Wang · Adam Gleave · Nora Belrose · Tom Tseng · Michael Dennis · Yawen Duan · Viktor Pogrebniak · Joseph Miller · Sergey Levine · Stuart Russell
- 2022: Panel RL Implementation
  Xiaolin Ge · Alborz Geramifard · Kence Anderson · Craig Buhr · Robert Nishihara · Yuandong Tian
- 2022 Poster: Contrastive Learning as Goal-Conditioned Reinforcement Learning
  Benjamin Eysenbach · Tianjun Zhang · Sergey Levine · Russ Salakhutdinov
- 2022 Poster: Beyond the Best: Distribution Functional Estimation in Infinite-Armed Bandits
  Yifei Wang · Tavor Baharav · Yanjun Han · Jiantao Jiao · David Tse
- 2022 Poster: Minimax Optimal Online Imitation Learning via Replay Estimation
  Gokul Swamy · Nived Rajaraman · Matt Peng · Sanjiban Choudhury · J. Bagnell · Steven Wu · Jiantao Jiao · Kannan Ramchandran
- 2021: ML-guided iterative refinement for system optimization
  Yuandong Tian
- 2021: Community Infrastructure for Applying Reinforcement Learning to Compiler Optimizations
  Chris Cummins · Bram Wasti · Brandon Cui · Olivier Teytaud · Benoit Steiner · Yuandong Tian · Hugh Leather
- 2021 Poster: Accelerating Quadratic Optimization with Reinforcement Learning
  Jeffrey Ichnowski · Paras Jain · Bartolomeo Stellato · Goran Banjac · Michael Luo · Francesco Borrelli · Joseph Gonzalez · Ion Stoica · Ken Goldberg
- 2021 Poster: Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL
  Charles Packer · Pieter Abbeel · Joseph Gonzalez
- 2021 Poster: RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem
  Eric Liang · Zhanghao Wu · Michael Luo · Sven Mika · Joseph Gonzalez · Ion Stoica
- 2021 Poster: Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages
  Xinyun Chen · Dawn Song · Yuandong Tian
- 2021: BASALT: A MineRL Competition on Solving Human-Judged Task + Q&A
  Rohin Shah · Cody Wild · Steven Wang · Neel Alex · Brandon Houghton · William Guss · Sharada Mohanty · Stephanie Milani · Nicholay Topin · Pieter Abbeel · Stuart Russell · Anca Dragan
- 2021: Machine Learning for Combinatorial Optimization + Q&A
  Maxime Gasse · Simon Bowly · Chris Cameron · Quentin Cappart · Jonas Charfreitag · Laurent Charlin · Shipra Agrawal · Didier Chetelat · Justin Dumouchelle · Ambros Gleixner · Aleksandr Kazachkov · Elias Khalil · Pawel Lichocki · Andrea Lodi · Miles Lubin · Christopher Morris · Dimitri Papageorgiou · Augustin Parjadis · Sebastian Pokutta · Antoine Prouvost · Yuandong Tian · Lara Scavuzzo · Giulia Zarpellon
- 2021 Poster: Scalable Online Planning via Reinforcement Learning Fine-Tuning
  Arnaud Fickinger · Hengyuan Hu · Brandon Amos · Stuart Russell · Noam Brown
- 2021 Poster: Representing Long-Range Context for Graph Neural Networks with Global Attention
  Zhanghao Wu · Paras Jain · Matthew Wright · Azalia Mirhoseini · Joseph Gonzalez · Ion Stoica
- 2021 Poster: Uncertain Decisions Facilitate Better Preference Learning
  Cassidy Laidlaw · Stuart Russell
- 2021 Poster: Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
  Paria Rashidinejad · Banghua Zhu · Cong Ma · Jiantao Jiao · Stuart Russell
- 2021 Poster: On the Value of Interaction and Function Approximation in Imitation Learning
  Nived Rajaraman · Yanjun Han · Lin Yang · Jingbo Liu · Jiantao Jiao · Kannan Ramchandran
- 2021 Poster: NovelD: A Simple yet Effective Exploration Criterion
  Tianjun Zhang · Huazhe Xu · Xiaolong Wang · Yi Wu · Kurt Keutzer · Joseph Gonzalez · Yuandong Tian
- 2021 Poster: Learning Space Partitions for Path Planning
  Kevin Yang · Tianjun Zhang · Chris Cummins · Brandon Cui · Benoit Steiner · Linnan Wang · Joseph Gonzalez · Dan Klein · Yuandong Tian
- 2021 Poster: Taxonomizing local versus global structure in neural network loss landscapes
  Yaoqing Yang · Liam Hodgkinson · Ryan Theisen · Joe Zou · Joseph Gonzalez · Kannan Ramchandran · Michael Mahoney
- 2020: QA: Yuandong Tian
  Yuandong Tian
- 2020: Contributed Talk: Yuandong Tian
  Yuandong Tian
- 2020: Invited Talk (Yuandong Tian)
  Yuandong Tian
- 2020 Workshop: Navigating the Broader Impacts of AI Research
  Carolyn Ashurst · Rosie Campbell · Deborah Raji · Solon Barocas · Stuart Russell
- 2020 Poster: Boundary thickness and robustness in learning models
  Yaoqing Yang · Rajiv Khanna · Yaodong Yu · Amir Gholami · Kurt Keutzer · Joseph Gonzalez · Kannan Ramchandran · Michael Mahoney
- 2020 Poster: A Statistical Framework for Low-bitwidth Training of Deep Neural Networks
  Jianfei Chen · Yu Gai · Zhewei Yao · Michael Mahoney · Joseph Gonzalez
- 2020 Poster: Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search
  Linnan Wang · Rodrigo Fonseca · Yuandong Tian
- 2020 Poster: Toward the Fundamental Limits of Imitation Learning
  Nived Rajaraman · Lin Yang · Jiantao Jiao · Kannan Ramchandran
- 2020 Poster: The MAGICAL Benchmark for Robust Imitation
  Sam Toyer · Rohin Shah · Andrew Critch · Stuart Russell
- 2020 Poster: SLIP: Learning to Predict in Unknown Dynamical Systems with Long-Term Memory
  Paria Rashidinejad · Jiantao Jiao · Stuart Russell
- 2020 Poster: Joint Policy Search for Multi-agent Collaboration with Imperfect Information
  Yuandong Tian · Qucheng Gong · Yu Jiang
- 2020 Oral: SLIP: Learning to Predict in Unknown Dynamical Systems with Long-Term Memory
  Paria Rashidinejad · Jiantao Jiao · Stuart Russell
- 2020 Poster: Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
  Michael Dennis · Natasha Jaques · Eugene Vinitsky · Alexandre Bayen · Stuart Russell · Andrew Critch · Sergey Levine
- 2020 Oral: Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
  Michael Dennis · Natasha Jaques · Eugene Vinitsky · Alexandre Bayen · Stuart Russell · Andrew Critch · Sergey Levine
- 2019 Workshop: Information Theory and Machine Learning
  Shengjia Zhao · Jiaming Song · Yanjun Han · Kristy Choi · Pratyusha Kalluri · Ben Poole · Alex Dimakis · Jiantao Jiao · Tsachy Weissman · Stefano Ermon
- 2019 Workshop: MLSys: Workshop on Systems for ML
  Aparna Lakshmiratan · Siddhartha Sen · Joseph Gonzalez · Dan Crankshaw · Sarah Bird
- 2019 Poster: Coda: An End-to-End Neural Program Decompiler
  Cheng Fu · Huili Chen · Haolan Liu · Xinyun Chen · Yuandong Tian · Farinaz Koushanfar · Jishen Zhao
- 2019 Poster: ANODEV2: A Coupled Neural ODE Framework
  Tianjun Zhang · Zhewei Yao · Amir Gholami · Joseph Gonzalez · Kurt Keutzer · Michael Mahoney · George Biros
- 2019 Poster: Hierarchical Decision Making by Generating and Following Natural Language Instructions
  Hengyuan Hu · Denis Yarats · Qucheng Gong · Yuandong Tian · Mike Lewis
- 2019 Poster: One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers
  Ari Morcos · Haonan Yu · Michela Paganini · Yuandong Tian
- 2019 Poster: Learning to Perform Local Rewriting for Combinatorial Optimization
  Xinyun Chen · Yuandong Tian
- 2018 Poster: Meta-Learning MCMC Proposals
  Tongzhou Wang · Yi Wu · Dave Moore · Stuart Russell
- 2018 Poster: Learning Plannable Representations with Causal InfoGAN
  Thanard Kurutach · Aviv Tamar · Ge Yang · Stuart Russell · Pieter Abbeel
- 2017 Poster: ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
  Yuandong Tian · Qucheng Gong · Wendy Shang · Yuxin Wu · Larry Zitnick
- 2017 Oral: ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
  Yuandong Tian · Qucheng Gong · Wendy Shang · Yuxin Wu · Larry Zitnick