We introduce a constrained optimization method for policy gradient reinforcement learning that uses two trust regions to regulate each policy update. As in prior work, the first trust region is defined by proximity to a single old policy. We propose forming a second trust region by constructing a virtual policy that represents a wide range of past policies. We then constrain the new policy to stay closer to the virtual policy, which is beneficial when the old policy performs poorly. We propose a mechanism that automatically builds the virtual policy from a memory buffer of past policies, providing a new capability for dynamically selecting appropriate trust regions during optimization. Our proposed method, dubbed Memory-Constrained Policy Optimization (MCPO), is evaluated in diverse environments, including robotic locomotion control, navigation with sparse rewards, and Atari games, and consistently demonstrates competitive performance against recent on-policy constrained policy gradient methods.
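The two-trust-region idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the virtual policy is built or how the constraints are enforced. The sketch assumes discrete actions, a virtual policy formed as a weighted mixture of action probabilities stored in a memory buffer, and KL penalties in place of hard constraints. The names `virtual_policy`, `two_region_objective`, `beta_old`, and `beta_virtual` are hypothetical.

```python
import numpy as np


def kl(p, q, eps=1e-8):
    """KL(p || q) for categorical distributions along the last axis."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)


def virtual_policy(memory, weights):
    """Assumed construction: weighted mixture of past policies' action probabilities."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the mixture is a valid distribution
    return np.tensordot(w, np.stack(memory), axes=1)


def two_region_objective(new, old, virtual, actions, advantages,
                         beta_old=1.0, beta_virtual=1.0):
    """Importance-weighted surrogate minus KL penalties to both reference policies."""
    idx = np.arange(len(actions))
    ratio = new[idx, actions] / old[idx, actions]
    surrogate = np.mean(ratio * advantages)
    penalty = (beta_old * np.mean(kl(old, new))              # first trust region
               + beta_virtual * np.mean(kl(virtual, new)))   # second trust region
    return surrogate - penalty


# Toy data: two states, two actions, a memory buffer holding two past policies.
old = np.array([[0.5, 0.5], [0.2, 0.8]])
older = np.array([[0.9, 0.1], [0.4, 0.6]])
virt = virtual_policy([old, older], weights=[1.0, 1.0])
actions = np.array([0, 1])
advantages = np.array([1.0, 3.0])
objective = two_region_objective(old, old, virt, actions, advantages)
```

In this toy setup the new policy equals the old one, so the first penalty vanishes and only the distance to the virtual policy lowers the objective. Raising `beta_virtual` relative to `beta_old` pulls updates toward the memory-based policy, which is the situation the abstract describes as useful when the old policy performs poorly.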
Author Information
Thai Hung Le (Deakin University)
Thommen Karimpanal George (Deakin University)
Majid Abdolshah (Amazon)
Dung Nguyen (Deakin University)
Kien Do (Deakin University)
Sunil Gupta (Deakin University)
Svetha Venkatesh (Deakin University)
More from the Same Authors
-
2021 : Offline neural contextual bandits: Pessimism, Optimization and Generalization »
Thanh Nguyen-Tang · Sunil Gupta · A. Tuan Nguyen · Svetha Venkatesh -
2022 Poster: Functional Indirection Neural Estimator for Better Out-of-distribution Generalization »
Kha Pham · Thai Hung Le · Man Ngo · Truyen Tran -
2022 : Improving Domain Generalization with Interpolation Robustness »
Ragja Palakkadavath · Thanh Nguyen-Tang · Sunil Gupta · Svetha Venkatesh -
2022 Spotlight: Lightning Talks 5A-2 »
Qiang LI · Zhiwei Xu · Jia-Qi Yang · Thai Hung Le · Haoxuan Qu · Yang Li · Artyom Sorokin · Peirong Zhang · Mira Finkelstein · Nitsan levy · Chung-Yiu Yau · dapeng li · Thommen Karimpanal George · De-Chuan Zhan · Nazar Buzun · Jiajia Jiang · Li Xu · Yichuan Mo · Yujun Cai · Yuliang Liu · Leonid Pugachev · Bin Zhang · Lucy Liu · Hoi-To Wai · Liangliang Shi · Majid Abdolshah · Yoav Kolumbus · Lin Geng Foo · Junchi Yan · Mikhail Burtsev · Lianwen Jin · Yuan Zhan · Dung Nguyen · David Parkes · Yunpeng Baiia · Jun Liu · Kien Do · Guoliang Fan · Jeffrey S Rosenschein · Sunil Gupta · Sarah Keren · Svetha Venkatesh -
2022 Spotlight: Learning to Constrain Policy Optimization with Virtual Trust Region »
Thai Hung Le · Thommen Karimpanal George · Majid Abdolshah · Dung Nguyen · Kien Do · Sunil Gupta · Svetha Venkatesh -
2022 Poster: Human-AI Collaborative Bayesian Optimisation »
Arun Kumar A V · Santu Rana · Alistair Shilton · Svetha Venkatesh -
2022 Poster: Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation »
Kien Do · Thai Hung Le · Dung Nguyen · Dang Nguyen · HARIPRIYA HARIKUMAR · Truyen Tran · Santu Rana · Svetha Venkatesh -
2022 Poster: Expected Improvement for Contextual Bandits »
Hung The Tran · Sunil Gupta · Santu Rana · Tuan Truong · Long Tran-Thanh · Svetha Venkatesh -
2021 Poster: Model-Based Episodic Memory Induces Dynamic Hybrid Controls »
Hung Le · Thommen Karimpanal George · Majid Abdolshah · Truyen Tran · Svetha Venkatesh -
2021 Poster: Kernel Functional Optimisation »
Arun Kumar Anjanapura Venkatesh · Alistair Shilton · Santu Rana · Sunil Gupta · Svetha Venkatesh -
2020 Poster: Sub-linear Regret Bounds for Bayesian Optimisation in Unknown Search Spaces »
Hung The Tran · Sunil Gupta · Santu Rana · Huong Ha · Svetha Venkatesh -
2019 Poster: Bayesian Optimization with Unknown Search Space »
Huong Ha · Santu Rana · Sunil Gupta · Thanh Nguyen-Tang · Hung The Tran · Svetha Venkatesh -
2019 Poster: Multi-objective Bayesian optimisation with preferences over objectives »
Majid Abdolshah · Alistair Shilton · Santu Rana · Sunil Gupta · Svetha Venkatesh -
2018 Poster: Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation »
Shivapratap Gopakumar · Sunil Gupta · Santu Rana · Vu Nguyen · Svetha Venkatesh -
2018 Poster: Variational Memory Encoder-Decoder »
Hung Le · Truyen Tran · Thin Nguyen · Svetha Venkatesh -
2017 Poster: Process-constrained batch Bayesian optimisation »
Pratibha Vellanki · Santu Rana · Sunil Gupta · David Rubin · Alessandra Sutti · Thomas Dorin · Murray Height · Paul Sanders · Svetha Venkatesh -
2017 Spotlight: Process-constrained batch Bayesian optimisation »
Pratibha Vellanki · Santu Rana · Sunil Gupta · David Rubin · Alessandra Sutti · Thomas Dorin · Murray Height · Paul Sanders · Svetha Venkatesh