Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning
Souradip Chakraborty · Amrit Bedi · Alec Koppel · Furong Huang · Pratap Tokekar · Dinesh Manocha
Event URL: https://openreview.net/forum?id=GdGpM3VWWXD
Model-based reinforcement learning (MBRL) exhibits favorable performance in practice, but its theoretical guarantees are mostly restricted to settings where the transition model is Gaussian or Lipschitz, and they require a posterior estimate whose representational complexity grows without bound over time. In this work, we develop a novel MBRL method that (i) relaxes the assumption on the target transition model, requiring only that it belong to a generic family of mixture models; (ii) is applicable to large-scale training by incorporating a compression step such that the posterior estimate consists of a \emph{Bayesian coreset} of only statistically significant past state-action pairs; and (iii) exhibits a Bayesian regret of $\mathcal{O}(dH^{1+({\alpha}/{2})}T^{1-({\alpha}/{2})})$ with a coreset size of $\Omega(\sqrt{T^{1+\alpha}})$, where $d$ is the aggregate dimension of the state-action space, $H$ is the episode length, $T$ is the total number of time steps experienced, and $\alpha\in (0,1]$ is a tuning parameter newly introduced into the analysis of MBRL in this work. To achieve these results, we adopt an approach based upon Stein's method, which allows the distributional distance to be evaluated in closed form as the kernelized Stein discrepancy (KSD). Experimentally, we observe that this approach is competitive with several state-of-the-art RL methodologies, and can achieve up to a $50\%$ reduction in wall-clock time in some continuous control environments.
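The closed-form distance referred to in the abstract is the standard kernelized Stein discrepancy. As a rough illustration only (not the paper's algorithm), the sketch below computes the empirical KSD of a sample under an RBF base kernel and then greedily thins the sample to a fixed-size coreset in the spirit of Stein thinning; the bandwidth h, the coreset size m, and the Gaussian usage example at the end are placeholder assumptions rather than choices taken from the paper.

import numpy as np

def rbf_stein_kernel(X, scores, h=1.0):
    # Pairwise Stein kernel matrix u_p(x_i, x_j) for an RBF base kernel.
    # X      : (n, d) sample locations
    # scores : (n, d) score function grad_x log p(x) evaluated at X
    # h      : RBF bandwidth (placeholder value)
    n, d = X.shape
    diff = X[:, None, :] - X[None, :, :]                        # pairwise x_i - x_j, shape (n, n, d)
    sq = np.sum(diff ** 2, axis=-1)                             # squared distances
    K = np.exp(-sq / (2 * h ** 2))                              # base RBF kernel
    ss = scores @ scores.T                                      # s(x_i)^T s(x_j)
    s_diff_i = np.einsum('id,ijd->ij', scores, diff) / h ** 2   # s(x_i)^T (x_i - x_j) / h^2
    s_diff_j = np.einsum('jd,ijd->ij', scores, diff) / h ** 2   # s(x_j)^T (x_i - x_j) / h^2
    trace_term = d / h ** 2 - sq / h ** 4                       # tr(grad_x grad_y k) / k
    return K * (ss + s_diff_i - s_diff_j + trace_term)

def ksd(X, scores, h=1.0):
    # Empirical (V-statistic) kernelized Stein discrepancy of the sample X.
    return np.sqrt(np.maximum(rbf_stein_kernel(X, scores, h).mean(), 0.0))

def greedy_coreset(X, scores, m, h=1.0):
    # Greedily select m points so that the KSD of the retained set stays small.
    U = rbf_stein_kernel(X, scores, h)
    selected = np.zeros(len(X), dtype=bool)
    chosen, running = [], np.zeros(len(X))                      # running[i] = sum over chosen j of U[i, j]
    for _ in range(m):
        # Adding point i increases the coreset's KSD objective by U[i, i] + 2 * running[i].
        obj = np.where(selected, np.inf, np.diag(U) + 2.0 * running)
        i = int(np.argmin(obj))
        chosen.append(i)
        selected[i] = True
        running += U[:, i]
    return np.array(chosen)

# Usage (hypothetical target): thin 500 samples of a standard Gaussian, whose score is -x.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
idx = greedy_coreset(X, scores=-X, m=50)
print(ksd(X, -X), ksd(X[idx], -X[idx]))

In the paper's setting the sample points would be past state-action pairs and the score would come from the posterior over transition models; the greedy rule above is one generic way to realize a compression step that keeps only statistically significant points.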
Author Information
Souradip Chakraborty (University of Maryland, College Park)
Amrit Bedi (University of Maryland, College Park)
Alec Koppel (U.S. Army Research Laboratory)
Furong Huang (University of Maryland)
Pratap Tokekar (University of Maryland, College Park)
Dinesh Manocha (University of Maryland, College Park)
More from the Same Authors
- 2021 : Who Is the Strongest Enemy? Towards Optimal and Efficient Evasion Attacks in Deep RL
  Yanchao Sun · Ruijie Zheng · Yongyuan Liang · Furong Huang
- 2021 : Efficiently Improving the Robustness of RL Agents against Strongest Adversaries
  Yongyuan Liang · Yanchao Sun · Ruijie Zheng · Furong Huang
- 2021 : Uncertainty-aware Labelled Augmentations for High Dimensional Latent Space Bayesian Optimization
  Ekansh Verma · Souradip Chakraborty
- 2022 : SMART: Self-supervised Multi-task pretrAining with contRol Transformers
  Yanchao Sun · shuang ma · Ratnesh Madaan · Rogerio Bonatti · Furong Huang · Ashish Kapoor
- 2022 : GFairHint: Improving Individual Fairness for Graph Neural Networks via Fairness Hint
  Paiheng Xu · Yuhang Zhou · Bang An · Wei Ai · Furong Huang
- 2022 : Controllable Attack and Improved Adversarial Training in Multi-Agent Reinforcement Learning
  Xiangyu Liu · Souradip Chakraborty · Furong Huang
- 2022 : Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity
  Mucong Ding · Tahseen Rabbani · Bang An · Evan Wang · Furong Huang
- 2022 : Faster Hyperparameter Search on Graphs via Calibrated Dataset Condensation
  Mucong Ding · Xiaoyu Liu · Tahseen Rabbani · Furong Huang
- 2022 : DP-InstaHide: Data Augmentations Provably Enhance Guarantees Against Dataset Manipulations
  Eitan Borgnia · Jonas Geiping · Valeriia Cherepanova · Liam Fowl · Arjun Gupta · Amin Ghiasi · Furong Huang · Micah Goldblum · Tom Goldstein
- 2022 : Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function
  Ruijie Zheng · Xiyao Wang · Huazhe Xu · Furong Huang
- 2022 : Contributed Talk: Controllable Attack and Improved Adversarial Training in Multi-Agent Reinforcement Learning
  Xiangyu Liu · Souradip Chakraborty · Furong Huang
- 2022 Spotlight: Adversarial Auto-Augment with Label Preservation: A Representation Learning Principle Guided Approach
  Kaiwen Yang · Yanchao Sun · Jiahao Su · Fengxiang He · Xinmei Tian · Furong Huang · Tianyi Zhou · Dacheng Tao
- 2022 : SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication
  Marco Bornstein · Tahseen Rabbani · Evan Wang · Amrit Bedi · Furong Huang
- 2022 Poster: Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability
  Roman Levin · Manli Shu · Eitan Borgnia · Furong Huang · Micah Goldblum · Tom Goldstein
- 2022 Poster: Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity
  Mucong Ding · Tahseen Rabbani · Bang An · Evan Wang · Furong Huang
- 2022 Poster: Efficient Adversarial Training without Attacking: Worst-Case-Aware Robust Reinforcement Learning
  Yongyuan Liang · Yanchao Sun · Ruijie Zheng · Furong Huang
- 2022 Poster: End-to-end Algorithm Synthesis with Recurrent Networks: Extrapolation without Overthinking
  Arpit Bansal · Avi Schwarzschild · Eitan Borgnia · Zeyad Emam · Furong Huang · Micah Goldblum · Tom Goldstein
- 2022 Poster: Adversarial Auto-Augment with Label Preservation: A Representation Learning Principle Guided Approach
  Kaiwen Yang · Yanchao Sun · Jiahao Su · Fengxiang He · Xinmei Tian · Furong Huang · Tianyi Zhou · Dacheng Tao
- 2022 Poster: Transferring Fairness under Distribution Shifts via Fair Consistency Regularization
  Bang An · Zora Che · Mucong Ding · Furong Huang
- 2020 Poster: Variational Policy Gradient Method for Reinforcement Learning with General Utilities
  Junyu Zhang · Alec Koppel · Amrit Singh Bedi · Csaba Szepesvari · Mengdi Wang
- 2020 Spotlight: Variational Policy Gradient Method for Reinforcement Learning with General Utilities
  Junyu Zhang · Alec Koppel · Amrit Singh Bedi · Csaba Szepesvari · Mengdi Wang
- 2019 : Poster and Coffee Break 1
  Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova