One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and corresponding optimal policy. Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches optimize a policy for either the mean or MAP reward function. While completely ignoring risk can lead to overly aggressive and unsafe policies, optimizing in a fully adversarial sense is also problematic as it can lead to overly conservative policies that perform poorly in practice. To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL). BROIL leverages Bayesian reward function inference and a user-specified risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk (CVaR). Our empirical results show that BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms.
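To make the trade-off concrete: given a posterior over reward functions, BROIL scores a policy by a λ-weighted combination of its expected return and its CVaR across posterior reward samples. The sketch below is only a minimal illustration of that objective, not the paper's actual linear-programming formulation; the function name `broil_objective` and the Gaussian "posterior return" samples are invented here purely for illustration.

```python
import numpy as np

def broil_objective(returns, lam=0.5, alpha=0.95):
    """Score one candidate policy with a BROIL-style objective.

    returns : the policy's expected return under each posterior reward sample
              (e.g., w_i . mu_pi for reward weights w_i and feature expectations mu_pi).
    lam     : risk-tolerance weight; lam=1 is risk-neutral expected return,
              lam=0 optimizes only the lower-tail risk (CVaR).
    alpha   : confidence level of the conditional value at risk.
    """
    returns = np.asarray(returns, dtype=float)
    expected = returns.mean()

    # CVaR_alpha for rewards: average of the worst (1 - alpha) fraction of posterior returns.
    var = np.quantile(returns, 1.0 - alpha)      # lower-tail value at risk
    tail = returns[returns <= var]
    cvar = tail.mean() if tail.size else var

    return lam * expected + (1.0 - lam) * cvar

# Hypothetical example: compare two policies under 1,000 posterior reward samples.
rng = np.random.default_rng(0)
risky = rng.normal(1.0, 2.0, size=1000)      # higher mean, heavy downside
cautious = rng.normal(0.5, 0.2, size=1000)   # lower mean, little downside
print(broil_objective(risky, lam=0.9), broil_objective(cautious, lam=0.9))
print(broil_objective(risky, lam=0.1), broil_objective(cautious, lam=0.1))
```

With λ near 1 the high-mean but high-variance policy typically scores higher, while with λ near 0 the low-variance policy wins, mirroring the interpolation between return-maximizing and risk-minimizing behavior described in the abstract. The paper itself optimizes this objective over policies (rather than just evaluating it) via a linear program.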
Author Information
Daniel S. Brown (UC Berkeley)
Scott Niekum (UT Austin)
Marek Petrik (University of New Hampshire)
More from the Same Authors
- 2021 : Unbiased Efficient Feature Counts for Inverse RL »
  Gerard Donahue · Brendan Crowe · Marek Petrik · Daniel Brown
- 2021 : Behavior Policy Search for Risk Estimators in Reinforcement Learning »
  Elita Lobo · Marek Petrik · Dharmashankar Subramanian
- 2022 : Language-guided Task Adaptation for Imitation Learning »
  Prasoon Goyal · Raymond Mooney · Scott Niekum
- 2022 : A Ranking Game for Imitation Learning »
  Harshit Sushil Sikchi · Akanksha Saran · Wonjoon Goo · Scott Niekum
- 2023 Poster: Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor »
  Julien Grand-Clément · Marek Petrik
- 2023 Poster: Percentile Criterion Optimization in Offline Reinforcement Learning »
  Cyrus Cousins · Elita Lobo · Marek Petrik · Yair Zick
- 2023 Poster: On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes »
  Jia Lin Hau · Erick Delage · Mohammad Ghavamzadeh · Marek Petrik
- 2022 Workshop: All Things Attention: Bridging Different Perspectives on Attention »
  Abhijat Biswas · Akanksha Saran · Khimya Khetarpal · Reuben Aronson · Ruohan Zhang · Grace Lindsay · Scott Niekum
- 2022 Poster: Robust $\phi$-Divergence MDPs »
  Chin Pang Ho · Marek Petrik · Wolfram Wiesemann
- 2021 : Safe RL Panel Discussion »
  Animesh Garg · Marek Petrik · Shie Mannor · Claire Tomlin · Ugo Rosolia · Dylan Hadfield-Menell
- 2021 Workshop: Safe and Robust Control of Uncertain Systems »
  Ashwin Balakrishna · Brijen Thananjeyan · Daniel Brown · Marek Petrik · Melanie Zeilinger · Sylvia Herbert
- 2021 Poster: Adversarial Intrinsic Motivation for Reinforcement Learning »
  Ishan Durugkar · Mauricio Tec · Scott Niekum · Peter Stone
- 2021 Poster: SOPE: Spectrum of Off-Policy Estimators »
  Christina Yuan · Yash Chandak · Stephen Giguere · Philip Thomas · Scott Niekum
- 2021 Poster: Universal Off-Policy Evaluation »
  Yash Chandak · Scott Niekum · Bruno da Silva · Erik Learned-Miller · Emma Brunskill · Philip Thomas
- 2021 Poster: Fast Algorithms for $L_\infty$-constrained S-rectangular Robust MDPs »
  Bahram Behzadian · Marek Petrik · Chin Pang Ho
- 2019 : Scott Niekum: Scaling Probabilistically Safe Learning to Robotics »
  Scott Niekum
- 2019 : Poster Session »
  Ahana Ghosh · Javad Shafiee · Akhilan Boopathy · Alex Tamkin · Theodoros Vasiloudis · Vedant Nanda · Ali Baheri · Paul Fieguth · Andrew Bennett · Guanya Shi · Hao Liu · Arushi Jain · Jacob Tyo · Benjie Wang · Boxiao Chen · Carroll Wainwright · Chandramouli Shama Sastry · Chao Tang · Daniel S. Brown · David Inouye · David Venuto · Dhruv Ramani · Dimitrios Diochnos · Divyam Madaan · Dmitrii Krashenikov · Joel Oren · Doyup Lee · Eleanor Quint · elmira amirloo · Matteo Pirotta · Gavin Hartnett · Geoffroy Dubourg-Felonneau · Gokul Swamy · Pin-Yu Chen · Ilija Bogunovic · Jason Carter · Javier Garcia-Barcos · Jeet Mohapatra · Jesse Zhang · Jian Qian · John Martin · Oliver Richter · Federico Zaiter · Tsui-Wei Weng · Karthik Abinav Sankararaman · Kyriakos Polymenakos · Lan Hoang · mahdieh abbasi · Marco Gallieri · Mathieu Seurin · Matteo Papini · Matteo Turchetta · Matthew Sotoudeh · Mehrdad Hosseinzadeh · Nathan Fulton · Masatoshi Uehara · Niranjani Prasad · Oana-Maria Camburu · Patrik Kolaric · Philipp Renz · Prateek Jaiswal · Reazul Hasan Russel · Riashat Islam · Rishabh Agarwal · Alexander Aldrick · Sachin Vernekar · Sahin Lale · Sai Kiran Narayanaswami · Samuel Daulton · Sanjam Garg · Sebastian East · Shun Zhang · Soheil Dsidbari · Justin Goodwin · Victoria Krakovna · Wenhao Luo · Wesley Chung · Yuanyuan Shi · Yuh-Shyang Wang · Hongwei Jin · Ziping Xu
- 2019 Workshop: Safety and Robustness in Decision-making »
  Mohammad Ghavamzadeh · Shie Mannor · Yisong Yue · Marek Petrik · Yinlam Chow
- 2019 Poster: Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs »
  Marek Petrik · Reazul Hasan Russel
- 2018 Poster: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes »
  Andrea Tirinzoni · Marek Petrik · Xiangli Chen · Brian Ziebart
- 2018 Spotlight: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes »
  Andrea Tirinzoni · Marek Petrik · Xiangli Chen · Brian Ziebart
- 2016 Poster: Safe Policy Improvement by Minimizing Robust Baseline Regret »
  Mohammad Ghavamzadeh · Marek Petrik · Yinlam Chow
- 2015 Poster: Policy Evaluation Using the Ω-Return »
  Philip Thomas · Scott Niekum · Georgios Theocharous · George Konidaris
- 2014 Workshop: From Bad Models to Good Policies (Sequential Decision Making under Uncertainty) »
  Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor · Jeremie Mary · Laurent Orseau · Thomas Dietterich · Ronald Ortner · Peter Grünwald · Joelle Pineau · Raphael Fonteneau · Georgios Theocharous · Esteban D Arcaute · Christos Dimitrakakis · Nan Jiang · Doina Precup · Pierre-Luc Bacon · Marek Petrik · Aviv Tamar
- 2014 Poster: RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning »
  Marek Petrik · Dharmashankar Subramanian
- 2014 Spotlight: RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning »
  Marek Petrik · Dharmashankar Subramanian