Inverse reinforcement learning (IRL) is a powerful framework for learning the reward function of an RL agent by observing its behavior. The earliest IRL algorithms inferred point estimates of the reward function, but these can be misleading when several reward functions describe an agent's behavior equally well. In contrast, a Bayesian approach to IRL models a distribution over the reward functions that explain the observations, alleviating the shortcomings of a single point estimate. However, most Bayesian IRL algorithms estimate the likelihood using a Q-value function that approximates the long-term expected reward for a given state-action pair. This is computationally demanding because it requires solving a Markov decision process (MDP) at every iteration of Markov chain Monte Carlo (MCMC) sampling. In response, we introduce kernel density Bayesian inverse reinforcement learning (KD-BIRL), a method that (1) uses kernel density estimation for the likelihood, leading to theoretical guarantees on the resulting posterior distribution, and (2) decouples the number of Q-learning runs from the number of MCMC iterations.
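To make the decoupling concrete, here is a minimal sketch of the idea under stated assumptions: rewards are parameterized by a low-dimensional vector, and a training set of (state, action) samples generated offline by agents solved under known rewards stands in for the Q-learning rollouts. All names and data here (gaussian_kernel, kde_log_likelihood, the synthetic arrays) are hypothetical illustrations, not the authors' implementation; the point is that the MCMC loop touches only the kernel density estimate, never a Q-learning solver.

```python
# Sketch of a KDE likelihood inside Metropolis-Hastings for Bayesian IRL.
# Assumption: rewards live in a low-dimensional parameter space, and
# (state, action) training samples were generated offline under known rewards.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(u, h):
    """Unnormalized isotropic Gaussian kernel with bandwidth h.
    The normalizing constant is omitted; it cancels in the MH ratio."""
    return np.exp(-0.5 * np.sum((u / h) ** 2, axis=-1))

def kde_log_likelihood(demo, train_sa, train_r, r, h_sa=0.5, h_r=0.5):
    """Kernel density estimate of log p(demonstrations | reward r).

    demo:     (n, d_sa) demonstrated (state, action) feature pairs
    train_sa: (m, d_sa) (state, action) pairs generated offline by
              agents trained (once) under known rewards train_r
    train_r:  (m, d_r) reward parameters that produced each training pair
    """
    # Weight each training sample by how close its reward is to r ...
    w = gaussian_kernel(train_r - r, h_r)              # shape (m,)
    total = 0.0
    for sa in demo:
        # ... and form a Nadaraya-Watson-style conditional density of
        # (s, a) given r, averaged over the training samples.
        k = gaussian_kernel(train_sa - sa, h_sa)       # shape (m,)
        total += np.log(np.maximum(w @ k / (w.sum() + 1e-12), 1e-300))
    return total

def metropolis_hastings(demo, train_sa, train_r, n_iters=2000, step=0.1):
    """Random-walk MH over reward parameters with a flat prior.
    Note: no MDP is solved in this loop; Q-learning happened offline."""
    d_r = train_r.shape[1]
    r = np.zeros(d_r)
    ll = kde_log_likelihood(demo, train_sa, train_r, r)
    samples = []
    for _ in range(n_iters):
        r_prop = r + step * rng.standard_normal(d_r)
        ll_prop = kde_log_likelihood(demo, train_sa, train_r, r_prop)
        if np.log(rng.uniform()) < ll_prop - ll:       # MH accept step
            r, ll = r_prop, ll_prop
        samples.append(r.copy())
    return np.array(samples)

# Toy usage with synthetic data standing in for Q-learning rollouts.
m, n = 500, 20
train_r = rng.uniform(-1, 1, size=(m, 2))
train_sa = train_r + 0.3 * rng.standard_normal((m, 2))  # fake (s, a) features
demo = 0.8 + 0.3 * rng.standard_normal((n, 2))          # expert demonstrations
posterior = metropolis_hastings(demo, train_sa, train_r)
print(posterior.mean(axis=0))                           # posterior mean reward
```

The design choice this sketch highlights: the expensive step (solving the MDP to produce behavior under known rewards) happens once, up front, to build the training set; each MCMC iteration then costs only kernel evaluations, so the number of Q-learning runs is fixed regardless of how long the chain runs.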
Author Information
Aishwarya Mandyam (Stanford University)
Didong Li (UNC Chapel Hill)
Diana Cai (Princeton University)
Andrew Jones (Princeton University)
Barbara Engelhardt (Stanford University)
More from the Same Authors
- 2021 Spotlight: Slice Sampling Reparameterization Gradients
  David Zoltowski · Diana Cai · Ryan Adams
- 2022: Multi-fidelity Bayesian experimental design using power posteriors
  Andrew Jones · Diana Cai · Barbara Engelhardt
- 2022: Spatially-aware dimension reduction of transcriptomics data
  Lauren Okamoto · Andrew Jones · Archit Verma · Barbara E Engelhardt
- 2022 Poster: Multi-fidelity Monte Carlo: a pseudo-marginal approach
  Diana Cai · Ryan Adams
- 2021 Workshop: Your Model is Wrong: Robustness and misspecification in probabilistic modeling
  Diana Cai · Sameer Deshpande · Michael Hughes · Tamara Broderick · Trevor Campbell · Nick Foti · Barbara Engelhardt · Sinead Williamson
- 2021 Poster: Slice Sampling Reparameterization Gradients
  David Zoltowski · Diana Cai · Ryan Adams
- 2018 Poster: A Bayesian Nonparametric View on Count-Min Sketch
  Diana Cai · Michael Mitzenmacher · Ryan Adams