We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting. We consider the scenario where: (i) we have a dataset collected under a known baseline policy, and (ii) multiple reward signals are received from the environment, each inducing an objective to optimize. We present an SPI formulation for this RL setting that accounts for the user's preferences over the trade-offs between the different reward signals, while ensuring that the new policy performs at least as well as the baseline policy along each individual objective. We build on traditional SPI algorithms and propose a novel method based on Safe Policy Improvement with Baseline Bootstrapping (SPIBB; Laroche et al., 2019) that provides high-probability guarantees on the performance of the agent in the true environment. We show the effectiveness of our method on a synthetic grid-world safety task as well as in a real-world critical-care context, where we learn a policy for the administration of IV fluids and vasopressors to treat sepsis.
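To make the per-objective safety constraint concrete, here is a minimal Python sketch of a SPIBB-style improvement step adapted to multiple objectives. It is an illustration under simplifying assumptions (tabular MDP, per-objective Q-estimates for the baseline, and a non-negative-advantage condition on every objective), not the paper's exact algorithm; the names `pi_b`, `counts`, `Q`, and `n_wedge` are hypothetical.

```python
import numpy as np

def mo_spibb_step(pi_b, counts, Q, n_wedge):
    """One policy-improvement step in the spirit of SPIBB, adapted to
    multiple objectives. A minimal sketch under simplifying assumptions,
    not the paper's exact method.

    pi_b    : (S, A) baseline policy estimated from the dataset
    counts  : (S, A) state-action visit counts in the dataset
    Q       : (K, S, A) Q-value estimates of the baseline, one per objective
    n_wedge : count threshold below which the new policy copies the baseline
    """
    S, A = pi_b.shape
    pi = pi_b.copy()
    # Per-objective baseline values and advantages of each action.
    V = np.einsum("ksa,sa->ks", Q, pi_b)          # (K, S)
    adv = Q - V[:, :, None]                        # (K, S, A)
    for s in range(S):
        safe = counts[s] >= n_wedge                # well-estimated actions
        # Candidate actions must not hurt any individual objective.
        ok = safe & (adv[:, s, :] >= 0.0).all(axis=0)
        if not ok.any():
            continue                               # fall back to the baseline
        # Conservative tie-break: maximize the worst-case (min over
        # objectives) advantage among admissible actions.
        scores = np.where(ok, adv[:, s, :].min(axis=0), -np.inf)
        a_star = int(scores.argmax())
        # Reallocate only the probability mass sitting on well-estimated
        # actions; bootstrapped (low-count) actions keep their baseline mass.
        free_mass = pi_b[s, safe].sum()
        pi[s, safe] = 0.0
        pi[s, a_star] = free_mass
    return pi
```

In the full method, the per-objective condition would additionally need to be made robust to estimation error (e.g., via confidence bounds on the Q-estimates that shrink with the visit counts) in order to obtain the high-probability guarantee in the true environment.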
Author Information
Harsh Satija (McGill University)
Philip Thomas (University of Massachusetts Amherst)
Joelle Pineau (McGill University)
Joelle Pineau is an Associate Professor and William Dawson Scholar at McGill University, where she co-directs the Reasoning and Learning Lab. She also leads the Facebook AI Research lab in Montreal, Canada. She holds a BASc in Engineering from the University of Waterloo, and an MSc and PhD in Robotics from Carnegie Mellon University. Dr. Pineau's research focuses on developing new models and algorithms for planning and learning in complex partially observable domains. She also works on applying these algorithms to complex problems in robotics, health care, games and conversational agents. She serves on the editorial boards of the Journal of Artificial Intelligence Research and the Journal of Machine Learning Research, and is currently President of the International Machine Learning Society. She is a recipient of NSERC's E.W.R. Steacie Memorial Fellowship (2018), a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Senior Fellow of the Canadian Institute for Advanced Research (CIFAR), and in 2016 was named a member of the College of New Scholars, Artists and Scientists by the Royal Society of Canada.
Romain Laroche (Microsoft Research)
More from the Same Authors
- 2021 : Block Contextual MDPs for Continual Learning
  Shagun Sodhani · Franziska Meier · Joelle Pineau · Amy Zhang
- 2022 Poster: Discrete Compositional Representations as an Abstraction for Goal Conditioned Reinforcement Learning
  Riashat Islam · Hongyu Zang · Anirudh Goyal · Alex Lamb · Kenji Kawaguchi · Xin Li · Romain Laroche · Yoshua Bengio · Remi Tachet des Combes
- 2022 : Optimization using Parallel Gradient Evaluations on Multiple Parameters
  Yash Chandak · Shiv Shankar · Venkata Gandikota · Philip Thomas · Arya Mazumdar
- 2023 Poster: Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning
  Hongyu Zang · Xin Li · Leiji Zhang · Yang Liu · Baigui Sun · Riashat Islam · Remi Tachet des Combes · Romain Laroche
- 2023 Poster: Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
  Zhang-Wei Hong · Aviral Kumar · Sathwik Karnik · Abhishek Bhandwaldar · Akash Srivastava · Joni Pajarinen · Romain Laroche · Abhishek Gupta · Pulkit Agrawal
- 2023 Poster: Behavior Alignment via Reward Function Optimization
  Dhawal Gupta · Yash Chandak · Scott Jordan · Philip Thomas · Bruno da Silva
- 2022 Poster: When does return-conditioned supervised learning work for offline reinforcement learning?
  David Brandfonbrener · Alberto Bietti · Jacob Buckman · Romain Laroche · Joan Bruna
- 2022 Poster: Off-Policy Evaluation for Action-Dependent Non-stationary Environments
  Yash Chandak · Shiv Shankar · Nathaniel Bastian · Bruno da Silva · Emma Brunskill · Philip Thomas
- 2021 : Q&A for Philip Thomas
  Philip Thomas
- 2021 : Advances in (High-Confidence) Off-Policy Evaluation
  Philip Thomas
- 2021 : Invited Speaker Panel
  Sham Kakade · Minmin Chen · Philip Thomas · Angela Schoellig · Barbara Engelhardt · Doina Precup · George Tucker
- 2021 : What makes for an interesting RL problem?
  Joelle Pineau
- 2021 Poster: SOPE: Spectrum of Off-Policy Estimators
  Christina Yuan · Yash Chandak · Stephen Giguere · Philip Thomas · Scott Niekum
- 2021 Poster: Dr Jekyll & Mr Hyde: the strange case of off-policy policy updates
  Romain Laroche · Remi Tachet des Combes
- 2021 Poster: Universal Off-Policy Evaluation
  Yash Chandak · Scott Niekum · Bruno da Silva · Erik Learned-Miller · Emma Brunskill · Philip Thomas
- 2021 Poster: Structural Credit Assignment in Neural Networks using Reinforcement Learning
  Dhawal Gupta · Gabor Mihucz · Matthew Schlegel · James Kostas · Philip Thomas · Martha White
- 2020 : Joelle Pineau - Can pre-registration lead to better reproducibility in ML research?
  Joelle Pineau
- 2020 Workshop: Deep Reinforcement Learning
  Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Coline Devin · Misha Laskin · Kimin Lee · Janarthanan Rajendran · Vivek Veeriah
- 2020 Workshop: ML Retrospectives, Surveys & Meta-Analyses (ML-RSA)
  Chhavi Yadav · Prabhu Pradhan · Jesse Dodge · Mayoore Jaiswal · Peter Henderson · Abhishek Gupta · Ryan Lowe · Jessica Forde · Joelle Pineau
- 2020 Poster: Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization
  Paul Barde · Julien Roy · Wonseok Jeon · Joelle Pineau · Chris Pal · Derek Nowrouzezahrai
- 2020 Spotlight: Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization
  Paul Barde · Julien Roy · Wonseok Jeon · Joelle Pineau · Chris Pal · Derek Nowrouzezahrai
- 2020 Poster: Towards Safe Policy Improvement for Non-Stationary MDPs
  Yash Chandak · Scott Jordan · Georgios Theocharous · Martha White · Philip Thomas
- 2020 Spotlight: Towards Safe Policy Improvement for Non-Stationary MDPs
  Yash Chandak · Scott Jordan · Georgios Theocharous · Martha White · Philip Thomas
- 2020 Poster: Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms
  Pinar Ozisik · Philip Thomas
- 2020 Poster: Learning Dynamic Belief Graphs to Generalize on Text-Based Games
  Ashutosh Adhikari · Xingdi Yuan · Marc-Alexandre Côté · Mikuláš Zelinka · Marc-Antoine Rondeau · Romain Laroche · Pascal Poupart · Jian Tang · Adam Trischler · Will Hamilton
- 2020 Poster: Novelty Search in Representational Space for Sample Efficient Exploration
  Ruo Yu Tao · Vincent Francois-Lavet · Joelle Pineau
- 2020 Oral: Novelty Search in Representational Space for Sample Efficient Exploration
  Ruo Yu Tao · Vincent Francois-Lavet · Joelle Pineau
- 2019 Workshop: Retrospectives: A Venue for Self-Reflection in ML Research
  Ryan Lowe · Yoshua Bengio · Joelle Pineau · Michela Paganini · Jessica Forde · Shagun Sodhani · Abhishek Gupta · Joel Lehman · Peter Henderson · Kanika Madan · Koustuv Sinha · Xavier Bouthillier
- 2019 Poster: Offline Contextual Bandits with High Probability Fairness Guarantees
  Blossom Metevier · Stephen Giguere · Sarah Brockman · Ari Kobren · Yuriy Brun · Emma Brunskill · Philip Thomas
- 2019 Poster: No-Press Diplomacy: Modeling Multi-Agent Gameplay
  Philip Paquette · Yuchen Lu · Seton Steven Bocco · Max Smith · Satya O.-G. · Jonathan K. Kummerfeld · Joelle Pineau · Satinder Singh · Aaron Courville
- 2019 Poster: A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
  Francisco Garcia · Philip Thomas
- 2018 : Joelle Pineau
  Joelle Pineau
- 2018 Workshop: Deep Reinforcement Learning
  Pieter Abbeel · David Silver · Satinder Singh · Joelle Pineau · Joshua Achiam · Rein Houthooft · Aravind Srinivas
- 2018 Poster: Temporal Regularization for Markov Decision Process
  Pierre Thodoroff · Audrey Durand · Joelle Pineau · Doina Precup
- 2018 Invited Talk: Reproducible, Reusable, and Robust Reinforcement Learning
  Joelle Pineau
- 2017 : Invited Talk - Joelle Pineau
  Joelle Pineau
- 2017 Demonstration: A Deep Reinforcement Learning Chatbot
  Iulian Vlad Serban · Chinnadhurai Sankar · Mathieu Germain · Saizheng Zhang · Zhouhan Lin · Sandeep Subramanian · Taesup Kim · Michael Pieper · Sarath Chandar · Nan Rosemary Ke · Sai Rajeswar Mudumba · Alexandre de Brébisson · Jose Sotelo · Dendi A Suhubdy · Vincent Michalski · Joelle Pineau · Yoshua Bengio
- 2017 Poster: Multitask Spectral Learning of Weighted Automata
  Guillaume Rabusseau · Borja Balle · Joelle Pineau
- 2017 Poster: Hybrid Reward Architecture for Reinforcement Learning
  Harm Van Seijen · Mehdi Fatemi · Romain Laroche · Joshua Romoff · Tavian Barnes · Jeffrey Tsang
- 2016 : Joelle Pineau
  Joelle Pineau
- 2015 Poster: Policy Evaluation Using the Ω-Return
  Philip Thomas · Scott Niekum · Georgios Theocharous · George Konidaris
- 2014 Workshop: From Bad Models to Good Policies (Sequential Decision Making under Uncertainty)
  Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor · Jeremie Mary · Laurent Orseau · Thomas Dietterich · Ronald Ortner · Peter Grünwald · Joelle Pineau · Raphael Fonteneau · Georgios Theocharous · Esteban D Arcaute · Christos Dimitrakakis · Nan Jiang · Doina Precup · Pierre-Luc Bacon · Marek Petrik · Aviv Tamar
- 2014 Workshop: Autonomously Learning Robots
  Gerhard Neumann · Joelle Pineau · Peter Auer · Marc Toussaint
- 2014 Demonstration: SmartWheeler – A smart robotic wheelchair platform
  Martin Gerdzhev · Joelle Pineau · Angus Leigh · Andrew Sutcliffe
- 2013 Poster: Projected Natural Actor-Critic
  Philip Thomas · William C Dabney · Stephen Giguere · Sridhar Mahadevan
- 2013 Poster: Learning from Limited Demonstrations
  Beomjoon Kim · Amir-massoud Farahmand · Joelle Pineau · Doina Precup
- 2013 Poster: Bellman Error Based Feature Generation using Random Projections on Sparse Spaces
  Mahdi Milani Fard · Yuri Grinberg · Amir-massoud Farahmand · Joelle Pineau · Doina Precup
- 2013 Spotlight: Learning from Limited Demonstrations
  Beomjoon Kim · Amir-massoud Farahmand · Joelle Pineau · Doina Precup
- 2012 Poster: On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization
  Andre S Barreto · Doina Precup · Joelle Pineau
- 2011 Session: Oral Session 10
  Joelle Pineau
- 2011 Poster: TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning
  George Konidaris · Scott Niekum · Philip Thomas
- 2011 Poster: Policy Gradient Coagent Networks
  Philip Thomas
- 2011 Poster: Reinforcement Learning using Kernel-Based Stochastic Factorization
  Andre S Barreto · Doina Precup · Joelle Pineau
- 2010 Workshop: Learning and Planning from Batch Time Series Data
  Daniel Lizotte · Michael Bowling · Susan Murphy · Joelle Pineau · Sandeep Vijan
- 2010 Poster: PAC-Bayesian Model Selection for Reinforcement Learning
  Mahdi Milani Fard · Joelle Pineau
- 2009 Poster: Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability
  Keith Bush · Joelle Pineau
- 2008 Poster: MDPs with Non-Deterministic Policies
  Mahdi Milani Fard · Joelle Pineau
- 2007 Spotlight: Bayes-Adaptive POMDPs
  Stephane Ross · Brahim Chaib-draa · Joelle Pineau
- 2007 Poster: Bayes-Adaptive POMDPs
  Stephane Ross · Brahim Chaib-draa · Joelle Pineau
- 2007 Poster: Theoretical Analysis of Heuristic Search Methods for Online POMDPs
  Stephane Ross · Joelle Pineau · Brahim Chaib-draa