Offline reinforcement learning is used to train policies in situations where it is expensive or infeasible to access the environment during training. An agent trained in such a setting receives no corrective feedback once its learned policy starts to diverge and may fall prey to the overestimation bias common in offline RL. This increases the chances of the agent choosing unsafe or risky actions, especially in states with sparse or no representation in the training dataset. In this paper, we propose to leverage a safety expert to discourage the offline RL agent from choosing unsafe actions in such under-represented states. The proposed framework transfers the safety expert's knowledge, in an offline setting, to states with high uncertainty in order to prevent catastrophic failures in safety-critical domains. We use a simple but effective approach to quantify a state's uncertainty based on how frequently it appears in the training dataset. In high-uncertainty states, the offline RL agent mimics the safety expert while still maximizing the long-term reward. As part of the proposed approach, we modify TD3+BC, an existing offline RL algorithm. We demonstrate empirically that our approach performs better than TD3+BC on some control tasks and comparably on others across two sets of benchmark datasets, while reducing the chance of taking unsafe actions in sparse regions of the state space.
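A minimal sketch of the core idea is given below in PyTorch-style Python. It is not the paper's implementation: the names `actor`, `critic`, `safety_expert`, `state_counts`, the count threshold `tau`, and the weight `alpha` are illustrative assumptions. The sketch shows a TD3+BC-style actor update in which the behaviour-cloning target switches from the dataset action to the safety expert's action in states that appear rarely in the offline dataset.

```python
# Minimal sketch (not the authors' exact implementation): a TD3+BC-style actor
# update where the behaviour-cloning target becomes the safety expert's action
# in under-represented (high-uncertainty) states.
# Hypothetical components: `actor`, `critic`, `safety_expert` are torch modules;
# `state_counts(states)` returns per-state visitation counts from a discretized
# count table built over the offline dataset; `tau` is an uncertainty threshold.

import torch
import torch.nn.functional as F

def actor_loss(actor, critic, safety_expert, states, dataset_actions,
               state_counts, tau=5, alpha=2.5):
    pi = actor(states)                      # policy actions for the batch
    q = critic(states, pi)                  # critic's value of those actions
    lam = alpha / q.abs().mean().detach()   # TD3+BC normalization of the RL term

    # Count-based uncertainty: states seen fewer than `tau` times are "uncertain".
    counts = state_counts(states)                     # shape: (batch,)
    uncertain = (counts < tau).float().unsqueeze(-1)  # shape: (batch, 1)

    # In uncertain states, clone the safety expert; otherwise clone the dataset action.
    with torch.no_grad():
        safe_actions = safety_expert(states)
    bc_target = uncertain * safe_actions + (1.0 - uncertain) * dataset_actions

    # TD3+BC objective: maximize Q while staying close to the chosen BC target.
    return -lam * q.mean() + F.mse_loss(pi, bc_target)
```

The design choice in this sketch is to leave the standard TD3+BC objective untouched in well-represented states and redirect only the behaviour-cloning term where the count-based uncertainty is high, so the agent defers to the safety expert precisely where the dataset offers the least guidance.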
Author Information
Richa Verma (TCS Research)
Kartik Bharadwaj (Indian Institute of Technology, Madras)
I am an MS CS student at IITM. I am interested in Safe RL and MARL.
Harshad Khadilkar (Tata Consultancy Services Ltd)
Scientist with TCS Research and Visiting Associate Professor at IIT Bombay. Educational background includes a bachelor's in engineering from IIT Bombay, followed by a master's and a PhD from MIT (2013).
Balaraman Ravindran (Indian Institute of Technology Madras)
More from the Same Authors
- 2021 : Deep RePReL--Combining Planning and Deep RL for acting in relational domains »
  Harsha Kokel · Arjun Manoharan · Sriraam Natarajan · Balaraman Ravindran · Prasad Tadepalli
- 2021 : Interactive Robust Policy Optimization for Multi-Agent Reinforcement Learning »
  Videh Nema · Balaraman Ravindran
- 2022 : Reinforcement Learning for Cost to Serve »
  Pranavi Pathakota · Kunwar Zaid · Hardik Meisheri · Harshad Khadilkar
- 2022 : Dual Channel Training of Large Action Spaces in Reinforcement Learning »
  Pranavi Pathakota · Hardik Meisheri · Harshad Khadilkar
- 2022 : Lagrangian Model Based Reinforcement Learning »
  Adithya Ramesh · Balaraman Ravindran
- 2021 : Matching options to tasks using Option-Indexed Hierarchical Reinforcement Learning »
  Kushal Chauhan · Soumya Chatterjee · Pradeep Shenoy · Balaraman Ravindran
- 2019 : Coffee Break & Poster Session 2 »
  Juho Lee · Yoonho Lee · Yee Whye Teh · Raymond A. Yeh · Yuan-Ting Hu · Alex Schwing · Sara Ahmadian · Alessandro Epasto · Marina Knittel · Ravi Kumar · Mohammad Mahdian · Christian Bueno · Aditya Sanghi · Pradeep Kumar Jayaraman · Ignacio Arroyo-Fernández · Andrew Hryniowski · Vinayak Mathur · Sanjay Singh · Shahrzad Haddadan · Vasco Portilheiro · Luna Zhang · Mert Yuksekgonul · Jhosimar Arias Figueroa · Deepak Maurya · Balaraman Ravindran · Frank NIELSEN · Philip Pham · Justin Payan · Andrew McCallum · Jinesh Mehta · Ke SUN
- 2018 : Spotlights 2 »
  Mausam · Ankit Anand · Parag Singla · Tarik Koc · Tim Klinger · Habibeh Naderi · Sungwon Lyu · Saeed Amizadeh · Kshitij Dwivedi · Songpeng Zu · Wei Feng · Balaraman Ravindran · Edouard Pineau · Abdulkadir Celikkanat · Deepak Venugopal
- 2014 Poster: An Autoencoder Approach to Learning Bilingual Word Representations »
  Sarath Chandar · Stanislas Lauly · Hugo Larochelle · Mitesh Khapra · Balaraman Ravindran · Vikas C Raykar · Amrita Saha