Safety is one of the crucial concerns for the real-world application of reinforcement learning (RL). Previous works formulate the safe exploration problem as a Constrained Markov Decision Process (CMDP), in which policies are optimized under constraints. However, when encountering any potential danger, humans tend to stop immediately and rarely learn to behave safely while in danger. Moreover, the off-policy nature of human learning guarantees high learning efficiency in risky tasks. Motivated by human learning, we introduce a Minimalist Off-Policy Approach (MOPA) to address the Safe-RL problem. We first define the Early Terminated MDP (ET-MDP) as a special type of MDP that has the same optimal value function as its CMDP counterpart. An off-policy learning algorithm, MOPA, based on recurrent models is then proposed to solve the ET-MDP, which thereby solves the corresponding CMDP. Experiments on various Safe-RL tasks show a substantial improvement over previous methods that directly solve the CMDP, in terms of higher asymptotic performance and better learning efficiency.
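The early-termination idea can be illustrated with a minimal Python sketch. The abstract does not specify the termination rule or the reward at termination, so everything here is an assumption: the hypothetical `EarlyTerminationEnv` wrapper ends an episode once the accumulated cost exceeds a hypothetical `cost_budget`, and `toy_step` is a made-up chain environment used only for illustration. It is not the authors' implementation and does not include the recurrent off-policy learner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# (next_state, reward, cost, done) as returned by the underlying environment
Transition = Tuple[int, float, float, bool]


def toy_step(state: int, action: int) -> Transition:
    """Hypothetical chain world: positions >= 3 are hazardous, position 5 is the goal."""
    next_state = state + action
    reward = 1.0 if action == 1 else 0.0
    cost = 1.0 if next_state >= 3 else 0.0
    done = next_state >= 5
    return next_state, reward, cost, done


@dataclass
class EarlyTerminationEnv:
    """Ends the episode as soon as the accumulated cost exceeds the budget (assumed rule)."""
    step_fn: Callable[[int, int], Transition]
    cost_budget: float
    state: int = 0
    cum_cost: float = 0.0

    def reset(self, state: int = 0) -> int:
        self.state = state
        self.cum_cost = 0.0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, bool]]:
        next_state, reward, cost, done = self.step_fn(self.state, action)
        self.cum_cost += cost
        violated = self.cum_cost > self.cost_budget
        self.state = next_state
        # The agent sees an ordinary episode end; the constraint is enforced
        # by termination rather than by a separate constrained objective.
        return next_state, reward, done or violated, {"violated": violated}


env = EarlyTerminationEnv(step_fn=toy_step, cost_budget=0.0)
env.reset()
for _ in range(3):
    state, reward, done, info = env.step(1)
```

With a zero budget, the third step to the right enters the hazardous region and the wrapper ends the episode before the goal is reached.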
Author Information
Hao Sun (University of Cambridge)
Ziping Xu (University of Michigan)
My name is Ziping Xu. I am a fifth-year Ph.D. student in Statistics at the University of Michigan. My research interests are sample-efficient reinforcement learning, transfer learning, and multitask learning. I am looking for a research-oriented full-time job starting Fall 2023.
Zhenghao Peng (University of California, Los Angeles)
Meng Fang (Tencent)
Bo Dai (Shanghai AI Lab)
Bolei Zhou (UCLA)

Assistant professor at UCLA's computer science department
More from the Same Authors
-
2021 Spotlight: Representation Learning Beyond Linear Prediction Functions »
Ziping Xu · Ambuj Tewari -
2021 : MHER: Model-based Hindsight Experience Replay »
Yang Rui · Meng Fang · Lei Han · Yali Du · Feng Luo · Xiu Li -
2022 : ChemSpacE: Interpretable and Interactive Chemical Space Exploration »
Yuanqi Du · Xian Liu · Nilay Shah · Shengchao Liu · Jieyu Zhang · Bolei Zhou -
2022 : Constrained MDPs can be Solved by Early-Termination with Recurrent Models »
Hao Sun · Ziping Xu · Meng Fang · Zhenghao Peng · Taiyi Wang · Bolei Zhou -
2022 : Supervised Q-Learning can be a Strong Baseline for Continuous Control »
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou -
2022 : GraphCG: Unsupervised Discovery of Steerable Factors in Graphs »
Shengchao Liu · Chengpeng Wang · Weili Nie · Hanchen Wang · Jiarui Lu · Bolei Zhou · Jian Tang -
2022 : Supervised Q-Learning for Continuous Control »
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou -
2022 : Novel Policy Seeking with Constrained Optimization »
Hao Sun · Zhenghao Peng · Bolei Zhou -
2022 : Toward Causal-Aware RL: State-Wise Action-Refined Temporal Difference »
Hao Sun · Taiyi Wang -
2023 Poster: GRD: A Generative Approach for Interpretable Reward Redistribution in Reinforcement Learning »
Yudi Zhang · Yali Du · Biwei Huang · Ziyan Wang · Jun Wang · Meng Fang · Mykola Pechenizkiy -
2023 Poster: Learning from Active Human Involvement through Proxy Value Propagation »
Zhenghao Peng · Wenjie Mo · Chenda Duan · Quanyi Li · Bolei Zhou -
2023 Poster: Learning Modulated Transformation in GANs »
Ceyuan Yang · Qihang Zhang · Yinghao Xu · Jiapeng Zhu · Yujun Shen · Bo Dai -
2023 Poster: Dynamic Sparsity Is Channel-Level Sparsity Learner »
Lu Yin · Gen Li · Meng Fang · Li Shen · Tianjin Huang · Zhangyang Wang · Vlado Menkovski · Xiaolong Ma · Mykola Pechenizkiy · Shiwei Liu -
2022 : Factor Investing with a Deep Multi-Factor Model »
Zikai Wei · Bo Dai · Dahua Lin -
2022 : Toward Generalizable Embodied AI for Autonomous Driving »
Bolei Zhou -
2022 Poster: Human-AI Shared Control via Policy Dissection »
Quanyi Li · Zhenghao Peng · Haibin Wu · Lan Feng · Bolei Zhou -
2022 Poster: Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping »
Hao Sun · Lei Han · Rui Yang · Xiaoteng Ma · Jian Guo · Bolei Zhou -
2022 Poster: Improving GANs with A Dynamic Discriminator »
Ceyuan Yang · Yujun Shen · Yinghao Xu · Deli Zhao · Bo Dai · Bolei Zhou -
2022 Poster: Adaptive Sampling for Discovery »
Ziping Xu · Eunjae Shim · Ambuj Tewari · Paul Zimmerman -
2021 Poster: A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis »
Xingang Pan · Xudong XU · Chen Change Loy · Christian Theobalt · Bo Dai -
2021 Poster: Representation Learning Beyond Linear Prediction Functions »
Ziping Xu · Ambuj Tewari -
2021 Poster: Generative Occupancy Fields for 3D Surface-Aware Image Synthesis »
Xudong XU · Xingang Pan · Dahua Lin · Bo Dai -
2021 Poster: Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data »
Liming Jiang · Bo Dai · Wayne Wu · Chen Change Loy -
2020 Poster: TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search »
Tarun Gogineni · Ziping Xu · Exequiel Punzalan · Runxuan Jiang · Joshua Kammeraad · Ambuj Tewari · Paul Zimmerman -
2020 Poster: Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games »
Yunqiu Xu · Meng Fang · Ling Chen · Yali Du · Joey Tianyi Zhou · Chengqi Zhang -
2020 Poster: Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting »
Ziping Xu · Ambuj Tewari -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 : Poster Session »
Ahana Ghosh · Javad Shafiee · Akhilan Boopathy · Alex Tamkin · Theodoros Vasiloudis · Vedant Nanda · Ali Baheri · Paul Fieguth · Andrew Bennett · Guanya Shi · Hao Liu · Arushi Jain · Jacob Tyo · Benjie Wang · Boxiao Chen · Carroll Wainwright · Chandramouli Shama Sastry · Chao Tang · Daniel S. Brown · David Inouye · David Venuto · Dhruv Ramani · Dimitrios Diochnos · Divyam Madaan · Dmitrii Krashenikov · Joel Oren · Doyup Lee · Eleanor Quint · elmira amirloo · Matteo Pirotta · Gavin Hartnett · Geoffroy Dubourg-Felonneau · Gokul Swamy · Pin-Yu Chen · Ilija Bogunovic · Jason Carter · Javier Garcia-Barcos · Jeet Mohapatra · Jesse Zhang · Jian Qian · John Martin · Oliver Richter · Federico Zaiter · Tsui-Wei Weng · Karthik Abinav Sankararaman · Kyriakos Polymenakos · Lan Hoang · mahdieh abbasi · Marco Gallieri · Mathieu Seurin · Matteo Papini · Matteo Turchetta · Matthew Sotoudeh · Mehrdad Hosseinzadeh · Nathan Fulton · Masatoshi Uehara · Niranjani Prasad · Oana-Maria Camburu · Patrik Kolaric · Philipp Renz · Prateek Jaiswal · Reazul Hasan Russel · Riashat Islam · Rishabh Agarwal · Alexander Aldrick · Sachin Vernekar · Sahin Lale · Sai Kiran Narayanaswami · Samuel Daulton · Sanjam Garg · Sebastian East · Shun Zhang · Soheil Dsidbari · Justin Goodwin · Victoria Krakovna · Wenhao Luo · Wesley Chung · Yuanyuan Shi · Yuh-Shyang Wang · Hongwei Jin · Ziping Xu -
2019 Poster: Curriculum-guided Hindsight Experience Replay »
Meng Fang · Tianyi Zhou · Yali Du · Lei Han · Zhengyou Zhang -
2019 Poster: Policy Continuation with Hindsight Inverse Dynamics »
Hao Sun · Zhizhong Li · Xiaotong Liu · Bolei Zhou · Dahua Lin -
2019 Spotlight: Policy Continuation with Hindsight Inverse Dynamics »
Hao Sun · Zhizhong Li · Xiaotong Liu · Bolei Zhou · Dahua Lin -
2019 Poster: LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning »
Yali Du · Lei Han · Meng Fang · Ji Liu · Tianhong Dai · Dacheng Tao -
2018 Poster: A Neural Compositional Paradigm for Image Captioning »
Bo Dai · Sanja Fidler · Dahua Lin -
2017 Poster: Contrastive Learning for Image Captioning »
Bo Dai · Dahua Lin