Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
Jiachen Li · Edwin Zhang · Ming Yin · Qinxun Bai · Yu-Xiang Wang · William Yang Wang
Event URL: https://openreview.net/forum?id=D_XoYFKG_n

Behavior-constrained policy optimization has been demonstrated to be a successful paradigm for tackling offline reinforcement learning. By exploiting historical transitions, a policy is trained to maximize a learned value function while constrained by the behavior policy to avoid significant distributional shift. In this paper, we propose our closed-form policy improvement (CFPI) operators. We make a novel observation that the behavior constraint naturally motivates the use of a first-order Taylor approximation, leading to a linear approximation of the policy objective. Additionally, as practical datasets are usually collected by heterogeneous policies, we model the behavior policies as a Gaussian mixture and overcome the induced optimization difficulties by leveraging the LogSumExp lower bound and Jensen's inequality, giving rise to our CFPI operators. We instantiate an offline RL algorithm with our novel policy improvement operator and empirically demonstrate its effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
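
The abstract compresses the key derivation, so a minimal sketch of the single-Gaussian case is worked out below. The notation (behavior mean μ_β(s), covariance Σ_β(s), learned critic Q_θ, trust-region radius δ) is chosen here for illustration and is not taken verbatim from the paper; the Gaussian-mixture case, handled in the paper via the LogSumExp lower bound and Jensen's inequality, is not covered by this sketch.

% Sketch of the single-Gaussian case (notation assumed for illustration):
% behavior policy \pi_\beta(\cdot \mid s) = \mathcal{N}(\mu_\beta(s), \Sigma_\beta(s)),
% learned critic Q_\theta, trust-region radius \delta.
\begin{align*}
  &\text{Behavior-constrained objective:}
  && \max_{a}\; Q_\theta(s, a)
     \;\;\text{s.t.}\;\; (a - \mu_\beta)^\top \Sigma_\beta^{-1} (a - \mu_\beta) \le \delta \\
  &\text{First-order Taylor expansion at } \mu_\beta:
  && Q_\theta(s, a) \approx Q_\theta(s, \mu_\beta) + g^\top (a - \mu_\beta),
     \qquad g = \nabla_a Q_\theta(s, a)\big|_{a = \mu_\beta} \\
  &\text{Closed-form maximizer:}
  && a^\star = \mu_\beta + \sqrt{\tfrac{\delta}{g^\top \Sigma_\beta\, g}}\; \Sigma_\beta\, g
\end{align*}

The last line is the standard trust-region solution for a linear objective under a quadratic (Mahalanobis) constraint; it illustrates why the linearized problem admits a closed-form answer, which is the property the paper's CFPI operators exploit.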

Author Information

Jiachen Li (University of California, Santa Barbara)

Jiachen Li is a second-year Ph.D. student at UC Santa Barbara working with Prof. William Wang. He received his M.S. degree in Electrical and Computer Engineering at UC San Diego, advised by Prof. Hao Su and Prof. Pengtao Xie, and his B.E. degree from Huazhong University of Science and Technology as an Outstanding Undergraduate in Terms of Academic Performance (Top 1%).

Edwin Zhang (UCSB)
Ming Yin (UC Santa Barbara)
Qinxun Bai (Horizon Robotics)
Yu-Xiang Wang (UC Santa Barbara)
William Yang Wang (University of California, Santa Barbara)

William Wang is the Co-Director of UC Santa Barbara's Natural Language Processing group and Center for Responsible Machine Learning. He is the Duncan and Suzanne Mellichamp Chair in Artificial Intelligence and Designs, and an Associate Professor in the Department of Computer Science at the University of California, Santa Barbara. He received his PhD from the School of Computer Science, Carnegie Mellon University. He has broad interests in Artificial Intelligence, including statistical relational learning, information extraction, computational social science, dialog & generation, and vision.

He has published more than 100 papers at leading NLP/AI/ML conferences and journals, and received best paper awards (or nominations) at ASRU 2013, CIKM 2013, EMNLP 2015, and CVPR 2019, a DARPA Young Faculty Award (Class of 2018), an IEEE AI's 10 to Watch Award (Class of 2020), an NSF CAREER Award (2021), two Google Faculty Research Awards (2018, 2019), three IBM Faculty Awards (2017-2019), two Facebook Research Awards (2018, 2019), an Amazon AWS Machine Learning Research Award, a JP Morgan Chase Faculty Research Award, an Adobe Research Award in 2018, and the Richard King Mellon Presidential Fellowship in 2011. He frequently serves as an Area Chair or Senior Area Chair for NAACL, ACL, EMNLP, and AAAI. He is an elected member of the IEEE Speech and Language Processing Technical Committee (2021-2023) and a member of the ACM Future of Computing Academy.

In addition to research, William enjoys writing scientific articles that reach the broader online community. His work and opinions appear in major tech media outlets such as Wired, VICE, Scientific American, Fortune, Fast Company, NASDAQ, The Next Web, Law.com, and Mental Floss.
