Behavior-constrained policy optimization has proven to be a successful paradigm for tackling offline reinforcement learning. By exploiting historical transitions, a policy is trained to maximize a learned value function while being constrained by the behavior policy to avoid significant distributional shift. In this paper, we propose closed-form policy improvement (CFPI) operators. We make the novel observation that the behavior constraint naturally motivates a first-order Taylor approximation, leading to a linear approximation of the policy objective. Additionally, since practical datasets are usually collected by heterogeneous policies, we model the behavior policy as a Gaussian mixture and overcome the induced optimization difficulties by leveraging a lower bound of LogSumExp and Jensen's inequality, giving rise to our CFPI operators. We instantiate an offline RL algorithm with our novel policy improvement operators and empirically demonstrate its effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
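To make the closed-form idea concrete, the sketch below illustrates the single-Gaussian special case suggested by the abstract: maximize a first-order Taylor expansion of the learned Q-function around the behavior mean, subject to a trust region scaled by the behavior covariance, which admits a closed-form maximizer. This is a minimal sketch under stated assumptions, not the paper's implementation; all names (q_fn, mu, sigma, tau) are illustrative, and the Gaussian-mixture machinery (the LogSumExp lower bound and Jensen's inequality) is omitted.

```python
# Minimal sketch of a closed-form policy improvement step for a
# single-Gaussian behavior policy with diagonal covariance.
# Hypothetical names throughout: q_fn(state, action) is a differentiable
# critic; mu and sigma are the behavior policy's mean and std at `state`;
# tau controls the trust-region size.
import torch

def cfpi_action(q_fn, state, mu, sigma, tau=0.5):
    # Gradient of the critic w.r.t. the action, evaluated at the behavior mean.
    a = mu.detach().clone().requires_grad_(True)
    (grad,) = torch.autograd.grad(q_fn(state, a).sum(), a)

    # Closed-form maximizer of the linearized objective <grad, a - mu>
    # subject to (a - mu)^T Sigma^{-1} (a - mu) <= 2 * tau:
    #   a* = mu + sqrt(2 * tau) * Sigma grad / sqrt(grad^T Sigma grad),
    # specialized here to Sigma = diag(sigma^2).
    sigma_grad_norm = (sigma * grad).norm(dim=-1, keepdim=True).clamp_min(1e-8)
    step = (2.0 * tau) ** 0.5 * sigma.pow(2) * grad / sigma_grad_norm
    return (mu + step).detach()
```

Under these assumptions, the update needs only one critic-gradient evaluation per state and no inner optimization loop, so a step of this form could be applied to any pretrained critic paired with a behavior-cloned Gaussian policy.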
Author Information
Jiachen Li (University of California, Santa Barbara)
Jiachen Li is a second-year Ph.D. student at UC Santa Barbara working with Prof. William Wang. He received his M.S. degree in Electrical and Computer Engineering from UC San Diego, advised by Prof. Hao Su and Prof. Pengtao Xie, and his B.E. degree from Huazhong University of Science and Technology, where he was recognized as an Outstanding Undergraduate in Terms of Academic Performance (Top 1%).
Edwin Zhang (UCSB)
Ming Yin (UC Santa Barbara)
Qinxun Bai (Horizon Robotics)
Yu-Xiang Wang (UC Santa Barbara)
William Yang Wang (University of California, Santa Barbara)
William Wang is the Co-Director of UC Santa Barbara's Natural Language Processing group and Center for Responsible Machine Learning. He is the Duncan and Suzanne Mellichamp Chair in Artificial Intelligence and Designs, and an Associate Professor in the Department of Computer Science at the University of California, Santa Barbara. He received his PhD from the School of Computer Science at Carnegie Mellon University. He has broad interests in Artificial Intelligence, including statistical relational learning, information extraction, computational social science, dialog & generation, and vision. He has published more than 100 papers at leading NLP/AI/ML conferences and journals, and has received best paper awards (or nominations) at ASRU 2013, CIKM 2013, EMNLP 2015, and CVPR 2019, a DARPA Young Faculty Award (Class of 2018), an IEEE AI's 10 to Watch Award (Class of 2020), an NSF CAREER Award (2021), two Google Faculty Research Awards (2018, 2019), three IBM Faculty Awards (2017-2019), two Facebook Research Awards (2018, 2019), an Amazon AWS Machine Learning Research Award, a JP Morgan Chase Faculty Research Award, an Adobe Research Award in 2018, and the Richard King Mellon Presidential Fellowship in 2011. He frequently serves as an Area Chair or Senior Area Chair for NAACL, ACL, EMNLP, and AAAI. He is an elected member of the IEEE Speech and Language Processing Technical Committee (2021-2023) and a member of the ACM Future of Computing Academy. In addition to research, William enjoys writing scientific articles that reach the broader online community. His work and opinions appear in major tech media outlets such as Wired, VICE, Scientific American, Fortune, Fast Company, NASDAQ, The Next Web, Law.com, and Mental Floss.
More from the Same Authors
- 2021: VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
  Linjie Li · Jie Lei · Zhe Gan · Licheng Yu · Yen-Chun Chen · Rohit Pillai · Yu Cheng · Luowei Zhou · Xin Wang · William Yang Wang · Tamara L Berg · Mohit Bansal · Jingjing Liu · Lijuan Wang · Zicheng Liu
- 2021: A Dataset for Answering Time-Sensitive Questions
  Wenhu Chen · Xinyi Wang · William Yang Wang
- 2021: Instance-dependent Offline Reinforcement Learning: From tabular RL to linear MDPs
  Ming Yin · Yu-Xiang Wang
- 2022: Generalized PTR: User-Friendly Recipes for Data-Adaptive Algorithms with Differential Privacy
  Rachel Redberg · Yuqing Zhu · Yu-Xiang Wang
- 2022: Voting-Based Approaches for Differentially Private Federated Learning
  Yuqing Zhu · Xiang Yu · Yi-Hsuan Tsai · Francesco Pittaluga · Masoud Faraki · Manmohan Chandraker · Yu-Xiang Wang
- 2022: LAD: Language Augmented Diffusion for Reinforcement Learning
  Edwin Zhang · Yujie Lu · William Yang Wang · Amy Zhang
- 2022: Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data
  Sunil Madhow · Dan Qiao · Yu-Xiang Wang
- 2022: Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation
  Dan Qiao · Yu-Xiang Wang
- 2022: Differentially Private Gradient Boosting on Linear Learners for Tabular Data
  Saeyoung Rho · Shuai Tang · Sergul Aydore · Michael Kearns · Aaron Roth · Yu-Xiang Wang · Steven Wu · Cedric Archambeau
- 2022: Differentially Private Bias-Term only Fine-tuning of Foundation Models
  Zhiqi Bu · Yu-Xiang Wang · Sheng Zha · George Karypis
- 2022: Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction
  Jiachen Li · Shuo Cheng · Zhenyu Liao · Huayan Wang · William Yang Wang · Qinxun Bai
- 2023 Poster: Flexible Attention-Based Multi-Policy Fusion for Efficient Deep Reinforcement Learning
  Zih-Yun Chiu · Yi-Lin Tuan · William Yang Wang · Michael Yip
- 2023 Poster: Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger
  Zhiqi Bu · Yu-Xiang Wang · Sheng Zha · George Karypis
- 2023 Poster: Offline Reinforcement Learning with Differential Privacy
  Dan Qiao · Yu-Xiang Wang
- 2023 Poster: LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
  Weixi Feng · Wanrong Zhu · Tsu-Jui Fu · Varun Jampani · Arjun Akula · Xuehai He · S Basu · Xin Wang · William Yang Wang
- 2023 Poster: Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation
  Lijing Kuang · Ming Yin · Mengdi Wang · Yu-Xiang Wang · Yian Ma
- 2023 Poster: LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
  Yujie Lu · Xianjun Yang · Xiujun Li · Xin Wang · William Yang Wang
- 2023 Poster: Online Label Shift: Optimal Dynamic Regret meets Practical Algorithms
  Dheeraj Baby · Saurabh Garg · Tzu-Ching Yen · Sivaraman Balakrishnan · Zachary Lipton · Yu-Xiang Wang
- 2023 Poster: ALGO: Synthesizing Algorithmic Programs with Generated Oracle Verifiers
  Kexun Zhang · Danqing Wang · Jingtao Xia · William Yang Wang · Lei Li
- 2023 Poster: Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data
  Alon Albalak · Colin Raffel · William Yang Wang
- 2023 Poster: Improving the Privacy and Practicality of Objective Perturbation for Differentially Private Linear Learners
  Rachel Redberg · Antti Koskela · Yu-Xiang Wang
- 2023 Poster: A Privacy-Friendly Approach to Data Valuation
  Jiachen T. Wang · Yuqing Zhu · Yu-Xiang Wang · Ruoxi Jia · Prateek Mittal
- 2023 Poster: Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
  Xinyi Wang · Wanrong Zhu · Michael Saxon · Mark Steyvers · William Yang Wang
- 2023 Poster: Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
  Wanrong Zhu · Jack Hessel · Anas Awadalla · Samir Yitzhak Gadre · Jesse Dodge · Alex Fang · Youngjae Yu · Ludwig Schmidt · William Yang Wang · Yejin Choi
- 2022: Contributed Talk: Differentially Private Bias-Term only Fine-tuning of Foundation Models
  Zhiqi Bu · Yu-Xiang Wang · Sheng Zha · George Karypis
- 2022: Panel on Privacy and Security in Machine Learning Systems
  Graham Cormode · Borja Balle · Yu-Xiang Wang · Alejandro Saucedo · Neil Lawrence
- 2022: Practical differential privacy
  Yu-Xiang Wang · Fariba Yousefi
- 2022: Practical differential privacy
  Yu-Xiang Wang
- 2022 Poster: SeqPATE: Differentially Private Text Generation via Knowledge Distillation
  Zhiliang Tian · Yingxiu Zhao · Ziyue Huang · Yu-Xiang Wang · Nevin L. Zhang · He He
- 2022 Poster: Differentially Private Linear Sketches: Efficient Implementations and Applications
  Fuheng Zhao · Dan Qiao · Rachel Redberg · Divyakant Agrawal · Amr El Abbadi · Yu-Xiang Wang
- 2022 Poster: Optimal Dynamic Regret in LQR Control
  Dheeraj Baby · Yu-Xiang Wang
- 2022 Poster: Society of Agents: Regret Bounds of Concurrent Thompson Sampling
  Yan Chen · Perry Dong · Qinxun Bai · Maria Dimakopoulou · Wei Xu · Zhengyuan Zhou
- 2021 Workshop: Privacy in Machine Learning (PriML) 2021
  Yu-Xiang Wang · Borja Balle · Giovanni Cherubin · Kamalika Chaudhuri · Antti Honkela · Jonathan Lebensold · Casey Meehan · Mi Jung Park · Adrian Weller · Yuqing Zhu
- 2021 Poster: Local Explanation of Dialogue Response Generation
  Yi-Lin Tuan · Connor Pryor · Wenhu Chen · Lise Getoor · William Yang Wang
- 2021 Poster: Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
  Ming Yin · Yu-Xiang Wang
- 2021 Poster: Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
  Ming Yin · Yu-Xiang Wang
- 2021 Poster: Counterfactual Maximum Likelihood Estimation for Training Deep Networks
  Xinyi Wang · Wenhu Chen · Michael Saxon · William Yang Wang
- 2021 Poster: Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
  Ming Yin · Yu Bai · Yu-Xiang Wang
- 2020 Workshop: Privacy Preserving Machine Learning - PriML and PPML Joint Edition
  Borja Balle · James Bell · Aurélien Bellet · Kamalika Chaudhuri · Adria Gascon · Antti Honkela · Antti Koskela · Casey Meehan · Olga Ohrimenko · Mi Jung Park · Mariana Raykova · Mary Anne Smart · Yu-Xiang Wang · Adrian Weller