Policy constraint methods for offline reinforcement learning (RL) typically use parameterization or regularization to constrain the policy to take actions within the support set of the behavior policy. Elaborate parameterization designs usually intrude into the policy network, which can bring extra inference cost and prevents taking full advantage of well-established online methods. Regularization methods reduce the divergence between the learned policy and the behavior policy, which may mismatch the inherent density-based definition of the support set and thereby fail to avoid out-of-distribution actions effectively. This paper presents Supported Policy OpTimization (SPOT), which is directly derived from a theoretical formalization of the density-based support constraint. SPOT adopts a VAE-based density estimator to explicitly model the support set of the behavior policy and introduces a simple but effective density-based regularization term that can be plugged non-intrusively into off-the-shelf off-policy RL algorithms. SPOT achieves state-of-the-art performance on standard benchmarks for offline RL. Benefiting from the pluggable design, offline-pretrained SPOT models can also be applied seamlessly to online fine-tuning.
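As a rough illustration of the pluggable, density-based regularization described above, the sketch below trains-time-estimates the behavior density with a conditional VAE and adds its ELBO (a lower bound of the log behavior density) to a TD3-style actor objective. This is a minimal sketch under stated assumptions, not the authors' reference implementation: the `ConditionalVAE` class, its `elbo` helper, the `actor_loss` function, and the weight `lam` are all illustrative names and values.

```python
# Minimal sketch of the abstract's idea (assumptions, not the paper's code):
# a conditional VAE estimates the behavior density p_beta(a | s), and its
# ELBO is added as a support regularizer to a TD3-style actor loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalVAE(nn.Module):
    """CVAE for p(a | s): encoder q(z | s, a), decoder p(a | s, z)."""

    def __init__(self, state_dim: int, action_dim: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * latent_dim),  # outputs mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def elbo(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        """Single-sample ELBO, a lower bound of log p_beta(a | s)."""
        mu, log_var = self.encoder(torch.cat([state, action], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
        recon = self.decoder(torch.cat([state, z], -1))
        log_lik = -F.mse_loss(recon, action, reduction="none").sum(-1)
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(-1)
        return log_lik - kl


def actor_loss(actor, critic, vae, state, lam: float = 0.1) -> torch.Tensor:
    """TD3-style actor objective plus the density-based regularizer:
    maximize Q(s, pi(s)) + lam * ELBO(pi(s) | s)."""
    action = actor(state)
    q = critic(state, action)
    support = vae.elbo(state, action)  # estimated log behavior density
    return -(q + lam * support).mean()
```

Because the regularizer only adds one term to the actor loss, the same critic and actor networks can later be fine-tuned online by annealing or dropping `lam`, which is the pluggability the abstract refers to.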
Author Information
Jialong Wu (School of Software, Tsinghua University)
Haixu Wu (Tsinghua University)
Zihan Qiu (IIIS, Tsinghua University)
Jianmin Wang (Tsinghua University)
Mingsheng Long (Tsinghua University)
More from the Same Authors
- 2022 Poster: Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models
  Yang Shu · Zhangjie Cao · Ziyang Zhang · Jianmin Wang · Mingsheng Long
- 2022 Poster: Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting
  Yong Liu · Haixu Wu · Jianmin Wang · Mingsheng Long
- 2022: Domain Adaptation: Theory, Algorithms, and Open Library
  Mingsheng Long
- 2022 Poster: Debiased Self-Training for Semi-Supervised Learning
  Baixu Chen · Junguang Jiang · Ximei Wang · Pengfei Wan · Jianmin Wang · Mingsheng Long
- 2021 Poster: Cycle Self-Training for Domain Adaptation
  Hong Liu · Jianmin Wang · Mingsheng Long
- 2021 Poster: Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
  Haixu Wu · Jiehui Xu · Jianmin Wang · Mingsheng Long
- 2020 Poster: Co-Tuning for Transfer Learning
  Kaichao You · Zhi Kou · Mingsheng Long · Jianmin Wang
- 2020 Poster: Transferable Calibration with Lower Bias and Variance in Domain Adaptation
  Ximei Wang · Mingsheng Long · Jianmin Wang · Michael Jordan
- 2020 Poster: Stochastic Normalization
  Zhi Kou · Kaichao You · Mingsheng Long · Jianmin Wang
- 2020 Poster: Learning to Adapt to Evolving Domains
  Hong Liu · Mingsheng Long · Jianmin Wang · Yu Wang
- 2019 Poster: Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning
  Xinyang Chen · Sinan Wang · Bo Fu · Mingsheng Long · Jianmin Wang
- 2019 Poster: Transferable Normalization: Towards Improving Transferability of Deep Neural Networks
  Ximei Wang · Ying Jin · Mingsheng Long · Jianmin Wang · Michael Jordan
- 2018 Poster: Conditional Adversarial Domain Adaptation
  Mingsheng Long · Zhangjie Cao · Jianmin Wang · Michael Jordan
- 2018 Poster: Generalized Zero-Shot Learning with Deep Calibration Network
  Shichen Liu · Mingsheng Long · Jianmin Wang · Michael Jordan
- 2017 Poster: PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs
  Yunbo Wang · Mingsheng Long · Jianmin Wang · Zhifeng Gao · Philip S Yu
- 2017 Poster: Learning Multiple Tasks with Multilinear Relationship Networks
  Mingsheng Long · Zhangjie Cao · Jianmin Wang · Philip S Yu
- 2016 Poster: Unsupervised Domain Adaptation with Residual Transfer Networks
  Mingsheng Long · Han Zhu · Jianmin Wang · Michael Jordan
- 2015 Workshop: Transfer and Multi-Task Learning: Trends and New Perspectives
  Anastasia Pentina · Christoph Lampert · Sinno Jialin Pan · Mingsheng Long · Judy Hoffman · Baochen Sun · Kate Saenko