Off-policy learning, the procedure of optimizing a policy with access only to logged feedback data, is important in many real-world applications, such as search engines and recommender systems. While the ground-truth logging policy is usually unknown, previous work simply substitutes an estimate of it for off-policy learning, ignoring the negative impact of both the high bias and the high variance that result from such an estimator. This impact is often magnified on samples with small and inaccurately estimated logging probabilities. The contribution of this work is to explicitly model the uncertainty in the estimated logging policy and to propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning, with a theoretical convergence guarantee. Experimental results on synthetic and real-world recommendation datasets demonstrate that UIPS significantly improves the quality of the discovered policy compared with an extensive list of state-of-the-art baselines.
Author Information
Xiaoying Zhang (ByteDance Research)
Junpu Chen (Chongqing University)
Hongning Wang (Tsinghua University)
Hong Xie (Chongqing University)
Yang Liu (UC Santa Cruz/ByteDance Research)
John C.S. Lui (Chinese University of Hong Kong)
Hang Li (Bytedance Technology)
More from the Same Authors
-
2021 Spotlight: Unintended Selection: Persistent Qualification Rate Disparities and Interventions »
Reilly Raab · Yang Liu -
2021 : Unfairness Despite Awareness: Group-Fair Classification with Strategic Agents »
Andrew Estornell · Sanmay Das · Yang Liu · Yevgeniy Vorobeychik -
2022 : Tier Balancing: Towards Dynamic Fairness over Underlying Causal Factors »
Zeyu Tang · Yatong Chen · Yang Liu · Kun Zhang -
2022 : Fast Implicit Constrained Optimization of Non-decomposable Objectives for Deep Networks »
Yatong Chen · Abhishek Kumar · Yang Liu · Ehsan Amid -
2022 : Spectrum Guided Topology Augmentation for Graph Contrastive Learning »
Lu Lin · Jinghui Chen · Hongning Wang -
2023 : Procedural Fairness Through Decoupling Objectionable Data Generating Components »
Zeyu Tang · Jialu Wang · Yang Liu · Peter Spirtes · Kun Zhang -
2023 : Transparency Through the Lens of Recourse and Manipulation »
Yatong Chen · Andrew Estornell · Yevgeniy Vorobeychik · Yang Liu -
2023 : Large Language Model Unlearning »
Yuanshun (Kevin) Yao · Xiaojun Xu · Yang Liu -
2023 : Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment »
Yang Liu · Yuanshun (Kevin) Yao · Jean-Francois Ton · Xiaoying Zhang · Ruocheng Guo · Hao Cheng · Yegor Klochkov · Muhammad Faaiz Taufiq · Hang Li -
2023 Poster: Rethinking Incentives in Recommender Systems: Are Monotone Rewards Always Beneficial? »
Fan Yao · Chuanhao Li · Karthik Abinav Sankararaman · Yiming Liao · Yan Zhu · Qifan Wang · Hongning Wang · Haifeng Xu -
2023 Poster: Online Corrupted User Detection and Regret Minimization »
Zhiyong Wang · Jize Xie · Tong Yu · Shuai Li · John C.S. Lui -
2023 Poster: Multi-Objective Intrinsic Reward Learning for Conversational Recommender Systems »
Zhendong Chu · Nan Wang · Hongning Wang -
2023 Poster: Multi-Fidelity Multi-Armed Bandits Revisited »
Xuchuang Wang · Qingyun Wu · Wei Chen · John C.S. Lui -
2023 Poster: Block Broyden's Methods for Solving Nonlinear Equations »
Chengchang Liu · Cheng Chen · Luo Luo · John C.S. Lui -
2023 Poster: Online Clustering of Bandits with Misspecified User Models »
Zhiyong Wang · Jize Xie · Xutong Liu · Shuai Li · John C.S. Lui -
2023 Poster: Long-Term Fairness with Unknown Dynamics »
Tongxin Yin · Reilly Raab · Mingyan Liu · Yang Liu -
2023 Poster: Incentivized Communication for Federated Bandits »
Zhepei Wei · Chuanhao Li · Haifeng Xu · Hongning Wang -
2023 Poster: Model Sparsity Can Simplify Machine Unlearning »
jinghan jia · Jiancheng Liu · Parikshit Ram · Yuguang Yao · Gaowen Liu · Yang Liu · PRANAY SHARMA · Sijia Liu -
2022 Spotlight: Certifying Some Distributional Fairness with Subpopulation Decomposition »
Mintong Kang · Linyi Li · Maurice Weber · Yang Liu · Ce Zhang · Bo Li -
2022 Poster: Communication Efficient Distributed Learning for Kernelized Contextual Bandits »
Chuanhao Li · Huazheng Wang · Mengdi Wang · Hongning Wang -
2022 Poster: Fairness Transferability Subject to Bounded Distribution Shift »
Yatong Chen · Reilly Raab · Jialu Wang · Yang Liu -
2022 Poster: Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms »
Xutong Liu · Jinhang Zuo · Siwei Wang · Carlee Joe-Wong · John C.S. Lui · Wei Chen -
2022 Poster: Certifying Some Distributional Fairness with Subpopulation Decomposition »
Mintong Kang · Linyi Li · Maurice Weber · Yang Liu · Ce Zhang · Bo Li -
2022 Poster: Adaptive Data Debiasing through Bounded Exploration »
Yifan Yang · Yang Liu · Parinaz Naghizadeh -
2022 Poster: Communication Efficient Federated Learning for Generalized Linear Bandits »
Chuanhao Li · Hongning Wang -
2021 : Revisiting Dynamics in Strategic ML »
Yang Liu -
2021 : Bounded Fairness Transferability subject to Distribution Shift »
Reilly Raab · Yatong Chen · Yang Liu -
2021 Poster: Unintended Selection: Persistent Qualification Rate Disparities and Interventions »
Reilly Raab · Yang Liu -
2021 Poster: Can Less be More? When Increasing-to-Balancing Label Noise Rates Considered Beneficial »
Yang Liu · Jialu Wang -
2021 Poster: Policy Learning Using Weak Supervision »
Jingkang Wang · Hongyi Guo · Zhaowei Zhu · Yang Liu -
2021 Poster: Bandit Learning with Delayed Impact of Actions »
Wei Tang · Chien-Ju Ho · Yang Liu -
2020 : Contributed Talk 4: Strategic Recourse in Linear Classification »
Yatong Chen · Yang Liu -
2020 Poster: Learning Strategy-Aware Linear Classifiers »
Yiling Chen · Yang Liu · Chara Podimata -
2020 Poster: How do fair decisions fare in long-term qualification? »
Xueru Zhang · Ruibo Tu · Yang Liu · Mingyan Liu · Hedvig Kjellstrom · Kun Zhang · Cheng Zhang -
2020 Poster: Optimal Query Complexity of Secure Stochastic Convex Optimization »
Wei Tang · Chien-Ju Ho · Yang Liu -
2019 Poster: Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation »
Xueying Bai · Jian Guan · Hongning Wang -
2018 Poster: Bandit Learning with Implicit Feedback »
Yi Qi · Qingyun Wu · Hongning Wang · Jie Tang · Maosong Sun