Most existing policy learning solutions require the learning agents to receive high-quality supervision signals, e.g., rewards in reinforcement learning (RL) or high-quality expert demonstrations in behavioral cloning (BC). Such high-quality supervision is often either infeasible or prohibitively expensive to obtain in practice. We aim for a unified framework that leverages available, cheap weak supervision to perform policy learning efficiently. To handle this problem, we treat the "weak supervision" as imperfect information coming from a peer agent, and evaluate the learning agent's policy based on a "correlated agreement" with the peer agent's policy (instead of simple agreement). Our approach explicitly punishes a policy for overfitting to the weak supervision. In addition to theoretical guarantees, extensive evaluations on tasks including RL with noisy rewards, BC with weak demonstrations, and standard policy co-training (RL + BC) show that our method leads to substantial performance improvements, especially when the complexity or the noise of the learning environment is high.
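The contrast between simple agreement and a correlated agreement can be illustrated with a small sketch. This is not the paper's objective: the function names, the weight `xi`, and the use of discrete actions are assumptions made purely for illustration. The score subtracts the agreement one would expect if the policy's and the peer's actions were paired independently of the state, so a policy that agrees with the peer only by blindly outputting the peer's most common action earns nothing.

```python
import numpy as np

def simple_agreement(policy_actions, peer_actions):
    """Fraction of states on which the policy matches the peer's action."""
    policy_actions = np.asarray(policy_actions)
    peer_actions = np.asarray(peer_actions)
    return float(np.mean(policy_actions == peer_actions))

def correlated_agreement(policy_actions, peer_actions, xi=1.0):
    """Illustrative score (an assumption, not the paper's exact objective):
    state-wise agreement minus xi times the agreement expected when the
    two action sequences are paired independently of the state."""
    policy_actions = np.asarray(policy_actions)
    peer_actions = np.asarray(peer_actions)
    matched = np.mean(policy_actions == peer_actions)
    # Expected agreement under independent pairing: sum over actions of
    # (policy's frequency of the action) * (peer's frequency of the action).
    actions = np.union1d(policy_actions, peer_actions)
    chance = sum(
        np.mean(policy_actions == a) * np.mean(peer_actions == a)
        for a in actions
    )
    return float(matched - xi * chance)

# Hypothetical weak supervision: the peer mostly picks action 0.
peer = [0, 0, 0, 1]
constant_policy = [0, 0, 0, 0]   # ignores the state entirely
copying_policy = [0, 0, 0, 1]    # tracks the peer state by state

print(simple_agreement(constant_policy, peer))
print(correlated_agreement(constant_policy, peer))
print(correlated_agreement(copying_policy, peer))
```

Under simple agreement the state-independent policy still scores well because the peer's actions are imbalanced, whereas the correlated score removes that chance-level component. The weight `xi` is a hypothetical knob for how strongly uninformative agreement is penalized.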
Author Information
Jingkang Wang (Uber ATG, University of Toronto)
Hongyi Guo (Shanghai Jiao Tong University)
Zhaowei Zhu (UC Santa Cruz)
Yang Liu (UC Santa Cruz)
More from the Same Authors
-
2021 Spotlight: Unintended Selection: Persistent Qualification Rate Disparities and Interventions »
Reilly Raab · Yang Liu -
2021 : Unfairness Despite Awareness: Group-Fair Classification with Strategic Agents »
Andrew Estornell · Sanmay Das · Yang Liu · Yevgeniy Vorobeychik -
2022 : Tier Balancing: Towards Dynamic Fairness over Underlying Causal Factors »
Zeyu Tang · Yatong Chen · Yang Liu · Kun Zhang -
2022 : Fast Implicit Constrained Optimization of Non-decomposable Objectives for Deep Networks »
Yatong Chen · Abhishek Kumar · Yang Liu · Ehsan Amid -
2023 Poster: Long-Term Fairness with Unknown Dynamics »
Tongxin Yin · Reilly Raab · Mingyan Liu · Yang Liu -
2023 Poster: Uncertainty-Aware Instance Reweighting for Off-Policy Learning »
Xiaoying Zhang · Junpu Chen · Hongning Wang · Hong Xie · Yang Liu · John C.S. Lui · Hang Li -
2023 Poster: Model Sparsity Can Simplify Machine Unlearning »
jinghan jia · Jiancheng Liu · Parikshit Ram · Yuguang Yao · Gaowen Liu · Yang Liu · PRANAY SHARMA · Sijia Liu -
2022 Spotlight: Certifying Some Distributional Fairness with Subpopulation Decomposition »
Mintong Kang · Linyi Li · Maurice Weber · Yang Liu · Ce Zhang · Bo Li -
2022 Poster: Fairness Transferability Subject to Bounded Distribution Shift »
Yatong Chen · Reilly Raab · Jialu Wang · Yang Liu -
2022 Poster: Certifying Some Distributional Fairness with Subpopulation Decomposition »
Mintong Kang · Linyi Li · Maurice Weber · Yang Liu · Ce Zhang · Bo Li -
2022 Poster: Adaptive Data Debiasing through Bounded Exploration »
Yifan Yang · Yang Liu · Parinaz Naghizadeh -
2021 : Revisiting Dynamics in Strategic ML »
Yang Liu -
2021 : Bounded Fairness Transferability subject to Distribution Shift »
Reilly Raab · Yatong Chen · Yang Liu -
2021 Poster: Unintended Selection: Persistent Qualification Rate Disparities and Interventions »
Reilly Raab · Yang Liu -
2021 Poster: Can Less be More? When Increasing-to-Balancing Label Noise Rates Considered Beneficial »
Yang Liu · Jialu Wang -
2021 Poster: Adversarial Attack Generation Empowered by Min-Max Optimization »
Jingkang Wang · Tianyun Zhang · Sijia Liu · Pin-Yu Chen · Jiacen Xu · Makan Fardad · Bo Li -
2021 Poster: Bandit Learning with Delayed Impact of Actions »
Wei Tang · Chien-Ju Ho · Yang Liu -
2020 : Contributed Talk 4: Strategic Recourse in Linear Classification »
Yatong Chen · Yang Liu -
2020 Poster: Learning Strategy-Aware Linear Classifiers »
Yiling Chen · Yang Liu · Chara Podimata -
2020 Poster: How do fair decisions fare in long-term qualification? »
Xueru Zhang · Ruibo Tu · Yang Liu · Mingyan Liu · Hedvig Kjellstrom · Kun Zhang · Cheng Zhang -
2020 Poster: Optimal Query Complexity of Secure Stochastic Convex Optimization »
Wei Tang · Chien-Ju Ho · Yang Liu