A major challenge in reinforcement learning is determining which state-action pairs are responsible for delayed future rewards. Reward redistribution addresses this by reassigning credit to each time step of an observed sequence. While most current approaches construct the reward redistribution in an uninterpretable manner, we propose to explicitly model the contributions of state and action from a causal perspective, yielding an interpretable reward redistribution while preserving policy invariance. In this paper, we begin by studying the role of causal generative models in reward redistribution, characterizing the generation of Markovian rewards and the trajectory-wise long-term return, and then propose a framework, called Generative Return Decomposition (GRD), for policy optimization in delayed-reward scenarios. Specifically, GRD first identifies the unobservable Markovian rewards and the causal relations in the generative process. It then uses the identified causal generative model to form a compact representation and trains the policy over the most favorable subspace of the agent's state space. Theoretically, we show that the unobservable Markovian reward function is identifiable, as are the underlying causal structure and causal models. Experimental results show that our method outperforms state-of-the-art methods, and the provided visualizations further demonstrate its interpretability. The project page is located at https://reedzyd.github.io/GenerativeReturnDecomposition/.
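The return-decomposition idea underlying GRD, recovering per-step Markovian rewards from an observed trajectory-wise return, can be sketched in a few lines. The snippet below is a minimal illustration using a hypothetical linear reward model and synthetic features; it is not the paper's causal generative model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: N episodes of length T; each step (s_t, a_t) is
# summarized by a feature vector phi. Only the episodic return is observed,
# i.e. the per-step Markovian rewards are delayed and hidden.
N, T, D = 200, 10, 4
true_w = np.array([1.0, -0.5, 0.0, 2.0])          # hidden reward parameters
phi = rng.normal(size=(N, T, D))                  # phi(s_t, a_t) per step
episode_return = (phi @ true_w).sum(axis=1)       # R_i = sum_t r(s_t, a_t)

# Return decomposition: fit a per-step reward model w so that the predicted
# rewards sum to the observed return,
#   min_w  sum_i ( sum_t w . phi_{i,t} - R_i )^2.
# For a linear model, summing over time commutes with the dot product.
X = phi.sum(axis=1)                               # shape (N, D)
w, *_ = np.linalg.lstsq(X, episode_return, rcond=None)

# Redistributed (proxy Markovian) rewards: one value per time step.
redistributed = phi @ w                           # shape (N, T)
```

Under this linear, noiseless assumption the least-squares fit recovers the hidden per-step rewards; GRD instead learns the decomposition jointly with the causal structure of the generative process.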
Author Information
Yudi Zhang (Eindhoven University of Technology)
Yali Du (King's College London)
Biwei Huang (University of California, San Diego)
Ziyan Wang (King's College London)
Jun Wang (UCL)
Meng Fang (Tencent)
Mykola Pechenizkiy (TU Eindhoven)
More from the Same Authors
-
2021 : MHER: Model-based Hindsight Experience Replay »
Yang Rui · Meng Fang · Lei Han · Yali Du · Feng Luo · Xiu Li -
2022 Poster: Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning »
Runze Liu · Fengshuo Bai · Yali Du · Yaodong Yang -
2022 Poster: Multiagent Q-learning with Sub-Team Coordination »
Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Poster: M2N: Mesh Movement Networks for PDE Solvers »
Wenbin Song · Mingrui Zhang · Joseph G Wallwork · Junpeng Gao · Zheng Tian · Fanglei Sun · Matthew Piggott · Junqing Chen · Zuoqiang Shi · Xiang Chen · Jun Wang -
2022 : An Empirical Evaluation of Posterior Sampling for Constrained Reinforcement Learning »
Danil Provodin · Pratik Gajane · Mykola Pechenizkiy · Maurits Kaptein -
2022 : Contextual Transformer for Offline Meta Reinforcement Learning »
Runji Lin · Ye Li · Xidong Feng · Zhaowei Zhang · XIAN HONG WU FUNG · Haifeng Zhang · Jun Wang · Yali Du · Yaodong Yang -
2022 : Constrained MDPs can be Solved by Early-Termination with Recurrent Models »
Hao Sun · Ziping Xu · Meng Fang · Zhenghao Peng · Taiyi Wang · Bolei Zhou -
2022 : Supervised Q-Learning can be a Strong Baseline for Continuous Control »
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou -
2022 : Supervised Q-Learning for Continuous Control »
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou -
2022 : MOPA: a Minimalist Off-Policy Approach to Safe-RL »
Hao Sun · Ziping Xu · Zhenghao Peng · Meng Fang · Bo Dai · Bolei Zhou -
2023 : Zero-shot Cross-task Preference Alignment for Offline RL via Optimal Transport »
Runze Liu · Yali Du · Fengshuo Bai · Jiafei Lyu · Xiu Li -
2023 : GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models »
Mianchu Wang · Rui Yang · Xi Chen · Meng Fang -
2023 : Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training »
Xidong Feng · Ziyu Wan · Muning Wen · Ying Wen · Weinan Zhang · Jun Wang -
2023 Poster: Generator Identification for Linear SDEs with Additive and Multiplicative Noise »
Yuanyuan Wang · Xi Geng · Wei Huang · Biwei Huang · Mingming Gong -
2023 Poster: Lending Interaction Wings to Recommender Systems with Conversational Agents »
Jiarui Jin · Xianyu Chen · Fanghua Ye · Mengyue Yang · Yue Feng · Weinan Zhang · Yong Yu · Jun Wang -
2023 Poster: Reduced Policy Optimization for Continuous Control with Hard Constraints »
Shutong Ding · Jingya Wang · Yali Du · Ye Shi -
2023 Poster: Identification of Nonlinear Latent Hierarchical Models »
Lingjing Kong · Biwei Huang · Feng Xie · Eric Xing · Yuejie Chi · Kun Zhang -
2023 Poster: Invariant Learning via Probability of Sufficient and Necessary Causes »
Mengyue Yang · Yonggang Zhang · Zhen Fang · Yali Du · Furui Liu · Jean-Francois Ton · Jianhong Wang · Jun Wang -
2023 Poster: Dynamic Sparsity Is Channel-Level Sparsity Learner »
Lu Yin · Gen Li · Meng Fang · Li Shen · Tianjin Huang · Zhangyang "Atlas" Wang · Vlado Menkovski · Xiaolong Ma · Mykola Pechenizkiy · Shiwei Liu -
2023 Poster: An Efficient End-to-End Training Approach for Zero-Shot Human-AI Coordination »
Xue Yan · Jiaxian Guo · Xingzhou Lou · Jun Wang · Haifeng Zhang · Yali Du -
2023 Poster: Online PCA in Converging Self-consistent Field Equations »
Xihan Li · Xiang Chen · Rasul Tutunov · Haitham Bou Ammar · Lei Wang · Jun Wang -
2023 Poster: Learning World Models with Identifiable Factorization »
Yuren Liu · Biwei Huang · Zhengmao Zhu · Honglong Tian · Mingming Gong · Yang Yu · Kun Zhang -
2023 Poster: COOM: A Game Benchmark for Continual Reinforcement Learning »
Tristan Tomilin · Meng Fang · Yudi Zhang · Mykola Pechenizkiy -
2023 Poster: ChessGPT: Bridging Policy Learning and Language Modeling »
Xidong Feng · Yicheng Luo · Ziyan Wang · Hongrui Tang · Mengyue Yang · Kun Shao · David Mguni · Yali Du · Jun Wang -
2022 Spotlight: Lightning Talks 5A-3 »
Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Spotlight: Multiagent Q-learning with Sub-Team Coordination »
Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Spotlight: Optimistic Tree Searches for Combinatorial Black-Box Optimization »
Cedric Malherbe · Antoine Grosnit · Rasul Tutunov · Haitham Bou Ammar · Jun Wang -
2022 Poster: Latent Hierarchical Causal Structure Discovery with Rank Constraints »
Biwei Huang · Charles Jia Han Low · Feng Xie · Clark Glymour · Kun Zhang -
2022 Poster: Dynamic Sparse Network for Time Series Classification: Learning What to “See” »
Qiao Xiao · Boqian Wu · Yu Zhang · Shiwei Liu · Mykola Pechenizkiy · Elena Mocanu · Decebal Constantin Mocanu -
2022 Poster: Optimistic Tree Searches for Combinatorial Black-Box Optimization »
Cedric Malherbe · Antoine Grosnit · Rasul Tutunov · Haitham Bou Ammar · Jun Wang -
2022 Poster: Where to Pay Attention in Sparse Training for Feature Selection? »
Ghada Sokar · Zahra Atashgahi · Mykola Pechenizkiy · Decebal Constantin Mocanu -
2022 Poster: Factored Adaptation for Non-Stationary Reinforcement Learning »
Fan Feng · Biwei Huang · Kun Zhang · Sara Magliacane -
2022 Poster: Enhancing Safe Exploration Using Safety State Augmentation »
Aivar Sootla · Alexander Cowen-Rivers · Jun Wang · Haitham Bou Ammar -
2022 Poster: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem »
Muning Wen · Jakub Kuba · Runji Lin · Weinan Zhang · Ying Wen · Jun Wang · Yaodong Yang -
2022 Poster: A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning »
Bo Liu · Xidong Feng · Jie Ren · Luo Mai · Rui Zhu · Haifeng Zhang · Jun Wang · Yaodong Yang -
2021 : The Impact of Batch Learning in Stochastic Bandits »
Danil Provodin · Pratik Gajane · Mykola Pechenizkiy · Maurits Kaptein -
2021 Poster: Sparse Training via Boosting Pruning Plasticity with Neuroregeneration »
Shiwei Liu · Tianlong Chen · Xiaohan Chen · Zahra Atashgahi · Lu Yin · Huanyu Kou · Li Shen · Mykola Pechenizkiy · Zhangyang Wang · Decebal Constantin Mocanu -
2020 Poster: Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games »
Yunqiu Xu · Meng Fang · Ling Chen · Yali Du · Joey Tianyi Zhou · Chengqi Zhang -
2019 : Poster Session »
Nathalie Baracaldo · Seth Neel · Tuyen Le · Dan Philps · Suheng Tao · Sotirios Chatzis · Toyo Suzumura · Wei Wang · WENHANG BAO · Solon Barocas · Manish Raghavan · Samuel Maina · Reginald Bryant · Kush Varshney · Skyler D. Speakman · Navdeep Gill · Nicholas Schmidt · Kevin Compher · Naveen Sundar Govindarajulu · Vivek Sharma · Praneeth Vepakomma · Tristan Swedish · Jayashree Kalpathy-Cramer · Ramesh Raskar · Shihao Zheng · Mykola Pechenizkiy · Marco Schreyer · Li Ling · Chirag Nagpal · Robert Tillman · Manuela Veloso · Hanjie Chen · Xintong Wang · Michael Wellman · Matthew van Adelsberg · Ben Wood · Hans Buehler · Mahmoud Mahfouz · Antonios Alexos · Megan Shearer · Antigoni Polychroniadou · Lucia Larise Stavarache · Dmitry Efimov · Johnston P Hall · Yukun Zhang · Emily Diana · Sumitra Ganesh · Vineeth Ravi · · Swetasudha Panda · Xavier Renard · Matthew Jagielski · Yonadav Shavit · Joshua Williams · Haoran Wei · Shuang (Sophie) Zhai · Xinyi Li · Hongda Shen · Daiki Matsunaga · Jaesik Choi · Alexis Laignelet · Batuhan Guler · Jacobo Roa Vicens · Ajit Desai · Jonathan Aigrain · Robert Samoilescu -
2019 Poster: Curriculum-guided Hindsight Experience Replay »
Meng Fang · Tianyi Zhou · Yali Du · Lei Han · Zhengyou Zhang -
2019 Poster: LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning »
Yali Du · Lei Han · Meng Fang · Ji Liu · Tianhong Dai · Dacheng Tao -
2018 : Posters and Open Discussions (see below for poster titles) »
Ramya Malur Srinivasan · Miguel Perez · Yuanyuan Liu · Ben Wood · Dan Philps · Kyle Brown · Daniel Martin · Mykola Pechenizkiy · Luca Costabello · Rongguang Wang · Suproteem Sarkar · Sangwoong Yoon · Zhuoran Xiong · Enguerrand Horel · Zhu (Drew) Zhang · Ulf Johansson · Jonathan Kochems · Gregory Sidier · Prashant Reddy · Lana Cuthbertson · Yvonne Wambui · Christelle Marfaing · Galen Harrison · Irene Unceta Mendieta · Thomas Kehler · Mark Weber · Li Ling · Ceena Modarres · Abhinav Dhall · Arash Nourian · David Byrd · Ajay Chander · Xiao-Yang Liu · Hongyang Yang · Shuang (Sophie) Zhai · Freddy Lecue · Sirui Yao · Rory McGrath · Artur Garcez · Vangelis Bacoyannis · Alexandre Garcia · Lukas Gonon · Mark Ibrahim · Melissa Louie · Omid Ardakanian · Cecilia Sönströd · Kojin Oshiba · Chaofan Chen · Suchen Jin · aldo pareja · Toyo Suzumura