Timezone: »
A great challenge in cooperative decentralized multi-agent reinforcement learning (MARL) is generating diversified behaviors for each individual agent when receiving only a team reward. Prior studies have paid much effort on reward shaping or designing a centralized critic that can discriminatively credit the agents. In this paper, we propose to merge the two directions and learn each agent an intrinsic reward function which diversely stimulates the agents at each time step. Specifically, the intrinsic reward for a specific agent will be involved in computing a distinct proxy critic for the agent to direct the updating of its individual policy. Meanwhile, the parameterized intrinsic reward function will be updated towards maximizing the expected accumulated team reward from the environment so that the objective is consistent with the original MARL problem. The proposed method is referred to as learning individual intrinsic reward (LIIR) in MARL. We compare LIIR with a number of state-of-the-art MARL methods on battle games in StarCraft II. The results demonstrate the effectiveness of LIIR, and we show LIIR can assign each individual agent an insightful intrinsic reward per time step.
Author Information
Yali Du (University College London)
I am currently a research fellow at UCL. I am interested in multi-agent reinforcement learning, adversarial machine learning and recommendation systems.
Lei Han (Tencent AI Lab)
Meng Fang (Tencent)
Ji Liu (Kwai Inc.)
Tianhong Dai (Imperial College London)
Dacheng Tao (University of Sydney)
More from the Same Authors
-
2021 : MHER: Model-based Hindsight Experience Replay »
Yang Rui · Meng Fang · Lei Han · Yali Du · Feng Luo · Xiu Li -
2022 Poster: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing »
Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han -
2022 : Constrained MDPs can be Solved by Eearly-Termination with Recurrent Models »
Hao Sun · Ziping Xu · Meng Fang · Zhenghao Peng · Taiyi Wang · Bolei Zhou -
2022 : Supervised Q-Learning can be a Strong Baseline for Continuous Control »
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou -
2022 : Supervised Q-Learning for Continuous Control »
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou -
2022 : MOPA: a Minimalist Off-Policy Approach to Safe-RL »
Hao Sun · Ziping Xu · Zhenghao Peng · Meng Fang · Bo Dai · Bolei Zhou -
2023 Poster: GRD: A Generative Approach for Interpretable Reward Redistribution in Reinforcement Learning »
Yudi Zhang · Yali Du · Biwei Huang · Ziyan Wang · Jun Wang · Meng Fang · Mykola Pechenizkiy -
2023 Poster: Dynamic Sparsity Is Channel-Level Sparsity Learner »
Lu Yin · Gen Li · Meng Fang · Li Shen · Tianjin Huang · Zhangyang Wang · Vlado Menkovski · Xiaolong Ma · Mykola Pechenizkiy · Shiwei Liu -
2023 Poster: MeGraph: Capturing Long-Range Interactions by Alternating Local and Hierarchical Aggregation on Multi-Scaled Graph Hierarchy »
Honghua Dong · Jiawei Xu · Yu Yang · Rui Zhao · Shiwen Wu · Chun Yuan · Xiu Li · Chris Maddison · Lei Han -
2023 Poster: COOM: A Game Benchmark for Continual Reinforcement Learning »
Tristan Tomilin · Meng Fang · Yudi Zhang · Mykola Pechenizkiy -
2022 Spotlight: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing »
Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han -
2022 Spotlight: Lightning Talks 5A-1 »
Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · Shubham Sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo -
2022 Poster: Improving Certified Robustness via Statistical Learning with Logical Reasoning »
Zhuolin Yang · Zhikuan Zhao · Boxin Wang · Jiawei Zhang · Linyi Li · Hengzhi Pei · Bojan Karlaš · Ji Liu · Heng Guo · Ce Zhang · Bo Li -
2022 Poster: Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping »
Hao Sun · Lei Han · Rui Yang · Xiaoteng Ma · Jian Guo · Bolei Zhou -
2021 Poster: ErrorCompensatedX: error compensation for variance reduced algorithms »
Hanlin Tang · Yao Li · Ji Liu · Ming Yan -
2021 Poster: TNASP: A Transformer-based NAS Predictor with a Self-evolution Framework »
Shun Lu · Jixiang Li · Jianchao Tan · Sen Yang · Ji Liu -
2021 Poster: Shifted Chunk Transformer for Spatio-Temporal Representational Learning »
Xuefan Zha · Wentao Zhu · Lv Xun · Sen Yang · Ji Liu -
2021 Poster: Dynamic Bottleneck for Robust Self-Supervised Exploration »
Chenjia Bai · Lingxiao Wang · Lei Han · Animesh Garg · Jianye Hao · Peng Liu · Zhaoran Wang -
2020 Poster: SCOP: Scientific Control for Reliable Neural Network Pruning »
Yehui Tang · Yunhe Wang · Yixing Xu · Dacheng Tao · Chunjing XU · Chao Xu · Chang Xu -
2020 Poster: Part-dependent Label Noise: Towards Instance-dependent Label Noise »
Xiaobo Xia · Tongliang Liu · Bo Han · Nannan Wang · Mingming Gong · Haifeng Liu · Gang Niu · Dacheng Tao · Masashi Sugiyama -
2020 Poster: Auto Learning Attention »
Benteng Ma · Jing Zhang · Yong Xia · Dacheng Tao -
2020 Spotlight: Part-dependent Label Noise: Towards Instance-dependent Label Noise »
Xiaobo Xia · Tongliang Liu · Bo Han · Nannan Wang · Mingming Gong · Haifeng Liu · Gang Niu · Dacheng Tao · Masashi Sugiyama -
2020 Poster: Searching for Low-Bit Weights in Quantized Neural Networks »
Zhaohui Yang · Yunhe Wang · Kai Han · Chunjing XU · Chao Xu · Dacheng Tao · Chang Xu -
2020 Poster: Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning »
Huan Fu · Shunming Li · Rongfei Jia · Mingming Gong · Binqiang Zhao · Dacheng Tao -
2020 Poster: Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness and Accuracy for Free »
Haotao Wang · Tianlong Chen · Shupeng Gui · TingKuei Hu · Ji Liu · Zhangyang Wang -
2020 Poster: Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games »
Yunqiu Xu · Meng Fang · Ling Chen · Yali Du · Joey Tianyi Zhou · Chengqi Zhang -
2020 Poster: Video Frame Interpolation without Temporal Priors »
Youjian Zhang · Chaoyue Wang · Dacheng Tao -
2020 Poster: Domain Generalization via Entropy Regularization »
Shanshan Zhao · Mingming Gong · Tongliang Liu · Huan Fu · Dacheng Tao -
2019 Poster: Theoretical Analysis of Adversarial Learning: A Minimax Approach »
Zhuozhuo Tu · Jingwei Zhang · Dacheng Tao -
2019 Spotlight: Theoretical Analysis of Adversarial Learning: A Minimax Approach »
Zhuozhuo Tu · Jingwei Zhang · Dacheng Tao -
2019 Poster: Efficient Smooth Non-Convex Stochastic Compositional Optimization via Stochastic Recursive Gradient Descent »
Wenqing Hu · Chris Junchi Li · Xiangru Lian · Ji Liu · Angela Yuan -
2019 Poster: Curriculum-guided Hindsight Experience Replay »
Meng Fang · Tianyi Zhou · Yali Du · Lei Han · Zhengyou Zhang -
2019 Poster: Category Anchor-Guided Unsupervised Domain Adaptation for Semantic Segmentation »
Qiming ZHANG · Jing Zhang · Wei Liu · Dacheng Tao -
2019 Poster: Global Sparse Momentum SGD for Pruning Very Deep Neural Networks »
Xiaohan Ding · guiguang ding · Xiangxin Zhou · Yuchen Guo · Jungong Han · Ji Liu -
2019 Poster: Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge »
Tingting Qiao · Jing Zhang · Duanqing Xu · Dacheng Tao -
2019 Poster: Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence »
Fengxiang He · Tongliang Liu · Dacheng Tao -
2019 Poster: Model Compression with Adversarial Robustness: A Unified Optimization Framework »
Shupeng Gui · Haotao Wang · Haichuan Yang · Chen Yu · Zhangyang Wang · Ji Liu -
2019 Poster: Positive-Unlabeled Compression on the Cloud »
Yixing Xu · Yunhe Wang · Hanting Chen · Kai Han · Chunjing XU · Dacheng Tao · Chang Xu -
2019 Poster: Learning from Bad Data via Generation »
Tianyu Guo · Chang Xu · Boxin Shi · Chao Xu · Dacheng Tao -
2019 Poster: Likelihood-Free Overcomplete ICA and Applications In Causal Discovery »
Chenwei DING · Mingming Gong · Kun Zhang · Dacheng Tao -
2019 Spotlight: Likelihood-Free Overcomplete ICA and Applications In Causal Discovery »
Chenwei DING · Mingming Gong · Kun Zhang · Dacheng Tao -
2018 Poster: Dual Swap Disentangling »
Zunlei Feng · Xinchao Wang · Chenglong Ke · An-Xiang Zeng · Dacheng Tao · Mingli Song -
2018 Poster: Communication Compression for Decentralized Training »
Hanlin Tang · Shaoduo Gan · Ce Zhang · Tong Zhang · Ji Liu -
2018 Poster: Stochastic Primal-Dual Method for Empirical Risk Minimization with O(1) Per-Iteration Complexity »
Conghui Tan · Tong Zhang · Shiqian Ma · Ji Liu -
2018 Poster: Gradient Sparsification for Communication-Efficient Distributed Optimization »
Jianqiao Wangni · Jialei Wang · Ji Liu · Tong Zhang -
2017 Poster: Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent »
Xiangru Lian · Ce Zhang · Huan Zhang · Cho-Jui Hsieh · Wei Zhang · Ji Liu -
2017 Oral: Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent »
Xiangru Lian · Ce Zhang · Huan Zhang · Cho-Jui Hsieh · Wei Zhang · Ji Liu -
2016 Poster: Asynchronous Parallel Greedy Coordinate Descent »
Yang You · Xiangru Lian · Ji Liu · Hsiang-Fu Yu · Inderjit Dhillon · James Demmel · Cho-Jui Hsieh -
2016 Poster: Accelerating Stochastic Composition Optimization »
Mengdi Wang · Ji Liu · Ethan Fang -
2016 Poster: A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order »
Xiangru Lian · Huan Zhang · Cho-Jui Hsieh · Yijun Huang · Ji Liu -
2015 Poster: Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization »
Xiangru Lian · Yijun Huang · Yuncheng Li · Ji Liu -
2015 Spotlight: Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization »
Xiangru Lian · Yijun Huang · Yuncheng Li · Ji Liu -
2014 Poster: Exclusive Feature Learning on Arbitrary Structures via $\ell_{1,2}$-norm »
Deguang Kong · Ryohei Fujimaki · Ji Liu · Feiping Nie · Chris Ding -
2013 Poster: An Approximate, Efficient LP Solver for LP Rounding »
Srikrishna Sridhar · Stephen Wright · Christopher Re · Ji Liu · Victor Bittorf · Ce Zhang -
2012 Poster: Regularized Off-Policy TD-Learning »
Bo Liu · Sridhar Mahadevan · Ji Liu -
2012 Spotlight: Regularized Off-Policy TD-Learning »
Bo Liu · Sridhar Mahadevan · Ji Liu -
2010 Poster: Multi-Stage Dantzig Selector »
Ji Liu · Peter Wonka · Jieping Ye