We study multi-agent reinforcement learning (MARL) in a stochastic network of agents. The objective is to find localized policies that maximize the (discounted) global reward. In general, scalability is a challenge in this setting because the size of the global state/action space can be exponential in the number of agents. Scalable algorithms are only known in cases where dependencies are static, fixed and local, e.g., between neighbors in a fixed, time-invariant underlying graph. In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies can be non-local and stochastic, and provide a finite-time error bound that shows how the convergence rate depends on the speed of information spread in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation, which apply beyond the setting of MARL in networked systems.
Author Information
Yiheng Lin (California Institute of Technology)
Guannan Qu (California Institute of Technology)
Longbo Huang (IIIS, Tsinghua University)
Adam Wierman (Caltech)
More from the Same Authors
-
2021 Spotlight: Perturbation-based Regret Analysis of Predictive Control in Linear Time Varying Systems »
Yiheng Lin · Yang Hu · Guanya Shi · Haoyuan Sun · Guannan Qu · Adam Wierman -
2021 : Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification »
Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu -
2022 Poster: Provable Generalization of Overparameterized Meta-learning Trained with SGD »
Yu Huang · Yingbin Liang · Longbo Huang -
2022 : Why (and When) does Local SGD Generalize Better than SGD? »
Xinran Gu · Kaifeng Lyu · Longbo Huang · Sanjeev Arora -
2022 : Online Min-max Optimization: Nonconvexity, Nonstationarity, and Dynamic Regret »
Yu Huang · Yuan Cheng · Yingbin Liang · Longbo Huang -
2022 : Robustifying machine-learned algorithms for efficient grid operation »
Nicolas Christianson · Christopher Yeh · Tongxin Li · Mahdi Torabi Rad · Azarang Golmohammadi · Adam Wierman -
2022 : Stability Constrained Reinforcement Learning for Real-Time Voltage Control »
Jie Feng · Yuanyuan Shi · Guannan Qu · Steven Low · Anima Anandkumar · Adam Wierman -
2022 : SustainGym: A Benchmark Suite of Reinforcement Learning for Sustainability Applications »
Christopher Yeh · Victor Li · Rajeev Datta · Yisong Yue · Adam Wierman -
2022 Spotlight: Lightning Talks 3B-2 »
Yu Huang · Tero Karras · Maxim Kodryan · Shiau Hong Lim · Shudong Huang · Ziyu Wang · Siqiao Xue · ILYAS MALIK · Ekaterina Lobacheva · Miika Aittala · Hongjie Wu · Yuhao Zhou · Yingbin Liang · Xiaoming Shi · Jun Zhu · Maksim Nakhodnov · Timo Aila · Yazhou Ren · James Zhang · Longbo Huang · Dmitry Vetrov · Ivor Tsang · Hongyuan Mei · Samuli Laine · Zenglin Xu · Wentao Feng · Jiancheng Lv -
2022 Spotlight: Provable Generalization of Overparameterized Meta-learning Trained with SGD »
Yu Huang · Yingbin Liang · Longbo Huang -
2022 Poster: On the Sample Complexity of Stabilizing LTI Systems on a Single Trajectory »
Yang Hu · Adam Wierman · Guannan Qu -
2022 Poster: Bounded-Regret MPC via Perturbation Analysis: Prediction Error, Constraints, and Nonlinearity »
Yiheng Lin · Yang Hu · Guannan Qu · Tongxin Li · Adam Wierman -
2021 Poster: The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition »
Tiancheng Jin · Longbo Huang · Haipeng Luo -
2021 Poster: Continuous Mean-Covariance Bandits »
Yihan Du · Siwei Wang · Zhixuan Fang · Longbo Huang -
2021 Poster: Fast Federated Learning in the Presence of Arbitrary Device Unavailability »
Xinran Gu · Kaixuan Huang · Jingzhao Zhang · Longbo Huang -
2021 Poster: What Makes Multi-Modal Learning Better than Single (Provably) »
Yu Huang · Chenzhuang Du · Zihui Xue · Xuanyao Chen · Hang Zhao · Longbo Huang -
2021 Poster: Regularized Softmax Deep Multi-Agent Q-Learning »
Ling Pan · Tabish Rashid · Bei Peng · Longbo Huang · Shimon Whiteson -
2021 Poster: Pareto-Optimal Learning-Augmented Algorithms for Online Conversion Problems »
Bo Sun · Russell Lee · Mohammad Hajiesmaili · Adam Wierman · Danny Tsang -
2021 Poster: Perturbation-based Regret Analysis of Predictive Control in Linear Time Varying Systems »
Yiheng Lin · Yang Hu · Guanya Shi · Haoyuan Sun · Guannan Qu · Adam Wierman -
2021 Oral: The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition »
Tiancheng Jin · Longbo Huang · Haipeng Luo -
2020 Poster: Online Optimization with Memory and Competitive Control »
Guanya Shi · Yiheng Lin · Soon-Jo Chung · Yisong Yue · Adam Wierman -
2020 Poster: Softmax Deep Double Deterministic Policy Gradients »
Ling Pan · Qingpeng Cai · Longbo Huang -
2020 Poster: Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward »
Guannan Qu · Yiheng Lin · Adam Wierman · Na Li -
2020 Poster: Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits »
Siwei Wang · Longbo Huang · John C. S. Lui -
2020 Poster: The Power of Predictions in Online Control »
Chenkai Yu · Guanya Shi · Soon-Jo Chung · Yisong Yue · Adam Wierman -
2019 Poster: Beyond Online Balanced Descent: An Optimal Algorithm for Smoothed Online Optimization »
Gautam Goel · Yiheng Lin · Haoyuan Sun · Adam Wierman -
2019 Spotlight: Beyond Online Balanced Descent: An Optimal Algorithm for Smoothed Online Optimization »
Gautam Goel · Yiheng Lin · Haoyuan Sun · Adam Wierman -
2019 Poster: Double Quantization for Communication-Efficient Distributed Optimization »
Yue Yu · Jiaxiang Wu · Longbo Huang -
2018 Poster: Multi-armed Bandits with Compensation »
Siwei Wang · Longbo Huang