This paper studies the Anytime-Competitive Markov Decision Process (A-CMDP) problem. Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the expected reward while constraining the expected cost over random dynamics, but the cost in a specific episode can still be unsatisfactorily high. In contrast, the goal of A-CMDP is to optimize the expected reward while guaranteeing a bounded cost, relative to a policy prior, in each round of any episode. We propose a new algorithm, called Anytime-Competitive Reinforcement Learning (ACRL), which provably guarantees the anytime cost constraints. The regret analysis shows that the policy asymptotically matches the optimal reward achievable under the anytime competitive constraints. Experiments on the application of carbon-intelligent computing verify the reward performance and cost constraint guarantee of ACRL.
Author Information
Jianyi Yang (University of California, Riverside)
Pengfei Li
Tongxin Li (The Chinese University of Hong Kong (Shenzhen))
Adam Wierman (Caltech)
Shaolei Ren (University of California, Riverside)
More from the Same Authors
- 2021 Spotlight: Perturbation-based Regret Analysis of Predictive Control in Linear Time Varying Systems
  Yiheng Lin · Yang Hu · Guanya Shi · Haoyuan Sun · Guannan Qu · Adam Wierman
- 2022: Robustifying machine-learned algorithms for efficient grid operation
  Nicolas Christianson · Christopher Yeh · Tongxin Li · Mahdi Torabi Rad · Azarang Golmohammadi · Adam Wierman
- 2022: Stability Constrained Reinforcement Learning for Real-Time Voltage Control
  Jie Feng · Yuanyuan Shi · Guannan Qu · Steven Low · Anima Anandkumar · Adam Wierman
- 2022: SustainGym: A Benchmark Suite of Reinforcement Learning for Sustainability Applications
  Christopher Yeh · Victor Li · Rajeev Datta · Yisong Yue · Adam Wierman
- 2023 Poster: A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
  Zaiwei Chen · Kaiqing Zhang · Eric Mazumdar · Asuman Ozdaglar · Adam Wierman
- 2023 Poster: Online Adaptive Policy Selection in Time-Varying Systems: No-Regret via Contractive Perturbations
  Yiheng Lin · James A. Preiss · Emile Anand · Yingying Li · Yisong Yue · Adam Wierman
- 2023 Poster: Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions
  Tongxin Li · Yiheng Lin · Shaolei Ren · Adam Wierman
- 2023 Poster: Robust Learning for Smoothed Online Convex Optimization with Feedback Delay
  Pengfei Li · Jianyi Yang · Adam Wierman · Shaolei Ren
- 2023 Poster: Adversarial Attacks on Online Learning to Rank with Click Feedback
  Jinhang Zuo · Zhiyao Zhang · Zhiyong Wang · Shuai Li · Mohammad Hajiesmaili · Adam Wierman
- 2023 Poster: SustainGym: Reinforcement Learning Environments for Sustainable Energy Systems
  Christopher Yeh · Victor Li · Rajeev Datta · Julio Arroyo · Nicolas Christianson · Chi Zhang · Yize Chen · Mohammad Mehdi Hosseini · Azarang Golmohammadi · Yuanyuan Shi · Yisong Yue · Adam Wierman
- 2022 Poster: On the Sample Complexity of Stabilizing LTI Systems on a Single Trajectory
  Yang Hu · Adam Wierman · Guannan Qu
- 2022 Poster: Bounded-Regret MPC via Perturbation Analysis: Prediction Error, Constraints, and Nonlinearity
  Yiheng Lin · Yang Hu · Guannan Qu · Tongxin Li · Adam Wierman
- 2021 Poster: Multi-Agent Reinforcement Learning in Stochastic Networked Systems
  Yiheng Lin · Guannan Qu · Longbo Huang · Adam Wierman
- 2021 Poster: Pareto-Optimal Learning-Augmented Algorithms for Online Conversion Problems
  Bo Sun · Russell Lee · Mohammad Hajiesmaili · Adam Wierman · Danny Tsang
- 2021 Poster: Perturbation-based Regret Analysis of Predictive Control in Linear Time Varying Systems
  Yiheng Lin · Yang Hu · Guanya Shi · Haoyuan Sun · Guannan Qu · Adam Wierman
- 2020 Poster: Online Optimization with Memory and Competitive Control
  Guanya Shi · Yiheng Lin · Soon-Jo Chung · Yisong Yue · Adam Wierman
- 2020 Poster: Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward
  Guannan Qu · Yiheng Lin · Adam Wierman · Na Li
- 2020 Poster: The Power of Predictions in Online Control
  Chenkai Yu · Guanya Shi · Soon-Jo Chung · Yisong Yue · Adam Wierman
- 2019 Poster: Beyond Online Balanced Descent: An Optimal Algorithm for Smoothed Online Optimization
  Gautam Goel · Yiheng Lin · Haoyuan Sun · Adam Wierman
- 2019 Spotlight: Beyond Online Balanced Descent: An Optimal Algorithm for Smoothed Online Optimization
  Gautam Goel · Yiheng Lin · Haoyuan Sun · Adam Wierman