Timezone: »
Poster
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg
The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice feedback is often observed in delay. This paper studies online learning in episodic Markov decision process (MDP) with unknown transitions, adversarially changing costs, and unrestricted delayed bandit feedback. More precisely, the feedback for the agent in episode $k$ is revealed only in the end of episode $k + d^k$, where the delay $d^k$ can be changing over episodes and chosen by an oblivious adversary. We present the first algorithms that achieve near-optimal $\sqrt{K + D}$ regret, where $K$ is the number of episodes and $D = \sum_{k=1}^K d^k$ is the total delay, significantly improving upon the best known regret bound of $(K + D)^{2/3}$.
Author Information
Tiancheng Jin (University of Southern California)
Tal Lancewicki (Tel Aviv University)
Haipeng Luo (University of Southern California)
Yishay Mansour (Tel Aviv University & Google)
Aviv Rosenberg (Amazon)
I am an Applied Scientist at Amazon Alexa Shopping, Tel Aviv. Previously, I obtained my PhD from the department of computer science at Tel Aviv University, where I was fortunate to have Prof. Yishay Mansour as my advisor. Prior to that, I received my Bachelor's degree in Mathematics and Computer Science from Tel Aviv University. My primary research interest lies in theoretical and applied machine learning. More specifically, my PhD focused on data-driven sequential decision making such as reinforcement learning, online learning and multi-armed bandit.
More from the Same Authors
-
2021 Spotlight: Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations »
Ayush Sekhari · Christoph Dann · Mehryar Mohri · Yishay Mansour · Karthik Sridharan -
2022 : Clairvoyant Regret Minimization: Equivalence with Nemirovski’s Conceptual Prox Method and Extension to General Convex Games »
Gabriele Farina · Christian Kroer · Chung-Wei Lee · Haipeng Luo -
2022 : A Theory of Learning with Competing Objectives and User Feedback »
Pranjal Awasthi · Corinna Cortes · Yishay Mansour · Mehryar Mohri -
2022 : A Theory of Learning with Competing Objectives and User Feedback »
Pranjal Awasthi · Corinna Cortes · Yishay Mansour · Mehryar Mohri -
2022 : Finding Safe Zones of Markov Decision Processes Policies »
Michal Moshkovitz · Lee Cohen · Yishay Mansour -
2023 Poster: Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms »
Tiancheng Jin · Junyan Liu · Haipeng Luo -
2023 Poster: Regret Matching$^+$: (In)Stability and Fast Convergence in Games »
Gabriele Farina · Julien Grand-Clément · Christian Kroer · Chung-Wei Lee · Haipeng Luo -
2023 Poster: Practical Contextual Bandits with Feedback Graphs »
Mengxiao Zhang · Yuheng Zhang · Olga Vrousgou · Haipeng Luo · Paul Mineiro -
2023 Poster: Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games »
Yang Cai · Haipeng Luo · Chen-Yu Wei · Weiqiang Zheng -
2023 Poster: No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions »
William Chang · Tiancheng Jin · Junyan Liu · Haipeng Luo · Chloé Rouyer · Chen-Yu Wei -
2022 Spotlight: Lightning Talks 4A-2 »
Barakeel Fanseu Kamhoua · Hualin Zhang · Taiki Miyagawa · Tomoya Murata · Xin Lyu · Yan Dai · Elena Grigorescu · Zhipeng Tu · Lijun Zhang · Taiji Suzuki · Wei Jiang · Haipeng Luo · Lin Zhang · Xi Wang · Young-San Lin · Huan Xiong · Liyu Chen · Bin Gu · Jinfeng Yi · Yongqiang Chen · Sandeep Silwal · Yiguang Hong · Maoyuan Song · Lei Wang · Tianbao Yang · Han Yang · MA Kaili · Samson Zhou · Deming Yuan · Bo Han · Guodong Shi · Bo Li · James Cheng -
2022 Spotlight: Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback »
Yan Dai · Haipeng Luo · Liyu Chen -
2022 : A Theory of Learning with Competing Objectives and User Feedback »
Pranjal Awasthi · Corinna Cortes · Yishay Mansour · Mehryar Mohri -
2022 Poster: Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments »
Liyu Chen · Haipeng Luo -
2022 Poster: Benign Underfitting of Stochastic Gradient Descent »
Tomer Koren · Roi Livni · Yishay Mansour · Uri Sherman -
2022 Poster: A Characterization of Semi-Supervised Adversarially Robust PAC Learnability »
Idan Attias · Steve Hanneke · Yishay Mansour -
2022 Poster: Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games »
Ioannis Anagnostides · Gabriele Farina · Christian Kroer · Chung-Wei Lee · Haipeng Luo · Tuomas Sandholm -
2022 Poster: Fair Wrapping for Black-box Predictions »
Alexander Soen · Ibrahim Alabdulmohsin · Sanmi Koyejo · Yishay Mansour · Nyalleng Moorosi · Richard Nock · Ke Sun · Lexing Xie -
2022 Poster: Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback »
Yan Dai · Haipeng Luo · Liyu Chen -
2022 Poster: Near-Optimal No-Regret Learning Dynamics for General Convex Games »
Gabriele Farina · Ioannis Anagnostides · Haipeng Luo · Chung-Wei Lee · Christian Kroer · Tuomas Sandholm -
2021 Poster: Minimax Regret for Stochastic Shortest Path »
Alon Cohen · Yonathan Efroni · Yishay Mansour · Aviv Rosenberg -
2021 Oral: Optimal Rates for Random Order Online Optimization »
Uri Sherman · Tomer Koren · Yishay Mansour -
2021 Poster: The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition »
Tiancheng Jin · Longbo Huang · Haipeng Luo -
2021 Poster: Last-iterate Convergence in Extensive-Form Games »
Chung-Wei Lee · Christian Kroer · Haipeng Luo -
2021 Poster: Optimal Rates for Random Order Online Optimization »
Uri Sherman · Tomer Koren · Yishay Mansour -
2021 Poster: Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure »
Aviv Rosenberg · Yishay Mansour -
2021 Poster: Differentially Private Multi-Armed Bandits in the Shuffle Model »
Jay Tenenbaum · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2021 Poster: ROI Maximization in Stochastic Online Decision-Making »
Nicolò Cesa-Bianchi · Tom Cesari · Yishay Mansour · Vianney Perchet -
2021 Poster: Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path »
Liyu Chen · Mehdi Jafarnia-Jahromi · Rahul Jain · Haipeng Luo -
2021 Poster: Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations »
Ayush Sekhari · Christoph Dann · Mehryar Mohri · Yishay Mansour · Karthik Sridharan -
2021 Poster: Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses »
Haipeng Luo · Chen-Yu Wei · Chung-Wei Lee -
2021 Poster: Dueling Bandits with Team Comparisons »
Lee Cohen · Ulrike Schmidt-Kraepelin · Yishay Mansour -
2021 Oral: The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition »
Tiancheng Jin · Longbo Huang · Haipeng Luo -
2020 Poster: Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs »
Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang -
2020 Poster: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition »
Tiancheng Jin · Haipeng Luo -
2020 Spotlight: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition »
Tiancheng Jin · Haipeng Luo -
2020 Oral: Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs »
Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang -
2020 Poster: Comparator-Adaptive Convex Bandits »
Dirk van der Hoeven · Ashok Cutkosky · Haipeng Luo -
2019 Poster: Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function »
Aviv Rosenberg · Yishay Mansour -
2019 Poster: Equipping Experts/Bandits with Long-term Memory »
Kai Zheng · Haipeng Luo · Ilias Diakonikolas · Liwei Wang -
2019 Poster: Model Selection for Contextual Bandits »
Dylan Foster · Akshay Krishnamurthy · Haipeng Luo -
2019 Spotlight: Model Selection for Contextual Bandits »
Dylan Foster · Akshay Krishnamurthy · Haipeng Luo -
2019 Poster: Hypothesis Set Stability and Generalization »
Dylan Foster · Spencer Greenberg · Satyen Kale · Haipeng Luo · Mehryar Mohri · Karthik Sridharan -
2017 Workshop: Learning in the Presence of Strategic Behavior »
Nika Haghtalab · Yishay Mansour · Tim Roughgarden · Vasilis Syrgkanis · Jennifer Wortman Vaughan -
2017 Poster: Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues »
Noga Alon · Moshe Babaioff · Yannai A. Gonczarowski · Yishay Mansour · Shay Moran · Amir Yehudayoff -
2017 Spotlight: Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues »
Noga Alon · Moshe Babaioff · Yannai A. Gonczarowski · Yishay Mansour · Shay Moran · Amir Yehudayoff -
2017 Poster: Multi-Armed Bandits with Metric Movement Costs »
Tomer Koren · Roi Livni · Yishay Mansour