Poster
Context-lumpable stochastic bandits
Chung-Wei Lee · Qinghua Liu · Yasin Abbasi Yadkori · Chi Jin · Tor Lattimore · Csaba Szepesvari
We consider a contextual bandit problem with $S$ contexts and $K$ actions. In each round $t=1,2,\dots$, the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward whose mean is a function of the context and the action for the round. Under the assumption that the contexts can be lumped into $r\le \min(S,K)$ groups such that the mean reward for the various actions is the same for any two contexts that are in the same group, we give an algorithm that outputs an $\epsilon$-optimal policy after using at most $\widetilde O(r(S+K)/\epsilon^2)$ samples with high probability and provide a matching $\widetilde\Omega(r(S+K)/\epsilon^2)$ lower bound. In the regret minimization setting, we give an algorithm whose cumulative regret up to time $T$ is bounded by $\widetilde O(\sqrt{r^3(S+K)T})$. To the best of our knowledge, we are the first to show the near-optimal sample complexity in the PAC setting and $\widetilde O(\sqrt{\text{poly}(r)(S+K)T})$ minimax regret in the online setting for this problem. We also show that our algorithms can be applied to more general low-rank bandits and yield improved regret bounds in some scenarios.
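To make the lumpability assumption concrete, here is a minimal sketch in Python of a mean-reward table in which $S$ contexts fall into $r$ groups. It illustrates only the problem structure, not the authors' algorithm; the function names (`make_lumpable_means`, `greedy_policy`) and the uniform random means are assumptions made for this example.

```python
import random

def make_lumpable_means(S, K, r, seed=0):
    """Build an S x K table of mean rewards in which contexts fall into r groups.

    Hypothetical helper for illustration: contexts in the same group share
    one row of mean rewards, which is the lumpability assumption.
    """
    if not 1 <= r <= min(S, K):
        raise ValueError("need 1 <= r <= min(S, K)")
    rng = random.Random(seed)
    # One row of K mean rewards per group (uniform means are an assumption).
    group_means = [[rng.random() for _ in range(K)] for _ in range(r)]
    # Assign every context to a group, using each group at least once.
    groups = list(range(r)) + [rng.randrange(r) for _ in range(S - r)]
    rng.shuffle(groups)
    means = [group_means[g] for g in groups]
    return means, groups

def greedy_policy(means):
    """Return the best action for each context when the means are known."""
    return [max(range(len(row)), key=row.__getitem__) for row in means]

means, groups = make_lumpable_means(S=20, K=8, r=3)
policy = greedy_policy(means)
distinct_rows = len({tuple(row) for row in means})
```

The table has $S \cdot K$ entries but at most $r$ distinct rows, so an optimal policy is determined by the group of each context. This is the redundancy that makes a sample complexity scaling with $r(S+K)$ rather than $SK$ plausible.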
Author Information
Chung-Wei Lee (University of Southern California)
Qinghua Liu (Princeton University)
Yasin Abbasi Yadkori (DeepMind)
Chi Jin (Princeton University)
Tor Lattimore (DeepMind)
Csaba Szepesvari (University of Alberta)
More from the Same Authors
-
2021 Spotlight: Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms »
Chi Jin · Qinghua Liu · Sobhan Miryoosefi -
2022 : Clairvoyant Regret Minimization: Equivalence with Nemirovski’s Conceptual Prox Method and Extension to General Convex Games »
Gabriele Farina · Christian Kroer · Chung-Wei Lee · Haipeng Luo -
2023 : Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift »
Jiawei Ge · Shange Tang · Jianqing Fan · Cong Ma · Chi Jin -
2023 Poster: Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL »
Qinghua Liu · Gellert Weisz · András György · Chi Jin · Csaba Szepesvari -
2023 Poster: Is RLHF More Difficult than Standard RL? A Theoretical Perspective »
Yuanhao Wang · Qinghua Liu · Chi Jin -
2023 Poster: Regret Matching+: (In)Stability and Fast Convergence in Games »
Gabriele Farina · Julien Grand-Clément · Christian Kroer · Chung-Wei Lee · Haipeng Luo -
2023 Poster: Regret Minimization via Saddle Point Optimization »
Johannes Kirschner · Alireza Bakhtiari · Kushagra Chandak · Volodymyr Tkachuk · Csaba Szepesvari -
2023 Poster: Ordering-based Conditions for Global Convergence of Policy Gradient Methods »
Jincheng Mei · Bo Dai · Alekh Agarwal · Mohammad Ghavamzadeh · Csaba Szepesvari · Dale Schuurmans -
2023 Poster: Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore »
Gellert Weisz · András György · Csaba Szepesvari -
2023 Poster: Probabilistic Inference in Reinforcement Learning Done Right »
Jean Tarbouriech · Tor Lattimore · Brendan O'Donoghue -
2023 Poster: DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method »
Ahmed Khaled · Konstantin Mishchenko · Chi Jin -
2023 Oral: Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore »
Gellert Weisz · András György · Csaba Szepesvari -
2023 Oral: Ordering-based Conditions for Global Convergence of Policy Gradient Methods »
Jincheng Mei · Bo Dai · Alekh Agarwal · Mohammad Ghavamzadeh · Csaba Szepesvari · Dale Schuurmans -
2022 Poster: Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent »
Yu Bai · Chi Jin · Song Mei · Ziang Song · Tiancheng Yu -
2022 Poster: Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games »
Ioannis Anagnostides · Gabriele Farina · Christian Kroer · Chung-Wei Lee · Haipeng Luo · Tuomas Sandholm -
2022 Poster: The Role of Baselines in Policy Gradient Optimization »
Jincheng Mei · Wesley Chung · Valentin Thomas · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2022 Poster: Sample-Efficient Reinforcement Learning of Partially Observable Markov Games »
Qinghua Liu · Csaba Szepesvari · Chi Jin -
2022 Poster: Policy Optimization for Markov Games: Unified Framework and Faster Convergence »
Runyu Zhang · Qinghua Liu · Huan Wang · Caiming Xiong · Na Li · Yu Bai -
2022 Poster: Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs »
Gellért Weisz · András György · Tadashi Kozuno · Csaba Szepesvari -
2022 Poster: Near-Optimal Sample Complexity Bounds for Constrained MDPs »
Sharan Vaswani · Lin Yang · Csaba Szepesvari -
2022 Poster: Near-Optimal No-Regret Learning Dynamics for General Convex Games »
Gabriele Farina · Ioannis Anagnostides · Haipeng Luo · Chung-Wei Lee · Christian Kroer · Tuomas Sandholm -
2022 Poster: Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization »
Hui Yuan · Chengzhuo Ni · Huazheng Wang · Xuezhou Zhang · Le Cong · Csaba Szepesvari · Mengdi Wang -
2022 Poster: Regret Bounds for Information-Directed Reinforcement Learning »
Botao Hao · Tor Lattimore -
2021 Poster: Last-iterate Convergence in Extensive-Form Games »
Chung-Wei Lee · Christian Kroer · Haipeng Luo -
2021 Poster: Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms »
Chi Jin · Qinghua Liu · Sobhan Miryoosefi -
2021 Poster: Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses »
Haipeng Luo · Chen-Yu Wei · Chung-Wei Lee -
2020 Poster: Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization »
Jianyu Wang · Qinghua Liu · Hao Liang · Gauri Joshi · H. Vincent Poor -
2020 Poster: Model Selection in Contextual Stochastic Bandit Problems »
Aldo Pacchiano · My Phan · Yasin Abbasi Yadkori · Anup Rao · Julian Zimmert · Tor Lattimore · Csaba Szepesvari -
2020 Poster: Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs »
Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang -
2020 Oral: Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs »
Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang -
2020 Poster: On the Theory of Transfer Learning: The Importance of Task Diversity »
Nilesh Tripuraneni · Michael Jordan · Chi Jin -
2020 Poster: Near-Optimal Reinforcement Learning with Self-Play »
Yu Bai · Chi Jin · Tiancheng Yu -
2020 Poster: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs »
Chi Jin · Sham Kakade · Akshay Krishnamurthy · Qinghua Liu -
2020 Spotlight: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs »
Chi Jin · Sham Kakade · Akshay Krishnamurthy · Qinghua Liu -
2020 Poster: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces »
Zhuoran Yang · Chi Jin · Zhaoran Wang · Mengdi Wang · Michael Jordan -
2019 Poster: Thompson Sampling and Approximate Inference »
My Phan · Yasin Abbasi Yadkori · Justin Domke -
2019 Poster: Bootstrapping Upper Confidence Bound »
Botao Hao · Yasin Abbasi Yadkori · Zheng Wen · Guang Cheng -
2018 Poster: TopRank: A practical algorithm for online stochastic ranking »
Tor Lattimore · Branislav Kveton · Shuai Li · Csaba Szepesvari -
2018 Poster: Single-Agent Policy Tree Search With Guarantees »
Laurent Orseau · Levi Lelis · Tor Lattimore · Theophane Weber -
2018 Poster: Scalar Posterior Sampling with Applications »
Georgios Theocharous · Zheng Wen · Yasin Abbasi Yadkori · Nikos Vlassis -
2017 Poster: A Scale Free Algorithm for Stochastic Bandits with Bounded Kurtosis »
Tor Lattimore -
2017 Poster: Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem »
Yasin Abbasi Yadkori · Peter Bartlett · Victor Gabillon -
2017 Poster: Conservative Contextual Linear Bandits »
Abbas Kazerouni · Mohammad Ghavamzadeh · Yasin Abbasi · Benjamin Van Roy -
2017 Poster: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning »
Christoph Dann · Tor Lattimore · Emma Brunskill -
2017 Spotlight: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning »
Christoph Dann · Tor Lattimore · Emma Brunskill -
2016 Poster: Refined Lower Bounds for Adversarial Bandits »
Sébastien Gerchinovitz · Tor Lattimore -
2016 Poster: Causal Bandits: Learning Good Interventions via Causal Inference »
Finnian Lattimore · Tor Lattimore · Mark Reid -
2016 Poster: Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities »
Ruitong Huang · Tor Lattimore · András György · Csaba Szepesvari -
2016 Poster: On Explore-Then-Commit strategies »
Aurélien Garivier · Tor Lattimore · Emilie Kaufmann -
2015 Workshop: Machine Learning From and For Adaptive User Technologies: From Active Learning & Experimentation to Optimization & Personalization »
Joseph Jay Williams · Yasin Abbasi Yadkori · Finale Doshi-Velez -
2015 Poster: The Pareto Regret Frontier for Bandits »
Tor Lattimore -
2015 Poster: Linear Multi-Resource Allocation with Semi-Bandit Feedback »
Tor Lattimore · Yacov Crammer · Csaba Szepesvari -
2015 Poster: Minimax Time Series Prediction »
Wouter Koolen · Alan Malek · Peter Bartlett · Yasin Abbasi Yadkori -
2014 Workshop: Large-scale reinforcement learning and Markov decision problems »
Benjamin Van Roy · Mohammad Ghavamzadeh · Peter Bartlett · Yasin Abbasi Yadkori · Ambuj Tewari -
2014 Poster: Bounded Regret for Finite-Armed Structured Bandits »
Tor Lattimore · Remi Munos -
2013 Workshop: Resource-Efficient Machine Learning »
Yevgeny Seldin · Yasin Abbasi Yadkori · Yacov Crammer · Ralf Herbrich · Peter Bartlett -
2013 Poster: Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions »
Yasin Abbasi Yadkori · Peter Bartlett · Varun Kanade · Yevgeny Seldin · Csaba Szepesvari -
2011 Poster: Improved Algorithms for Linear Stochastic Bandits »
Yasin Abbasi Yadkori · David Pal · Csaba Szepesvari -
2011 Spotlight: Improved Algorithms for Linear Stochastic Bandits »
Yasin Abbasi Yadkori · David Pal · Csaba Szepesvari