Timezone: »
Poster
Model Selection in Contextual Stochastic Bandit Problems
Aldo Pacchiano · My Phan · Yasin Abbasi Yadkori · Anup Rao · Julian Zimmert · Tor Lattimore · Csaba Szepesvari
We study bandit model selection in stochastic environments. Our approach relies on a master algorithm that selects between candidate base algorithms. We develop a master-base algorithm abstraction that can work with general classes of base algorithms and different type of adversarial master algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee. We show through a lower bound that even when one of the base algorithms has $O(\log T)$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically. Using our techniques, we address model selection in a variety of problems such as misspecified linear contextual bandits \citep{lattimore2019learning}, linear bandit with unknown dimension \citep{Foster-Krishnamurthy-Luo-2019} and reinforcement learning with unknown feature maps. Our algorithm requires the knowledge of the optimal base regret to adjust the master learning rate. We show that without such prior knowledge any master can suffer a regret larger than the optimal base regret.
Author Information
Aldo Pacchiano (UC Berkeley)
My Phan (University of Massachusetts Amherst)
Yasin Abbasi Yadkori (DeepMind)
Anup Rao (School of Computer Science, Georgia Tech)
Julian Zimmert (Google)
Tor Lattimore (DeepMind)
Csaba Szepesvari (DeepMind / University of Alberta)
More from the Same Authors
-
2021 Spotlight: Variational Bayesian Optimistic Sampling »
Brendan O'Donoghue · Tor Lattimore -
2021 Spotlight: Information Directed Sampling for Sparse Linear Bandits »
Botao Hao · Tor Lattimore · Wei Deng -
2021 Spotlight: On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method »
Junyu Zhang · Chengzhuo Ni · zheng Yu · Csaba Szepesvari · Mengdi Wang -
2021 Spotlight: Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning »
Christoph Dann · Teodor Vanislavov Marinov · Mehryar Mohri · Julian Zimmert -
2022 Poster: Learning General World Models in a Handful of Reward-Free Deployments »
Yingchen Xu · Jack Parker-Holder · Aldo Pacchiano · Philip Ball · Oleh Rybkin · S Roberts · Tim Rocktäschel · Edward Grefenstette -
2022 Poster: The Role of Baselines in Policy Gradient Optimization »
Jincheng Mei · Wesley Chung · Valentin Thomas · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2022 Poster: Sample-Efficient Reinforcement Learning of Partially Observable Markov Games »
Qinghua Liu · Csaba Szepesvari · Chi Jin -
2022 Poster: Best of Both Worlds Model Selection »
Aldo Pacchiano · Christoph Dann · Claudio Gentile -
2022 Poster: Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity »
Abhishek Gupta · Aldo Pacchiano · Yuexiang Zhai · Sham Kakade · Sergey Levine -
2022 Poster: Stochastic Online Learning with Feedback Graphs: Finite-Time and Asymptotic Optimality »
Teodor Vanislavov Marinov · Mehryar Mohri · Julian Zimmert -
2022 Poster: A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback »
Saeed Masoudian · Julian Zimmert · Yevgeny Seldin -
2022 Poster: Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs »
Gellért Weisz · András György · Tadashi Kozuno · Csaba Szepesvari -
2022 Poster: Near-Optimal Sample Complexity Bounds for Constrained MDPs »
Sharan Vaswani · Lin Yang · Csaba Szepesvari -
2022 Poster: Sample Constrained Treatment Effect Estimation »
Raghavendra Addanki · David Arbour · Tung Mai · Cameron Musco · Anup Rao -
2022 Poster: Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization »
Hui Yuan · Chengzhuo Ni · Huazheng Wang · Xuezhou Zhang · Le Cong · Csaba Szepesvari · Mengdi Wang -
2021 Poster: A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning »
Christoph Dann · Mehryar Mohri · Tong Zhang · Julian Zimmert -
2021 Poster: Near Optimal Policy Optimization via REPS »
Aldo Pacchiano · Jonathan N Lee · Peter Bartlett · Ofir Nachum -
2021 Poster: No Regrets for Learning the Prior in Bandits »
Soumya Basu · Branislav Kveton · Manzil Zaheer · Csaba Szepesvari -
2021 Poster: Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning »
Christoph Dann · Teodor Vanislavov Marinov · Mehryar Mohri · Julian Zimmert -
2021 Poster: On the Theory of Reinforcement Learning with Once-per-Episode Feedback »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2021 Poster: Variational Bayesian Optimistic Sampling »
Brendan O'Donoghue · Tor Lattimore -
2021 Poster: Tactical Optimism and Pessimism for Deep Reinforcement Learning »
Ted Moskovitz · Jack Parker-Holder · Aldo Pacchiano · Michael Arbel · Michael Jordan -
2021 Poster: Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection »
Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta -
2021 Poster: On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method »
Junyu Zhang · Chengzhuo Ni · zheng Yu · Csaba Szepesvari · Mengdi Wang -
2021 Poster: Understanding the Effect of Stochasticity in Policy Optimization »
Jincheng Mei · Bo Dai · Chenjun Xiao · Csaba Szepesvari · Dale Schuurmans -
2021 Poster: Neural Pseudo-Label Optimism for the Bank Loan Problem »
Aldo Pacchiano · Shaun Singh · Edward Chou · Alex Berg · Jakob Foerster -
2021 Poster: On the Role of Optimization in Double Descent: A Least Squares Study »
Ilja Kuzborskij · Csaba Szepesvari · Omar Rivasplata · Amal Rannen-Triki · Razvan Pascanu -
2021 Poster: Information Directed Sampling for Sparse Linear Bandits »
Botao Hao · Tor Lattimore · Wei Deng -
2021 Poster: Bandit Phase Retrieval »
Tor Lattimore · Botao Hao -
2021 Poster: The Pareto Frontier of model selection for general Contextual Bandits »
Teodor Vanislavov Marinov · Julian Zimmert -
2020 Poster: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian »
Jack Parker-Holder · Luke Metz · Cinjon Resnick · Hengyuan Hu · Adam Lerer · Alistair Letcher · Alexander Peysakhovich · Aldo Pacchiano · Jakob Foerster -
2020 Poster: Effective Diversity in Population Based Reinforcement Learning »
Jack Parker-Holder · Aldo Pacchiano · Krzysztof M Choromanski · Stephen J Roberts -
2020 Poster: High-Dimensional Sparse Linear Bandits »
Botao Hao · Tor Lattimore · Mengdi Wang -
2020 Poster: Adapting to Misspecification in Contextual Bandits »
Dylan Foster · Claudio Gentile · Mehryar Mohri · Julian Zimmert -
2020 Spotlight: Effective Diversity in Population Based Reinforcement Learning »
Jack Parker-Holder · Aldo Pacchiano · Krzysztof M Choromanski · Stephen J Roberts -
2020 Poster: ImpatientCapsAndRuns: Approximately Optimal Algorithm Configuration from an Infinite Pool »
Gellert Weisz · András György · Wei-I Lin · Devon Graham · Kevin Leyton-Brown · Csaba Szepesvari · Brendan Lucier -
2020 Poster: Differentiable Meta-Learning of Bandit Policies »
Craig Boutilier · Chih-wei Hsu · Branislav Kveton · Martin Mladenov · Csaba Szepesvari · Manzil Zaheer -
2020 Poster: PAC-Bayes Analysis Beyond the Usual Bounds »
Omar Rivasplata · Ilja Kuzborskij · Csaba Szepesvari · John Shawe-Taylor -
2020 Poster: Gaussian Gated Linear Networks »
David Budden · Adam Marblestone · Eren Sezener · Tor Lattimore · Gregory Wayne · Joel Veness -
2020 Poster: Variational Policy Gradient Method for Reinforcement Learning with General Utilities »
Junyu Zhang · Alec Koppel · Amrit Singh Bedi · Csaba Szepesvari · Mengdi Wang -
2020 Poster: Escaping the Gravitational Pull of Softmax »
Jincheng Mei · Chenjun Xiao · Bo Dai · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: Online Algorithm for Unsupervised Sequential Selection with Contextual Information »
Arun Verma · Manjesh Kumar Hanawal · Csaba Szepesvari · Venkatesh Saligrama -
2020 Poster: Efficient Planning in Large MDPs with Weak Linear Function Approximation »
Roshan Shariff · Csaba Szepesvari -
2020 Spotlight: Variational Policy Gradient Method for Reinforcement Learning with General Utilities »
Junyu Zhang · Alec Koppel · Amrit Singh Bedi · Csaba Szepesvari · Mengdi Wang -
2020 Oral: Escaping the Gravitational Pull of Softmax »
Jincheng Mei · Chenjun Xiao · Bo Dai · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: CoinDICE: Off-Policy Confidence Interval Estimation »
Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Spotlight: CoinDICE: Off-Policy Confidence Interval Estimation »
Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2019 Poster: Thompson Sampling and Approximate Inference »
My Phan · Yasin Abbasi Yadkori · Justin Domke -
2019 Poster: Bootstrapping Upper Confidence Bound »
Botao Hao · Yasin Abbasi Yadkori · Zheng Wen · Guang Cheng -
2019 Poster: Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging »
Pooria Joulani · András György · Csaba Szepesvari -
2019 Poster: Detecting Overfitting via Adversarial Examples »
Roman Werpachowski · András György · Csaba Szepesvari -
2019 Poster: From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization »
Krzysztof M Choromanski · Aldo Pacchiano · Jack Parker-Holder · Yunhao Tang · Vikas Sindhwani -
2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning »
Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle -
2019 Poster: Connections Between Mirror Descent, Thompson Sampling and the Information Ratio »
Julian Zimmert · Tor Lattimore -
2018 : Datasets and Benchmarks for Causal Learning »
Csaba Szepesvari · Isabelle Guyon · Nicolai Meinshausen · David Blei · Elias Bareinboim · Bernhard Schölkopf · Pietro Perona -
2018 : Model-free vs. Model-based Learning in a Causal World: Some Stories from Online Learning to Rank »
Csaba Szepesvari -
2018 Poster: TopRank: A practical algorithm for online stochastic ranking »
Tor Lattimore · Branislav Kveton · Shuai Li · Csaba Szepesvari -
2018 Poster: Gen-Oja: Simple & Efficient Algorithm for Streaming Generalized Eigenvector Computation »
Kush Bhatia · Aldo Pacchiano · Nicolas Flammarion · Peter Bartlett · Michael Jordan -
2018 Poster: Factored Bandits »
Julian Zimmert · Yevgeny Seldin -
2018 Poster: Geometrically Coupled Monte Carlo Sampling »
Mark Rowland · Krzysztof Choromanski · François Chalus · Aldo Pacchiano · Tamas Sarlos · Richard Turner · Adrian Weller -
2018 Poster: Scalar Posterior Sampling with Applications »
Georgios Theocharous · Zheng Wen · Yasin Abbasi Yadkori · Nikos Vlassis -
2018 Spotlight: Geometrically Coupled Monte Carlo Sampling »
Mark Rowland · Krzysztof Choromanski · François Chalus · Aldo Pacchiano · Tamas Sarlos · Richard Turner · Adrian Weller -
2018 Poster: PAC-Bayes bounds for stable algorithms with instance-dependent priors »
Omar Rivasplata · Emilio Parrado-Hernandez · John Shawe-Taylor · Shiliang Sun · Csaba Szepesvari -
2017 Poster: Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem »
Yasin Abbasi Yadkori · Peter Bartlett · Victor Gabillon -
2017 Poster: Conservative Contextual Linear Bandits »
Abbas Kazerouni · Mohammad Ghavamzadeh · Yasin Abbasi · Benjamin Van Roy -
2017 Poster: Multi-view Matrix Factorization for Linear Dynamical System Estimation »
Mahdi Karami · Martha White · Dale Schuurmans · Csaba Szepesvari -
2016 Poster: Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities »
Ruitong Huang · Tor Lattimore · András György · Csaba Szepesvari -
2016 Poster: SDP Relaxation with Randomized Rounding for Energy Disaggregation »
Kiarash Shaloudegi · András György · Csaba Szepesvari · Wilsun Xu -
2016 Oral: SDP Relaxation with Randomized Rounding for Energy Disaggregation »
Kiarash Shaloudegi · András György · Csaba Szepesvari · Wilsun Xu -
2015 Workshop: Machine Learning From and For Adaptive User Technologies: From Active Learning & Experimentation to Optimization & Personalization »
Joseph Jay Williams · Yasin Abbasi Yadkori · Finale Doshi-Velez -
2015 : Confidence intervals for the mixing time of a reversible Markov chain from a single sample path »
Csaba Szepesvari -
2015 Poster: Online Learning with Gaussian Payoffs and Side Observations »
Yifan Wu · András György · Csaba Szepesvari -
2015 Poster: Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path »
Daniel Hsu · Aryeh Kontorovich · Csaba Szepesvari -
2015 Poster: Linear Multi-Resource Allocation with Semi-Bandit Feedback »
Tor Lattimore · Yacov Crammer · Csaba Szepesvari -
2015 Poster: Fast, Provable Algorithms for Isotonic Regression in all L_p-norms »
Rasmus Kyng · Anup Rao · Sushant Sachdeva -
2015 Poster: Combinatorial Cascading Bandits »
Branislav Kveton · Zheng Wen · Azin Ashkan · Csaba Szepesvari -
2015 Poster: Minimax Time Series Prediction »
Wouter Koolen · Alan Malek · Peter Bartlett · Yasin Abbasi Yadkori -
2014 Workshop: Novel Trends and Applications in Reinforcement Learning »
Csaba Szepesvari · Marc Deisenroth · Sergey Levine · Pedro Ortega · Brian Ziebart · Emma Brunskill · Naftali Tishby · Gerhard Neumann · Daniel Lee · Sridhar Mahadevan · Pieter Abbeel · David Silver · Vicenç Gómez -
2014 Workshop: Large-scale reinforcement learning and Markov decision problems »
Benjamin Van Roy · Mohammad Ghavamzadeh · Peter Bartlett · Yasin Abbasi Yadkori · Ambuj Tewari -
2014 Poster: Universal Option Models »
hengshuai yao · Csaba Szepesvari · Richard Sutton · Joseph Modayil · Shalabh Bhatnagar -
2013 Workshop: Resource-Efficient Machine Learning »
Yevgeny Seldin · Yasin Abbasi Yadkori · Yacov Crammer · Ralf Herbrich · Peter Bartlett -
2013 Poster: Online Learning with Costly Features and Labels »
Navid Zolghadr · Gábor Bartók · Russell Greiner · András György · Csaba Szepesvari -
2013 Poster: Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions »
Yasin Abbasi Yadkori · Peter Bartlett · Varun Kanade · Yevgeny Seldin · Csaba Szepesvari -
2012 Session: Oral Session 6 »
Csaba Szepesvari -
2012 Poster: Deep Representations and Codes for Image Auto-Annotation »
Jamie Kiros · Csaba Szepesvari -
2011 Poster: Improved Algorithms for Linear Stochastic Bandits »
Yasin Abbasi Yadkori · David Pal · Csaba Szepesvari -
2011 Spotlight: Improved Algorithms for Linear Stochastic Bandits »
Yasin Abbasi Yadkori · David Pal · Csaba Szepesvari -
2010 Spotlight: Online Markov Decision Processes under Bandit Feedback »
Gergely Neu · András György · András Antos · Csaba Szepesvari -
2010 Poster: Online Markov Decision Processes under Bandit Feedback »
Gergely Neu · András György · Csaba Szepesvari · András Antos -
2010 Poster: Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs »
David Pal · Barnabas Poczos · Csaba Szepesvari -
2010 Poster: Parametric Bandits: The Generalized Linear Case »
Sarah Filippi · Olivier Cappé · Aurélien Garivier · Csaba Szepesvari -
2010 Poster: Error Propagation for Approximate Policy and Value Iteration »
Amir-massoud Farahmand · Remi Munos · Csaba Szepesvari -
2009 Poster: Multi-Step Dyna Planning for Policy Evaluation and Control »
Hengshuai Yao · Richard Sutton · Shalabh Bhatnagar · Dongcui Diao · Csaba Szepesvari -
2009 Poster: A General Projection Property for Distribution Families »
Yao-Liang Yu · Yuxi Li · Dale Schuurmans · Csaba Szepesvari -
2009 Poster: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation »
Hamid R Maei · Csaba Szepesvari · Shalabh Batnaghar · Doina Precup · David Silver · Richard Sutton -
2009 Spotlight: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation »
Hamid R Maei · Csaba Szepesvari · Shalabh Batnaghar · Doina Precup · David Silver · Richard Sutton -
2008 Poster: Online Optimization in X-Armed Bandits »
Sebastien Bubeck · Remi Munos · Gilles Stoltz · Csaba Szepesvari -
2008 Poster: Regularized Policy Iteration »
Amir-massoud Farahmand · Mohammad Ghavamzadeh · Csaba Szepesvari · Shie Mannor -
2008 Poster: A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approxi »
Richard Sutton · Csaba Szepesvari · Hamid R Maei -
2007 Poster: Fitted Q-iteration in continuous action-space MDPs »
Remi Munos · András Antos · Csaba Szepesvari