Timezone: »
In the contextual linear bandit setting, algorithms built on the optimism principle fail to exploit the structure of the problem and have been shown to be asymptotically suboptimal. In this paper, we follow recent approaches of deriving asymptotically optimal algorithms from problem-dependent regret lower bounds and we introduce a novel algorithm improving over the state-of-the-art along multiple dimensions. We build on a reformulation of the lower bound, where context distribution and exploration policy are decoupled, and we obtain an algorithm robust to unbalanced context distributions. Then, using an incremental primal-dual approach to solve the Lagrangian relaxation of the lower bound, we obtain a scalable and computationally efficient algorithm. Finally, we remove forced exploration and build on confidence intervals of the optimization problem to encourage a minimum level of exploration that is better adapted to the problem structure. We demonstrate the asymptotic optimality of our algorithm, while providing both problem-dependent and worst-case finite-time regret guarantees. Our bounds scale with the logarithm of the number of arms, thus avoiding the linear dependence common in all related prior works. Notably, we establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits. Finally, we verify that our algorithm obtains better empirical performance than state-of-the-art baselines.
Author Information
Andrea Tirinzoni (Politecnico di Milano)
Matteo Pirotta (Facebook AI Research)
Marcello Restelli (Politecnico di Milano)
Alessandro Lazaric (Facebook Artificial Intelligence Research)
More from the Same Authors
-
2021 Spotlight: Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Runlong Zhou · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 Spotlight: A Provably Efficient Sample Collection Strategy for Reinforcement Learning »
Jean Tarbouriech · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 Spotlight: Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning »
Alberto Maria Metelli · Alessio Russo · Marcello Restelli -
2021 : Policy Optimization via Optimal Policy Evaluation »
Alberto Maria Metelli · Samuele Meta · Marcello Restelli -
2022 : Multi-Armed Bandit Problem with Temporally-Partitioned Rewards »
Giulia Romano · Andrea Agostini · Francesco Trovò · Nicola Gatti · Marcello Restelli -
2022 : Provably Efficient Causal Model-Based Reinforcement Learning for Environment-Agnostic Generalization »
Mirco Mutti · Riccardo De Santi · Emanuele Rossi · Juan Calderon · Michael Bronstein · Marcello Restelli -
2023 Poster: Distributional Policy Evaluation: a Maximum Entropy approach to Representation Learning »
Riccardo Zamboni · Alberto Maria Metelli · Marcello Restelli -
2023 Poster: Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach »
Riccardo Poiani · Nicole Nobili · Alberto Maria Metelli · Marcello Restelli -
2022 Poster: Multi-Fidelity Best-Arm Identification »
Riccardo Poiani · Alberto Maria Metelli · Marcello Restelli -
2022 Poster: Challenging Common Assumptions in Convex Reinforcement Learning »
Mirco Mutti · Riccardo De Santi · Piersilvio De Bartolomeis · Marcello Restelli -
2022 Poster: Off-Policy Evaluation with Deficient Support Using Side Information »
Nicolò Felicioni · Maurizio Ferrari Dacrema · Marcello Restelli · Paolo Cremonesi -
2022 Poster: Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees »
Andrea Tirinzoni · Matteo Papini · Ahmed Touati · Alessandro Lazaric · Matteo Pirotta -
2021 Poster: Local Differential Privacy for Regret Minimization in Reinforcement Learning »
Evrard Garcelon · Vianney Perchet · Ciara Pike-Burke · Matteo Pirotta -
2021 Poster: Learning in Non-Cooperative Configurable Markov Decision Processes »
Giorgia Ramponi · Alberto Maria Metelli · Alessandro Concetti · Marcello Restelli -
2021 Poster: Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Runlong Zhou · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 Poster: A Provably Efficient Sample Collection Strategy for Reinforcement Learning »
Jean Tarbouriech · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 Poster: Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection »
Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta -
2021 Poster: Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification »
Clémence Réda · Andrea Tirinzoni · Rémy Degenne -
2021 Poster: Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning »
Alberto Maria Metelli · Alessio Russo · Marcello Restelli -
2020 Poster: Inverse Reinforcement Learning from a Gradient-based Learner »
Giorgia Ramponi · Gianluca Drappo · Marcello Restelli -
2020 Session: Orals & Spotlights Track 31: Reinforcement Learning »
Dotan Di Castro · Marcello Restelli -
2020 Poster: Adversarial Attacks on Linear Contextual Bandits »
Evrard Garcelon · Baptiste Roziere · Laurent Meunier · Jean Tarbouriech · Olivier Teytaud · Alessandro Lazaric · Matteo Pirotta -
2020 Poster: Improved Sample Complexity for Incremental Autonomous Exploration in MDPs »
Jean Tarbouriech · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2020 Poster: Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration »
Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill -
2020 Oral: Improved Sample Complexity for Incremental Autonomous Exploration in MDPs »
Jean Tarbouriech · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2019 : Poster Session »
Ahana Ghosh · Javad Shafiee · Akhilan Boopathy · Alex Tamkin · Theodoros Vasiloudis · Vedant Nanda · Ali Baheri · Paul Fieguth · Andrew Bennett · Guanya Shi · Hao Liu · Arushi Jain · Jacob Tyo · Benjie Wang · Boxiao Chen · Carroll Wainwright · Chandramouli Shama Sastry · Chao Tang · Daniel S. Brown · David Inouye · David Venuto · Dhruv Ramani · Dimitrios Diochnos · Divyam Madaan · Dmitrii Krashenikov · Joel Oren · Doyup Lee · Eleanor Quint · elmira amirloo · Matteo Pirotta · Gavin Hartnett · Geoffroy Dubourg-Felonneau · Gokul Swamy · Pin-Yu Chen · Ilija Bogunovic · Jason Carter · Javier Garcia-Barcos · Jeet Mohapatra · Jesse Zhang · Jian Qian · John Martin · Oliver Richter · Federico Zaiter · Tsui-Wei Weng · Karthik Abinav Sankararaman · Kyriakos Polymenakos · Lan Hoang · mahdieh abbasi · Marco Gallieri · Mathieu Seurin · Matteo Papini · Matteo Turchetta · Matthew Sotoudeh · Mehrdad Hosseinzadeh · Nathan Fulton · Masatoshi Uehara · Niranjani Prasad · Oana-Maria Camburu · Patrik Kolaric · Philipp Renz · Prateek Jaiswal · Reazul Hasan Russel · Riashat Islam · Rishabh Agarwal · Alexander Aldrick · Sachin Vernekar · Sahin Lale · Sai Kiran Narayanaswami · Samuel Daulton · Sanjam Garg · Sebastian East · Shun Zhang · Soheil Dsidbari · Justin Goodwin · Victoria Krakovna · Wenhao Luo · Wesley Chung · Yuanyuan Shi · Yuh-Shyang Wang · Hongwei Jin · Ziping Xu -
2019 Poster: Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs »
Jian QIAN · Ronan Fruit · Matteo Pirotta · Alessandro Lazaric -
2019 Poster: A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning »
Nicolas Carion · Nicolas Usunier · Gabriel Synnaeve · Alessandro Lazaric -
2019 Spotlight: A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning »
Nicolas Carion · Nicolas Usunier · Gabriel Synnaeve · Alessandro Lazaric -
2019 Poster: Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters »
Alberto Maria Metelli · Amarildo Likmeta · Marcello Restelli -
2019 Poster: Limiting Extrapolation in Linear Approximate Value Iteration »
Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill -
2019 Poster: Regret Bounds for Learning State Representations in Reinforcement Learning »
Ronald Ortner · Matteo Pirotta · Alessandro Lazaric · Ronan Fruit · Odalric-Ambrym Maillard -
2018 Poster: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes »
Andrea Tirinzoni · Marek Petrik · Xiangli Chen · Brian Ziebart -
2018 Spotlight: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes »
Andrea Tirinzoni · Marek Petrik · Xiangli Chen · Brian Ziebart -
2018 Poster: Policy Optimization via Importance Sampling »
Alberto Maria Metelli · Matteo Papini · Francesco Faccio · Marcello Restelli -
2018 Poster: Transfer of Value Functions via Variational Methods »
Andrea Tirinzoni · Rafael Rodriguez Sanchez · Marcello Restelli -
2018 Oral: Policy Optimization via Importance Sampling »
Alberto Maria Metelli · Matteo Papini · Francesco Faccio · Marcello Restelli -
2018 Poster: Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric -
2018 Spotlight: Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric -
2017 Poster: Compatible Reward Inverse Reinforcement Learning »
Alberto Maria Metelli · Matteo Pirotta · Marcello Restelli -
2017 Poster: Regret Minimization in MDPs with Options without Prior Knowledge »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Emma Brunskill -
2017 Poster: Adaptive Batch Size for Safe Policy Gradients »
Matteo Papini · Matteo Pirotta · Marcello Restelli -
2017 Spotlight: Regret Minimization in MDPs with Options without Prior Knowledge »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Emma Brunskill -
2014 Poster: Sparse Multi-Task Reinforcement Learning »
Daniele Calandriello · Alessandro Lazaric · Marcello Restelli -
2013 Poster: Adaptive Step-Size for Policy Gradient Methods »
Matteo Pirotta · Marcello Restelli · Luca Bascetta -
2011 Poster: Transfer from Multiple MDPs »
Alessandro Lazaric · Marcello Restelli -
2007 Spotlight: Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods »
Alessandro Lazaric · Marcello Restelli · Andrea Bonarini -
2007 Poster: Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods »
Alessandro Lazaric · Marcello Restelli · Andrea Bonarini