We consider the problem of online optimization, where a learner chooses a decision from a given decision set and suffers a loss determined by that decision and the state of the environment. The learner's objective is to minimize its cumulative regret against the best fixed decision in hindsight. Over the past few decades, numerous variants of this problem have been considered, and many algorithms have been designed to achieve sub-linear regret in the worst case. However, this level of robustness comes at a cost: the proposed algorithms are often overly conservative and fail to adapt to the actual complexity of the loss sequence, which is often far from the worst case. In this paper we introduce a general algorithm that, provided with a safe learning algorithm and an opportunistic benchmark, can effectively combine good worst-case guarantees with much improved performance on easy data. We derive general theoretical bounds on the regret of the proposed algorithm and discuss its implementation in a wide range of applications, notably the problem of learning with shifting experts (a recent COLT open problem). Finally, we provide numerical simulations in the setting of prediction with expert advice, with comparisons to the state of the art.
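To make the notion of regret against the best fixed decision in hindsight concrete, the following is a minimal sketch for the setting of prediction with expert advice. It is not the algorithm proposed in the paper: it uses the standard exponentially weighted forecaster (Hedge) as an example of a "safe" learner, and the function name `hedge_regret`, the learning rate `eta`, and the toy loss sequence are assumptions chosen only for illustration.

```python
import math


def hedge_regret(losses, eta):
    """Run Hedge on a loss sequence and return its regret.

    losses: list of rounds, each a list of per-expert losses in [0, 1]
            (hypothetical input format, not taken from the paper).
    eta:    learning rate of the exponential weights update.
    """
    n_experts = len(losses[0])
    cumulative = [0.0] * n_experts  # cumulative loss of each expert so far
    learner_loss = 0.0              # cumulative expected loss of the learner

    for round_losses in losses:
        # Weights depend only on past losses; subtracting the minimum
        # keeps the exponentials numerically stable.
        best_so_far = min(cumulative)
        weights = [math.exp(-eta * (c - best_so_far)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]

        # The learner suffers the expected loss under its distribution.
        learner_loss += sum(p * l for p, l in zip(probs, round_losses))

        # The losses of all experts are revealed after the decision.
        cumulative = [c + l for c, l in zip(cumulative, round_losses)]

    # Regret: learner's loss minus the loss of the best fixed expert.
    return learner_loss - min(cumulative)


# Toy "easy" sequence: expert 0 is always right, expert 1 always wrong.
T, N = 100, 2
easy_losses = [[0.0, 1.0] for _ in range(T)]
eta = math.sqrt(8 * math.log(N) / T)  # standard worst-case tuning
print(hedge_regret(easy_losses, eta))
```

On this toy sequence the regret stays below the standard worst-case bound of sqrt(T ln(N) / 2) for this tuning, which illustrates the gap between worst-case guarantees and behavior on easy data that the abstract refers to.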
Author Information
Amir Sani (Centre d'Economie de la Sorbonne, CNRS)
Gergely Neu (Universitat Pompeu Fabra)
Alessandro Lazaric (Facebook Artificial Intelligence Research)
Related Events (a corresponding poster, oral, or spotlight)
-
2014 Spotlight: Exploiting easy data in online optimization »
Wed. Dec 10th 04:40 -- 05:05 PM Room Level 2, room 210
More from the Same Authors
-
2023 Poster: First- and Second-Order Bounds for Adversarial Linear Contextual Bandits »
Iuliia Olkhovskaia · Jack Mayo · Tim van Erven · Gergely Neu · Chen-Yu Wei -
2022 Poster: Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits »
Gergely Neu · Iuliia Olkhovskaia · Matteo Papini · Ludovic Schwartz -
2022 Poster: Proximal Point Imitation Learning »
Luca Viano · Angeliki Kamoutsi · Gergely Neu · Igor Krawczuk · Volkan Cevher -
2021 Poster: Online learning in MDPs with linear function approximation and bandit feedback. »
Gergely Neu · Iuliia Olkhovskaia -
2020 Poster: A Unifying View of Optimism in Episodic Reinforcement Learning »
Gergely Neu · Ciara Pike-Burke -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates »
Carlos Riquelme · Hugo Penedones · Damien Vincent · Hartmut Maennel · Sylvain Gelly · Timothy A Mann · Andre Barreto · Gergely Neu -
2019 Poster: Beating SGD Saturation with Tail-Averaging and Minibatching »
Nicole Muecke · Gergely Neu · Lorenzo Rosasco -
2017 Poster: Regret Minimization in MDPs with Options without Prior Knowledge »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Emma Brunskill -
2017 Poster: Efficient Second-Order Online Kernel Learning with Adaptive Embedding »
Daniele Calandriello · Alessandro Lazaric · Michal Valko -
2017 Poster: Boltzmann Exploration Done Right »
Nicolò Cesa-Bianchi · Claudio Gentile · Gergely Neu · Gabor Lugosi -
2017 Spotlight: Regret Minimization in MDPs with Options without Prior Knowledge »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Emma Brunskill -
2015 : Discussion Panel »
Tim van Erven · Wouter Koolen · Peter Grünwald · Shai Ben-David · Dylan Foster · Satyen Kale · Gergely Neu -
2015 : Adaptive Regret Bounds for Non-Stochastic Bandits »
Gergely Neu -
2015 Poster: Explore no more: Improved high-probability regret bounds for non-stochastic bandits »
Gergely Neu -
2014 Poster: Best-Arm Identification in Linear Bandits »
Marta Soare · Alessandro Lazaric · Remi Munos -
2014 Poster: Efficient learning by implicit exploration in bandit problems with side observations »
Tomáš Kocák · Gergely Neu · Michal Valko · Remi Munos -
2014 Poster: Online combinatorial optimization with stochastic decision sets and adversarial losses »
Gergely Neu · Michal Valko -
2014 Poster: Sparse Multi-Task Reinforcement Learning »
Daniele Calandriello · Alessandro Lazaric · Marcello Restelli -
2013 Poster: Online learning in episodic Markovian decision processes by relative entropy policy search »
Alexander Zimin · Gergely Neu -
2012 Poster: Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence »
Victor Gabillon · Mohammad Ghavamzadeh · Alessandro Lazaric -
2012 Poster: Risk-Aversion in Multi-armed Bandits »
Amir Sani · Alessandro Lazaric · Remi Munos -
2011 Poster: Multi-Bandit Best Arm Identification »
Victor Gabillon · Mohammad Ghavamzadeh · Alessandro Lazaric · Sebastien Bubeck -
2011 Poster: Transfer from Multiple MDPs »
Alessandro Lazaric · Marcello Restelli -
2010 Spotlight: Online Markov Decision Processes under Bandit Feedback »
Gergely Neu · András György · András Antos · Csaba Szepesvari -
2010 Poster: Online Markov Decision Processes under Bandit Feedback »
Gergely Neu · András György · Csaba Szepesvari · András Antos -
2010 Spotlight: LSTD with Random Projections »
Mohammad Ghavamzadeh · Alessandro Lazaric · Odalric-Ambrym Maillard · Remi Munos -
2010 Poster: LSTD with Random Projections »
Mohammad Ghavamzadeh · Alessandro Lazaric · Odalric-Ambrym Maillard · Remi Munos -
2007 Spotlight: Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods »
Alessandro Lazaric · Marcello Restelli · Andrea Bonarini -
2007 Poster: Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods »
Alessandro Lazaric · Marcello Restelli · Andrea Bonarini