Timezone: »
Poster
Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs
Andrea Tirinzoni · Aymen Al Marjani · Emilie Kaufmann
In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an $\epsilon$-optimal policy with probability $1-\delta$. While minimax optimal algorithms exist for this problem, its instance-dependent complexity remains elusive in episodic Markov decision processes (MDPs). In this paper, we propose the first nearly matching (up to a horizon squared factor and logarithmic terms) upper and lower bounds on the sample complexity of PAC RL in deterministic episodic MDPs with finite state and action spaces. In particular, our bounds feature a new notion of sub-optimality gap for state-action pairs that we call the deterministic return gap. While our instance-dependent lower bound is written as a linear program, our algorithms are very simple and do not require solving such an optimization problem during learning. Their design and analyses employ novel ideas, including graph-theoretical concepts (minimum flows) and a new maximum-coverage exploration strategy.
Author Information
Andrea Tirinzoni (Meta AI)
Aymen Al Marjani (ENS Lyon)
Emilie Kaufmann (CNRS)
More from the Same Authors
-
2023 Poster: Adaptive Algorithms for Relaxed Pareto Set Identification »
Cyrille KONE · Emilie Kaufmann · Laura Richert -
2023 Poster: An $\varepsilon$-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond »
Marc Jourdan · Rémy Degenne · Emilie Kaufmann -
2023 Poster: On the Complexity of Differentially Private Best-Arm Identification with Fixed Confidence »
Achraf Azize · Marc Jourdan · Aymen Al Marjani · Debabrota Basu -
2022 Poster: Top Two Algorithms Revisited »
Marc Jourdan · Rémy Degenne · Dorian Baudry · Rianne de Heide · Emilie Kaufmann -
2022 Poster: On Elimination Strategies for Bandit Fixed-Confidence Identification »
Andrea Tirinzoni · Rémy Degenne -
2022 Poster: Near-Optimal Collaborative Learning in Bandits »
Clémence Réda · Sattar Vakili · Emilie Kaufmann -
2022 Poster: Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees »
Andrea Tirinzoni · Matteo Papini · Ahmed Touati · Alessandro Lazaric · Matteo Pirotta -
2022 Poster: Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits »
Lilian Besson · Emilie Kaufmann · Odalric-Ambrym Maillard · Julien Seznec -
2021 Poster: Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection »
Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta -
2021 Poster: Navigating to the Best Policy in Markov Decision Processes »
Aymen Al Marjani · Aurélien Garivier · Alexandre Proutiere -
2021 Poster: Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification »
Clémence Réda · Andrea Tirinzoni · Rémy Degenne -
2020 Poster: An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits »
Andrea Tirinzoni · Matteo Pirotta · Marcello Restelli · Alessandro Lazaric -
2020 Poster: Sub-sampling for Efficient Non-Parametric Bandit Exploration »
Dorian Baudry · Emilie Kaufmann · Odalric-Ambrym Maillard -
2020 Spotlight: Sub-sampling for Efficient Non-Parametric Bandit Exploration »
Dorian Baudry · Emilie Kaufmann · Odalric-Ambrym Maillard -
2020 Poster: Planning in Markov Decision Processes with Gap-Dependent Sample Complexity »
Anders Jonsson · Emilie Kaufmann · Pierre Menard · Omar Darwiche Domingues · Edouard Leurent · Michal Valko -
2018 Poster: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes »
Andrea Tirinzoni · Marek Petrik · Xiangli Chen · Brian Ziebart -
2018 Spotlight: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes »
Andrea Tirinzoni · Marek Petrik · Xiangli Chen · Brian Ziebart -
2018 Poster: Transfer of Value Functions via Variational Methods »
Andrea Tirinzoni · Rafael Rodriguez Sanchez · Marcello Restelli