Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (PAC guarantee). In this paper, we derive an upper PAC bound of order O(|S|²|A|H² log(1/δ)/ɛ²) and a lower PAC bound Ω(|S||A|H² log(1/(δ+c))/ɛ²) (ignoring log-terms) that match up to log-terms and an additional linear dependency on the number of states |S|. The lower bound is the first of its kind for this setting. Our upper bound leverages Bernstein's inequality to improve on previous bounds for episodic finite-horizon MDPs which have a time-horizon dependency of at least H³.
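To make the size of the remaining gap concrete, the two bounds from the abstract can be written side by side and their leading terms divided. This is only a sketch of the arithmetic: the labels N_upper and N_lower and the tilde notation for suppressed log-terms are introduced here for illustration and are not taken from the paper, and c is the constant from the stated lower bound, whose value the abstract does not give.

```latex
% Bounds as stated in the abstract; the tilde marks log-terms that are ignored.
% N_upper and N_lower are illustrative labels, not the paper's notation.
\[
  N_{\text{upper}} = \tilde{O}\!\left(\frac{|S|^2 |A| H^2}{\epsilon^2}\,\log\frac{1}{\delta}\right),
  \qquad
  N_{\text{lower}} = \tilde{\Omega}\!\left(\frac{|S| |A| H^2}{\epsilon^2}\,\log\frac{1}{\delta + c}\right)
\]
% Ratio of the leading terms: |A|, H^2 and 1/epsilon^2 cancel.
\[
  \frac{|S|^2 |A| H^2 \log(1/\delta)\,/\,\epsilon^2}
       {|S| |A| H^2 \log\!\big(1/(\delta + c)\big)\,/\,\epsilon^2}
  \;=\; |S| \cdot \frac{\log(1/\delta)}{\log\!\big(1/(\delta + c)\big)}
\]
```

The factors |A|, H² and 1/ɛ² cancel, leaving one factor of |S| and a ratio of logarithms. This is the sense in which the bounds match up to log-terms and a linear dependency on the number of states.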
Author Information
Christoph Dann (Carnegie Mellon University)
Emma Brunskill (CMU)
More from the Same Authors
- 2021 Spotlight: Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
  Ayush Sekhari · Christoph Dann · Mehryar Mohri · Yishay Mansour · Karthik Sridharan
- 2021 Spotlight: Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning
  Christoph Dann · Teodor Vanislavov Marinov · Mehryar Mohri · Julian Zimmert
- 2022 Poster: Best of Both Worlds Model Selection
  Aldo Pacchiano · Christoph Dann · Claudio Gentile
- 2021 Poster: A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning
  Christoph Dann · Mehryar Mohri · Tong Zhang · Julian Zimmert
- 2021 Poster: Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning
  Christoph Dann · Teodor Vanislavov Marinov · Mehryar Mohri · Julian Zimmert
- 2021 Poster: Neural Active Learning with Performance Guarantees
  Zhilei Wang · Pranjal Awasthi · Christoph Dann · Ayush Sekhari · Claudio Gentile
- 2021 Poster: Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
  Ayush Sekhari · Christoph Dann · Mehryar Mohri · Yishay Mansour · Karthik Sridharan
- 2020 Poster: Reinforcement Learning with Feedback Graphs
  Christoph Dann · Yishay Mansour · Mehryar Mohri · Ayush Sekhari · Karthik Sridharan
- 2018 Poster: On Oracle-Efficient PAC RL with Rich Observations
  Christoph Dann · Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
- 2018 Spotlight: On Oracle-Efficient PAC RL with Rich Observations
  Christoph Dann · Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
- 2017 Poster: Regret Minimization in MDPs with Options without Prior Knowledge
  Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Emma Brunskill
- 2017 Poster: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
  Christoph Dann · Tor Lattimore · Emma Brunskill
- 2017 Spotlight: Regret Minimization in MDPs with Options without Prior Knowledge
  Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Emma Brunskill
- 2017 Spotlight: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
  Christoph Dann · Tor Lattimore · Emma Brunskill
- 2016: Learning to improve learning: ML in the classroom
  Emma Brunskill
- 2016 Poster: (Withdrawn) Only H is left: Near-tight Episodic PAC RL
  Christoph Dann · Emma Brunskill
- 2015 Poster: The Human Kernel
  Andrew Wilson · Christoph Dann · Chris Lucas · Eric Xing
- 2015 Spotlight: The Human Kernel
  Andrew Wilson · Christoph Dann · Chris Lucas · Eric Xing
- 2014 Workshop: Novel Trends and Applications in Reinforcement Learning
  Csaba Szepesvari · Marc Deisenroth · Sergey Levine · Pedro Ortega · Brian Ziebart · Emma Brunskill · Naftali Tishby · Gerhard Neumann · Daniel Lee · Sridhar Mahadevan · Pieter Abbeel · David Silver · Vicenç Gómez
- 2013 Poster: Sequential Transfer in Multi-armed Bandit with Finite Set of Models
  Mohammad Gheshlaghi Azar · Alessandro Lazaric · Emma Brunskill