We propose a practical non-episodic posterior sampling for reinforcement learning (PSRL) algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode switching schedule. Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more general, applying to problems with multiple parameters and to continuous state-action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show that a sensible parameterization for a large class of sequential recommendation problems satisfies the assumptions of our algorithm.
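To make "deterministic, model-independent episode switching schedule" concrete, here is a minimal illustrative sketch of a PSRL loop whose resampling times are fixed in advance rather than triggered by data (for example, by visit counts). It is not the authors' implementation. The geometric schedule (episode lengths 1, 2, 4, ...), the Dirichlet prior over transitions of a small finite MDP with known rewards, and the discounted value-iteration planner are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import random

def schedule(horizon, first_len=1, growth=2):
    """Hypothetical geometric schedule: episode start times depend only on
    horizon, first_len and growth, never on observed data or the posterior."""
    starts, t, length = [], 0, first_len
    while t < horizon:
        starts.append(t)
        t += length
        length *= growth
    return starts

def sample_dirichlet(alphas, rng):
    # Dirichlet draw via normalized Gamma(alpha, 1) variables.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def greedy_policy(P, R, gamma=0.95, iters=200):
    # Assumption: discounted value iteration as a stand-in planner
    # for the sampled model (the paper's planner may differ).
    S, A = len(R), len(R[0])
    V = [0.0] * S
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(S))
              for a in range(A)] for s in range(S)]
        V = [max(q) for q in Q]
    return [max(range(A), key=lambda a: Q[s][a]) for s in range(S)]

def ds_psrl_sketch(true_P, R, horizon, seed=0):
    """Sketch of PSRL with a fixed switching schedule on a finite MDP."""
    rng = random.Random(seed)
    S, A = len(R), len(R[0])
    # Dirichlet(1, ..., 1) prior on each transition row (assumption).
    counts = [[[1.0] * S for _ in range(A)] for _ in range(S)]
    starts = set(schedule(horizon))
    s, policy, total_reward, resamples = 0, None, 0.0, 0
    for t in range(horizon):
        if t in starts:
            # Resample a model from the posterior only at scheduled times.
            P_hat = [[sample_dirichlet(counts[s_][a], rng) for a in range(A)]
                     for s_ in range(S)]
            policy = greedy_policy(P_hat, R)
            resamples += 1
        a = policy[s]
        total_reward += R[s][a]
        s_next = rng.choices(range(S), weights=true_P[s][a])[0]
        counts[s][a][s_next] += 1.0  # posterior update from one transition
        s = s_next
    return total_reward, resamples
```

For a horizon of 100, `schedule(100)` yields the start times [0, 1, 3, 7, 15, 31, 63], so the policy is recomputed only seven times regardless of what the agent observes. Fixing the switch times in advance is the property the abstract emphasizes; the specific growth rate used here is only a placeholder.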
Author Information
Georgios Theocharous (Adobe Research)
Zheng Wen (Adobe Research)
Zheng Wen is currently a senior research scientist at the Big Data Experience Lab, Adobe Research. His current research focuses on machine learning, operations research, and big data. Before joining Adobe Research, he was a research scientist in the Advertising Science Team, Yahoo Labs. Prior to that, he received a Ph.D. in Electrical Engineering from Stanford University.
Yasin Abbasi Yadkori (Adobe Research)
Nikos Vlassis (Netflix)
More from the Same Authors
2021: Off-Policy Evaluation with Embedded Spaces
Jaron Jia Rong Lee · David Arbour · Georgios Theocharous
2022: Trajectory-based Explainability Framework for Offline RL
Shripad Deshmukh · Arpan Dasgupta · Chirag Agarwal · Nan Jiang · Balaji Krishnamurthy · Georgios Theocharous · Jayakumar Subramanian
2023 Poster: Context-lumpable stochastic bandits
Chung-Wei Lee · Qinghua Liu · Yasin Abbasi Yadkori · Chi Jin · Tor Lattimore · Csaba Szepesvari
2021 Poster: Control Variates for Slate Off-Policy Evaluation
Nikos Vlassis · Ashok Chandrashekar · Fernando Amat · Nathan Kallus
2020 Poster: Model Selection in Contextual Stochastic Bandit Problems
Aldo Pacchiano · My Phan · Yasin Abbasi Yadkori · Anup Rao · Julian Zimmert · Tor Lattimore · Csaba Szepesvari
2020 Poster: Towards Safe Policy Improvement for Non-Stationary MDPs
Yash Chandak · Scott Jordan · Georgios Theocharous · Martha White · Philip Thomas
2020 Spotlight: Towards Safe Policy Improvement for Non-Stationary MDPs
Yash Chandak · Scott Jordan · Georgios Theocharous · Martha White · Philip Thomas
2019 Poster: Thompson Sampling and Approximate Inference
My Phan · Yasin Abbasi Yadkori · Justin Domke
2019 Poster: Bootstrapping Upper Confidence Bound
Botao Hao · Yasin Abbasi Yadkori · Zheng Wen · Guang Cheng
2017 Poster: Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback
Zheng Wen · Branislav Kveton · Michal Valko · Sharan Vaswani
2017 Poster: Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem
Yasin Abbasi Yadkori · Peter Bartlett · Victor Gabillon
2017 Poster: Conservative Contextual Linear Bandits
Abbas Kazerouni · Mohammad Ghavamzadeh · Yasin Abbasi · Benjamin Van Roy
2016 Poster: A posteriori error bounds for joint matrix decomposition problems
Nicolò Colombo · Nikos Vlassis
2015 Workshop: Machine Learning From and For Adaptive User Technologies: From Active Learning & Experimentation to Optimization & Personalization
Joseph Jay Williams · Yasin Abbasi Yadkori · Finale Doshi-Velez
2015 Workshop: Machine Learning for (e-)Commerce
Esteban Arcaute · Mohammad Ghavamzadeh · Shie Mannor · Georgios Theocharous
2015 Poster: Policy Evaluation Using the Ω-Return
Philip Thomas · Scott Niekum · Georgios Theocharous · George Konidaris
2015 Poster: Minimax Time Series Prediction
Wouter Koolen · Alan Malek · Peter Bartlett · Yasin Abbasi Yadkori
2014 Workshop: Large-scale reinforcement learning and Markov decision problems
Benjamin Van Roy · Mohammad Ghavamzadeh · Peter Bartlett · Yasin Abbasi Yadkori · Ambuj Tewari
2014 Workshop: From Bad Models to Good Policies (Sequential Decision Making under Uncertainty)
Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor · Jeremie Mary · Laurent Orseau · Thomas Dietterich · Ronald Ortner · Peter Grünwald · Joelle Pineau · Raphael Fonteneau · Georgios Theocharous · Esteban D Arcaute · Christos Dimitrakakis · Nan Jiang · Doina Precup · Pierre-Luc Bacon · Marek Petrik · Aviv Tamar
2013 Workshop: Resource-Efficient Machine Learning
Yevgeny Seldin · Yasin Abbasi Yadkori · Yacov Crammer · Ralf Herbrich · Peter Bartlett
2013 Poster: Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
Yasin Abbasi Yadkori · Peter Bartlett · Varun Kanade · Yevgeny Seldin · Csaba Szepesvari
2011 Poster: Improved Algorithms for Linear Stochastic Bandits
Yasin Abbasi Yadkori · David Pal · Csaba Szepesvari
2011 Spotlight: Improved Algorithms for Linear Stochastic Bandits
Yasin Abbasi Yadkori · David Pal · Csaba Szepesvari
2006 Poster: Accelerated Variational Dirichlet Process Mixtures
Kenichi Kurihara · Max Welling · Nikos Vlassis
2006 Spotlight: Accelerated Variational Dirichlet Process Mixtures
Kenichi Kurihara · Max Welling · Nikos Vlassis