Poster
Approximate Dynamic Programming Finally Performs Well in the Game of Tetris
Victor Gabillon · Mohammad Ghavamzadeh · Bruno Scherrer
Sun Dec 08 02:00 PM -- 06:00 PM (PST) @ Harrah's Special Events Center, 2nd Floor
Tetris is a popular video game that has been widely used as a benchmark for various optimization techniques, including approximate dynamic programming (ADP) algorithms. A close look at the literature of this game shows that while ADP algorithms, which have been (almost) entirely based on approximating the value function (value-function based), have performed poorly in Tetris, methods that search directly in the space of policies by learning the policy parameters with a black-box optimizer, such as the cross entropy (CE) method, have achieved the best reported results. This leads us to conjecture that Tetris is a game in which good policies are easier to represent, and thus to learn, than their corresponding value functions. Consequently, to obtain good performance with ADP, we should use ADP algorithms that search in a policy space instead of the more traditional ones that search in a value function space. In this paper, we put our conjecture to the test by applying such an ADP algorithm, called classification-based modified policy iteration (CBMPI), to the game of Tetris. Our extensive experimental results show that, for the first time, an ADP algorithm, namely CBMPI, obtains the best results reported in the literature for Tetris on both the small $10\times 10$ and large $10\times 20$ boards. Although CBMPI's results are similar to those achieved by the CE method on the large board, CBMPI uses considerably fewer (almost 1/10 as many) samples (calls to the generative model of the game) than CE.
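To make the contrast between value-function-based ADP and policy-space search concrete, here is a minimal, illustrative sketch of one classification-based policy-improvement step in the spirit of CBMPI. It is not the authors' implementation: the environment interface (`sample_states`, `actions`, `simulate`), the feature map `phi`, and the use of plain Monte-Carlo rollouts with a fixed action set are all simplifying assumptions made for illustration; the paper's algorithm additionally truncates rollouts with an approximate value function and uses Tetris-specific board features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def rollout_return(env, state, action, policy, horizon, gamma=1.0):
    """Estimate the return of taking `action` in `state` and then following
    `policy` for up to `horizon` steps (a truncated Monte-Carlo rollout).
    `env.simulate` is a hypothetical generative model: (state, action) -> (next_state, reward, done)."""
    s, r, done = env.simulate(state, action)
    total, discount = r, 1.0
    for _ in range(horizon):
        if done:
            break
        discount *= gamma
        s, r, done = env.simulate(s, policy(s))
        total += discount * r
    return total

def classification_based_improvement(env, phi, policy, n_states=200, n_rollouts=5, horizon=10):
    """One policy-improvement step: estimate the greedy action in sampled states
    by rollouts, then fit a classifier that imitates those greedy choices.
    Assumes a fixed, state-independent action set for simplicity."""
    X, y = [], []
    for state in env.sample_states(n_states):      # hypothetical state sampler
        q = [np.mean([rollout_return(env, state, a, policy, horizon)
                      for _ in range(n_rollouts)])
             for a in env.actions(state)]
        X.append(phi(state))                        # state features, e.g. column heights and holes
        y.append(int(np.argmax(q)))                 # index of the estimated greedy action
    clf = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
    # The improved policy is the classifier's predicted action; the policy,
    # not a value function, is what gets represented and learned.
    return lambda s: env.actions(s)[clf.predict(phi(s).reshape(1, -1))[0]]
```

Iterating this step (each new policy feeding the rollouts of the next) gives the flavor of searching directly in policy space while still using the dynamic-programming structure of policy iteration.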
Author Information
Victor Gabillon (INRIA)
Mohammad Ghavamzadeh (Facebook AI Research)
Bruno Scherrer (INRIA)
More from the Same Authors
- 2019 Workshop: Safety and Robustness in Decision-making »
  Mohammad Ghavamzadeh · Shie Mannor · Yisong Yue · Marek Petrik · Yinlam Chow
- 2019 Poster: Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies »
  Yonathan Efroni · Nadav Merlis · Mohammad Ghavamzadeh · Shie Mannor
- 2019 Spotlight: Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies »
  Yonathan Efroni · Nadav Merlis · Mohammad Ghavamzadeh · Shie Mannor
- 2018 Poster: A Lyapunov-based Approach to Safe Reinforcement Learning »
  Yinlam Chow · Ofir Nachum · Edgar Duenez-Guzman · Mohammad Ghavamzadeh
- 2018 Poster: A Block Coordinate Ascent Algorithm for Mean-Variance Optimization »
  Tengyang Xie · Bo Liu · Yangyang Xu · Mohammad Ghavamzadeh · Yinlam Chow · Daoming Lyu · Daesub Yoon
- 2017 Poster: Conservative Contextual Linear Bandits »
  Abbas Kazerouni · Mohammad Ghavamzadeh · Yasin Abbasi · Benjamin Van Roy
- 2016 Poster: Safe Policy Improvement by Minimizing Robust Baseline Regret »
  Mohammad Ghavamzadeh · Marek Petrik · Yinlam Chow
- 2015 Workshop: Machine Learning for (e-)Commerce »
  Esteban Arcaute · Mohammad Ghavamzadeh · Shie Mannor · Georgios Theocharous
- 2015 Poster: Policy Gradient for Coherent Risk Measures »
  Aviv Tamar · Yinlam Chow · Mohammad Ghavamzadeh · Shie Mannor
- 2014 Workshop: Large-scale reinforcement learning and Markov decision problems »
  Benjamin Van Roy · Mohammad Ghavamzadeh · Peter Bartlett · Yasin Abbasi Yadkori · Ambuj Tewari
- 2014 Poster: Algorithms for CVaR Optimization in MDPs »
  Yinlam Chow · Mohammad Ghavamzadeh
- 2013 Poster: Actor-Critic Algorithms for Risk-Sensitive MDPs »
  Prashanth L.A. · Mohammad Ghavamzadeh
- 2013 Oral: Actor-Critic Algorithms for Risk-Sensitive MDPs »
  Prashanth L.A. · Mohammad Ghavamzadeh
- 2013 Poster: Improved and Generalized Upper Bounds on the Complexity of Policy Iteration »
  Bruno Scherrer
- 2013 Poster: Adaptive Submodular Maximization in Bandit Setting »
  Victor Gabillon · Branislav Kveton · Zheng Wen · Brian Eriksson · S. Muthukrishnan
- 2012 Poster: On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes »
  Bruno Scherrer · Boris Lesner
- 2012 Poster: Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence »
  Victor Gabillon · Mohammad Ghavamzadeh · Alessandro Lazaric
- 2012 Oral: On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes »
  Bruno Scherrer · Boris Lesner
- 2011 Poster: Multi-Bandit Best Arm Identification »
  Victor Gabillon · Mohammad Ghavamzadeh · Alessandro Lazaric · Sebastien Bubeck
- 2011 Poster: Speedy Q-Learning »
  Mohammad Gheshlaghi Azar · Remi Munos · Mohammad Ghavamzadeh · Hilbert J Kappen
- 2010 Spotlight: LSTD with Random Projections »
  Mohammad Ghavamzadeh · Alessandro Lazaric · Odalric-Ambrym Maillard · Remi Munos
- 2010 Poster: LSTD with Random Projections »
  Mohammad Ghavamzadeh · Alessandro Lazaric · Odalric-Ambrym Maillard · Remi Munos
- 2008 Workshop: Model Uncertainty and Risk in Reinforcement Learning »
  Yaakov Engel · Mohammad Ghavamzadeh · Shie Mannor · Pascal Poupart
- 2008 Poster: Regularized Policy Iteration »
  Amir-massoud Farahmand · Mohammad Ghavamzadeh · Csaba Szepesvari · Shie Mannor
- 2008 Poster: Biasing Approximate Dynamic Programming with a Lower Discount Factor »
  Marek Petrik · Bruno Scherrer
- 2007 Spotlight: Incremental Natural Actor-Critic Algorithms »
  Shalabh Bhatnagar · Richard Sutton · Mohammad Ghavamzadeh · Mark P Lee
- 2007 Poster: Incremental Natural Actor-Critic Algorithms »
  Shalabh Bhatnagar · Richard Sutton · Mohammad Ghavamzadeh · Mark P Lee
- 2006 Poster: Bayesian Policy Gradient Algorithms »
  Mohammad Ghavamzadeh · Yaakov Engel
- 2006 Spotlight: Bayesian Policy Gradient Algorithms »
  Mohammad Ghavamzadeh · Yaakov Engel