Timezone: »
AlphaZero, an approach to reinforcement learning that couples neural networks and Monte Carlo tree search (MCTS), has produced state-of-the-art strategies for traditional board games like chess, Go, shogi, and Hex. While researchers and game commentators have suggested that AlphaZero uses concepts that humans consider important, it is unclear how these concepts are captured in the network. We investigate AlphaZero's internal representations in the game of Hex using two evaluation techniques from natural language processing (NLP): model probing and behavioral tests. In doing so, we introduce several new evaluation tools to the RL community, and illustrate how evaluations other than task performance can be used to provide a more complete picture of a model's strengths and weaknesses. Our analyses in the game of Hex reveal interesting patterns and generate some testable hypotheses about how such models learn in general. For example, we find that the MCTS discovers concepts before the neural network learns to encode them. We also find that concepts related to short-term end-game planning are best encoded in the final layers of the model, whereas concepts related to long-term planning are encoded in the middle layers of the model.
Author Information
Charles Lovering (Brown University)
Jessica Forde (Brown University)
George Konidaris (Brown University)
Ellie Pavlick (Brown University)
Michael Littman (Brown University)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Poster: Evaluation beyond Task Performance: Analyzing Concepts in AlphaZero in Hex »
Wed. Nov 30th 05:00 -- 07:00 PM Room Hall J #613
More from the Same Authors
-
2021 : Bayesian Exploration for Lifelong Reinforcement Learning »
Haotian Fu · Shangqun Yu · Michael Littman · George Konidaris -
2023 Poster: Break It Down: Evidence for Structural Compositionality in Neural Networks »
Michael Lepori · Thomas Serre · Ellie Pavlick -
2023 Poster: Effectively Learning Initiation Sets in Hierarchical Reinforcement Learning »
Akhil Bagaria · Ben Abbatematteo · Omer Gottesman · Matt Corsaro · Sreehari Rammohan · George Konidaris -
2023 Workshop: Attributing Model Behavior at Scale (ATTRIB) »
Tolga Bolukbasi · Logan Engstrom · Kelvin Guu · Andrew Ilyas · Sung Min Park · Ellie Pavlick · Anders Søgaard -
2022 Poster: Effects of Data Geometry in Early Deep Learning »
Saket Tiwari · George Konidaris -
2022 Poster: Faster Deep Reinforcement Learning with Slower Online Network »
Kavosh Asadi · Rasool Fakoor · Omer Gottesman · Taesup Kim · Michael Littman · Alexander Smola -
2022 Poster: Model-based Lifelong Reinforcement Learning with Bayesian Exploration »
Haotian Fu · Shangqun Yu · Michael Littman · George Konidaris -
2021 : TD | Panel Discussion »
Thomas Gilbert · Ayse Yasar · Rachel Thomas · Mason Kortz · Frank Pasquale · Jessica Forde -
2021 : George Konidaris Talk Q&A »
George Konidaris -
2021 : Invited Talk: George Konidaris - Signal to Symbol (via Skills) »
George Konidaris -
2021 Workshop: I (Still) Can't Believe It's Not Better: A workshop for “beautiful” ideas that "should" have worked »
Aaron Schein · Melanie F. Pradier · Jessica Forde · Stephanie Hyland · Francisco Ruiz -
2021 Poster: On the Expressivity of Markov Reward »
David Abel · Will Dabney · Anna Harutyunyan · Mark Ho · Michael Littman · Doina Precup · Satinder Singh -
2021 Poster: Learning Markov State Abstractions for Deep Reinforcement Learning »
Cameron Allen · Neev Parikh · Omer Gottesman · George Konidaris -
2021 Poster: Hyperparameter Optimization Is Deceiving Us, and How to Stop It »
A. Feder Cooper · Yucheng Lu · Jessica Forde · Christopher De Sa -
2021 Oral: On the Expressivity of Markov Reward »
David Abel · Will Dabney · Anna Harutyunyan · Mark Ho · Michael Littman · Doina Precup · Satinder Singh -
2020 : Panel Discussions »
Grace Lindsay · George Konidaris · Shakir Mohamed · Kimberly Stachenfeld · Peter Dayan · Yael Niv · Doina Precup · Catherine Hartley · Ishita Dasgupta -
2020 : Invited Talk #4 QnA - George Konidaris »
George Konidaris · Raymond Chua · Feryal Behbahani -
2020 : Invited Talk #4 George Konidaris - Signal to Symbol (via Skills) »
George Konidaris -
2020 Workshop: I Can’t Believe It’s Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning »
Jessica Forde · Francisco Ruiz · Melanie Fernandez Pradier · Aaron Schein · Finale Doshi-Velez · Isabel Valera · David Blei · Hanna Wallach -
2020 : Jessica Zosa Forde - Build, Start, Run, Push: Computational Registration of ML Experiments »
Jessica Forde -
2020 Workshop: ML Retrospectives, Surveys & Meta-Analyses (ML-RSA) »
Chhavi Yadav · Prabhu Pradhan · Jesse Dodge · Mayoore Jaiswal · Peter Henderson · Abhishek Gupta · Ryan Lowe · Jessica Forde · Joelle Pineau -
2019 : Lunch Break and Posters »
Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu -
2019 Workshop: Retrospectives: A Venue for Self-Reflection in ML Research »
Ryan Lowe · Yoshua Bengio · Joelle Pineau · Michela Paganini · Jessica Forde · Shagun Sodhani · Abhishek Gupta · Joel Lehman · Peter Henderson · Kanika Madan · Koustuv Sinha · Xavier Bouthillier -
2018 : Poster Session 1 (note there are numerous missing names here, all papers appear in all poster sessions) »
Akhilesh Gotmare · Kenneth Holstein · Jan Brabec · Michal Uricar · Kaleigh Clary · Cynthia Rudin · Sam Witty · Andrew Ross · Shayne O'Brien · Babak Esmaeili · Jessica Forde · Massimo Caccia · Ali Emami · Scott Jordan · Bronwyn Woods · D. Sculley · Rebekah Overdorf · Nicolas Le Roux · Peter Henderson · Brandon Yang · Tzu-Yu Liu · David Jensen · Niccolo Dalmasso · Weitang Liu · Paul Marc TRICHELAIR · Jun Ki Lee · Akanksha Atrey · Matt Groh · Yotam Hechtlinger · Emma Tosch -
2018 Demonstration: Reproducing Machine Learning Research on Binder »
Jessica Forde · Tim Head · Chris Holdgraf · M Pacer · Félix-Antoine Fortin · Fernando Perez -
2017 Poster: Active Exploration for Learning Symbolic Representations »
Garrett Andersen · George Konidaris -
2017 Poster: Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes »
Taylor Killian · Samuel Daulton · Finale Doshi-Velez · George Konidaris -
2017 Oral: Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes »
Taylor Killian · Samuel Daulton · Finale Doshi-Velez · George Konidaris -
2016 Workshop: The Future of Interactive Machine Learning »
Kory Mathewson @korymath · Kaushik Subramanian · Mark Ho · Robert Loftin · Joseph L Austerweil · Anna Harutyunyan · Doina Precup · Layla El Asri · Matthew Gombolay · Jerry Zhu · Sonia Chernova · Charles Isbell · Patrick M Pilarski · Weng-Keen Wong · Manuela Veloso · Julie A Shah · Matthew Taylor · Brenna Argall · Michael Littman -
2016 Oral: Showing versus doing: Teaching by demonstration »
Mark Ho · Michael Littman · James MacGlashan · Fiery Cushman · Joseph L Austerweil -
2016 Poster: Showing versus doing: Teaching by demonstration »
Mark Ho · Michael Littman · James MacGlashan · Fiery Cushman · Joe Austerweil · Joseph L Austerweil -
2015 Poster: Policy Evaluation Using the Ω-Return »
Philip Thomas · Scott Niekum · Georgios Theocharous · George Konidaris -
2013 Demonstration: Di-BOSS™: Digital Building Operating System Solution »
Jessica Forde · Vivek Rathod · Hooshmand Shookri · Vaibhav Bandari · Ashwath Rajan · John Min · Ariel Fan · Leon Wu · Ashish Gagneja · Doug Riecken · David Solomon · Lauren Hannah · Albert Boulanger · Roger Anderson -
2011 Poster: TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning »
George Konidaris · Scott Niekum · Philip Thomas -
2010 Poster: Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories »
George Konidaris · Scott R Kuindersma · Andrew G Barto · Roderic A Grupen -
2009 Poster: Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining »
George Konidaris · Andrew G Barto -
2009 Spotlight: Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining »
George Konidaris · Andrew G Barto