We propose to demonstrate a suite of software and robotics projects for reinforcement learning (RL) that have been in development for several years in Dr. Richard Sutton's group at the University of Alberta. Specifically, the three projects we intend to showcase are the RL-Glue interface, the CritterBot robotic platform, and the RL-Viz experimentation platform. The demonstration will illustrate how these projects allow researchers to develop learning agents that can be evaluated both in a graphical simulation (RL-Viz) and on a mobile robot (CritterBot).

RL-Glue is a language- and platform-independent protocol for evaluating reinforcement learning agents against environment programs. RL-Glue separates agent and environment development, so each can be written in a different language and even executed over the Internet from different computers. RL-Glue has had a significant influence on how empirical comparisons are done in reinforcement learning: it has been used to evaluate agents in four international competitions at high-profile machine learning conferences. The most recent competition, held in conjunction with ICML 2008, attracted over 150 teams; its final test phase included more than 20 teams comprising more than 40 participants. RL-Glue has also been used by university instructors in several countries to teach reinforcement learning, and several researchers have used it to benchmark their agents in papers published at top international conferences, including NIPS.

The CritterBot is an ongoing project at the University of Alberta whose goal is to add a further robotics effort to challenge, direct, and inspire research on grounded artificial intelligence. The robot is small, mobile, and outfitted with an unusually rich set of sensors, including sensors for touch, acceleration, motion, sound, vision, and several kinds of proximity. The initial objective is for the robot to form an extended, multi-level model of the relationships among its sensors and between its sensors and its actuators. We have proposed that higher-level knowledge can be grounded in the raw data of sensations and actions; this robotic platform will challenge and inspire us to see whether it can really be done. We also plan to use the platform as a test case for rapid learning and for the use of reinforcement learning by non-experts: we would like a person with no training to be able to teach the system new ways of behaving in an intuitive manner, much as one might train a particularly cooperative dog. Learning agents interact with the CritterBot through RL-Glue, just as with any other RL-Glue environment.

RL-Viz gives the reinforcement learning community its first flexible, general, standardized, cross-language, cross-platform protocol and framework for managing and visualizing the interaction between agents and environments in reinforcement learning experiments. RL-Viz is a protocol and library layered on top of RL-Glue, and it supports advanced features such as visualization of environments and agents and run-time loading of agents and environments. The project includes several state-of-the-art tasks used in learning research, including Tetris, a remote-controlled helicopter simulator provided by Andrew Ng's team at Stanford, keep-away soccer, and a real-time strategy engine. The software for the most recent RL competition (mentioned above) was based on RL-Viz.
We will present the latest developments in the RL-Glue project and demonstrate how RL-Glue provides a novel, unified architecture for developing reinforcement learning algorithms for both simulated and physical experiments. This framework makes it easier to compare the performance of agents across a variety of simulated and physical tasks.
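To make the agent-environment separation concrete, here is a minimal sketch of the kind of interface RL-Glue standardizes. The method names (agent_start, agent_step, env_start, env_step) follow RL-Glue's general conventions, but the classes and the run_episode helper are illustrative assumptions for this sketch, not the actual RL-Glue codec API.

```python
import random

class Agent:
    """Illustrative agent. RL-Glue keeps this code fully separate from
    the environment, possibly in another language or another process."""
    def agent_init(self, num_actions):
        self.num_actions = num_actions

    def agent_start(self, observation):
        # First action of an episode, chosen from the initial observation.
        return random.randrange(self.num_actions)

    def agent_step(self, reward, observation):
        # A learning agent would update its value estimates from `reward` here.
        return random.randrange(self.num_actions)

    def agent_end(self, reward):
        pass  # final learning update of the episode

class Environment:
    """Illustrative episodic environment: walk right to reach state 10."""
    def env_start(self):
        self.state = 0
        return self.state  # initial observation

    def env_step(self, action):
        self.state += 1 if action == 1 else -1
        terminal = self.state >= 10
        reward = 1.0 if terminal else -0.01
        return reward, self.state, terminal

def run_episode(agent, env, max_steps=1000):
    """The 'glue': the only place where agent and environment meet."""
    obs = env.env_start()
    action = agent.agent_start(obs)
    total_reward = 0.0
    for _ in range(max_steps):
        reward, obs, terminal = env.env_step(action)
        total_reward += reward
        if terminal:
            agent.agent_end(reward)
            break
        action = agent.agent_step(reward, obs)
    return total_reward

if __name__ == "__main__":
    agent = Agent()
    agent.agent_init(num_actions=2)
    print("episode return:", run_episode(agent, Environment()))
```

In an actual RL-Glue experiment, the role of run_episode is played by the glue program itself, which mediates between agent and environment over a socket connection; this is what lets the same agent code drive a simulated task or the CritterBot unchanged.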
Author Information
Brian Tanner (University of Alberta)
Adam M White (University of Alberta; DeepMind)
Richard Sutton (DeepMind; University of Alberta)
Richard S. Sutton is a professor and iCORE chair in the department of computing science at the University of Alberta. He is a fellow of the Association for the Advancement of Artificial Intelligence and co-author of the textbook "Reinforcement Learning: An Introduction" from MIT Press. Before joining the University of Alberta in 2003, he worked in industry at AT&T and GTE Labs, and in academia at the University of Massachusetts. He received a PhD in computer science from the University of Massachusetts in 1984 and a BA in psychology from Stanford University in 1978. Rich's research interests center on the learning problems facing a decision-maker interacting with its environment, which he sees as central to artificial intelligence. He is also interested in animal learning psychology, in connectionist networks, and generally in systems that continually improve their representations and models of the world.
More from the Same Authors
- 2022 : On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly-Communicating MDPs » Yi Wan · Richard Sutton
- 2022 Poster: Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions » Tian Tian · Kenny Young · Richard Sutton
- 2021 Poster: Average-Reward Learning and Planning with Options » Yi Wan · Abhishek Naik · Rich Sutton
- 2019 : Poster and Coffee Break 2 » Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall
- 2019 : Panel Discussion » Richard Sutton · Doina Precup
- 2019 : Panel Discussion led by Grace Lindsay » Grace Lindsay · Blake Richards · Doina Precup · Jacqueline Gottlieb · Jeff Clune · Jane Wang · Richard Sutton · Angela Yu · Ida Momennejad
- 2019 : Invited Talk #7: Richard Sutton » Richard Sutton
- 2016 : Richard Sutton (University of Alberta) » Richard Sutton
- 2016 : Rich Sutton » Richard Sutton
- 2015 Tutorial: Introduction to Reinforcement Learning with Function Approximation » Richard Sutton
- 2014 Workshop: Representation and Learning Methods for Complex Outputs » Richard Zemel · Dale Schuurmans · Kilian Q Weinberger · Yuhong Guo · Jia Deng · Francesco Dinuzzo · Hal Daumé III · Honglak Lee · Noah A Smith · Richard Sutton · Jiaqian YU · Vitaly Kuznetsov · Luke Vilnis · Hanchen Xiong · Calvin Murdock · Thomas Unterthiner · Jean-Francis Roy · Martin Renqiang Min · Hichem SAHBI · Fabio Massimo Zanzotto
- 2014 Poster: Universal Option Models » Hengshuai Yao · Csaba Szepesvari · Richard Sutton · Joseph Modayil · Shalabh Bhatnagar
- 2014 Poster: Weighted importance sampling for off-policy learning with linear function approximation » Rupam Mahmood · Hado P van Hasselt · Richard Sutton
- 2011 Invited Talk: Learning About Sensorimotor Data » Richard Sutton
- 2010 Poster: Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains » Martha White · Adam M White
- 2009 Poster: Multi-Step Dyna Planning for Policy Evaluation and Control » Hengshuai Yao · Richard Sutton · Shalabh Bhatnagar · Dongcui Diao · Csaba Szepesvari
- 2009 Poster: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation » Hamid R Maei · Csaba Szepesvari · Shalabh Bhatnagar · Doina Precup · David Silver · Richard Sutton
- 2009 Spotlight: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation » Hamid R Maei · Csaba Szepesvari · Shalabh Bhatnagar · Doina Precup · David Silver · Richard Sutton
- 2008 Poster: A computational model of hippocampal function in trace conditioning » Elliot A Ludvig · Richard Sutton · Eric Verbeek · James Kehoe
- 2008 Poster: A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation » Richard Sutton · Csaba Szepesvari · Hamid R Maei
- 2007 Spotlight: Incremental Natural Actor-Critic Algorithms » Shalabh Bhatnagar · Richard Sutton · Mohammad Ghavamzadeh · Mark P Lee
- 2007 Poster: Incremental Natural Actor-Critic Algorithms » Shalabh Bhatnagar · Richard Sutton · Mohammad Ghavamzadeh · Mark P Lee
- 2006 Workshop: The First Annual Reinforcement Learning Competition » Adam M White
- 2006 Workshop: Grounding Perception, Knowledge and Cognition in Sensori-Motor Experience » Michael James · David Wingate · Brian Tanner
- 2006 Poster: iLSTD: Convergence, Eligibility Traces, and Mountain Car » Alborz Geramifard · Michael Bowling · Martin A Zinkevich · Richard Sutton