CLUTR: Curriculum Learning via Unsupervised Task Representation Learning
Abdus Salam Azad · Izzeddin Gur · Aleksandra Faust · Pieter Abbeel · Ion Stoica
Event URL: https://openreview.net/forum?id=dC3JLf7yxV7

Reinforcement Learning (RL) algorithms are often sample-inefficient and generalize poorly. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization that simultaneously learns a task distribution and agent policies on the sampled tasks. This is a non-stationary process in which the task distribution evolves along with the agent policies, creating instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge and a bottleneck for these approaches. To this end, we introduce CLUTR: a novel curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization. It first trains a recurrent variational autoencoder on randomly generated tasks to learn a latent task manifold. Next, a teacher agent creates a curriculum by optimizing a minimax regret-based objective on a set of latent tasks sampled from this manifold. By keeping the task manifold fixed, we show that CLUTR successfully overcomes the non-stationarity problem and improves stability. Our experimental results show that CLUTR outperforms PAIRED, a principled and popular UED method, in generalization and sample efficiency in the challenging CarRacing and navigation environments, with an 18x improvement on the F1 CarRacing benchmark. CLUTR also performs comparably to the non-UED state-of-the-art for CarRacing, outperforming it in 9 of the 20 tracks, and achieves a 33% higher solved rate than PAIRED on a set of 18 out-of-distribution navigation tasks.
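
The second stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a PAIRED-style regret proxy (best antagonist return minus mean protagonist return), replaces the learned teacher agent with a simple argmax over a fixed set of latent tasks, and uses made-up return values. The names `estimate_regret`, `pick_latent_task`, and `toy_returns` are hypothetical.

```python
from statistics import mean

def estimate_regret(protagonist_returns, antagonist_returns):
    """Assumed PAIRED-style proxy: best antagonist return minus mean protagonist return."""
    return max(antagonist_returns) - mean(protagonist_returns)

def pick_latent_task(latents, regret_of):
    """Stand-in for the teacher: pick the latent task with the highest estimated regret.
    (A learned teacher would be trained to propose such latents instead.)"""
    return max(latents, key=regret_of)

# Toy data (invented for illustration): each latent vector, sampled from a
# fixed task manifold, maps to (protagonist returns, antagonist returns).
toy_returns = {
    (0.0, 0.0): ([0.9, 0.8], [0.9, 1.0]),   # easy for both agents: low regret
    (1.0, -1.0): ([0.2, 0.4], [0.7, 0.9]),  # solvable but not yet solved: high regret
    (2.0, 0.5): ([0.0, 0.0], [0.1, 0.0]),   # too hard for both agents: low regret
}

chosen = pick_latent_task(
    list(toy_returns),
    lambda z: estimate_regret(*toy_returns[z]),
)
print(chosen)
```

In this toy setup the selected latent is the one the antagonist can solve but the protagonist cannot yet, which is the kind of task a regret-based curriculum is meant to favor. Because the latent set is fixed, only the regret estimates change as the agents train.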

Author Information

Abdus Salam Azad (University of California Berkeley)
Izzeddin Gur (Google)
Aleksandra Faust (Google Brain)

Aleksandra Faust is a Senior Research Engineer at Google Brain, specializing in robot intelligence. Previously, Aleksandra led machine learning efforts for self-driving car planning and controls at Waymo and Google X, and was a researcher at Sandia National Laboratories, where she worked on satellites and other remote sensing applications. She earned a Ph.D. in Computer Science at the University of New Mexico (with distinction), a Master’s in Computer Science from the University of Illinois at Urbana-Champaign, and a Bachelor’s in Mathematics from the University of Belgrade, Serbia. Her research interests include reinforcement learning, adaptive motion planning, and machine learning for decision-making. Aleksandra won the Tom L. Popejoy Award for the best doctoral dissertation at the University of New Mexico in Engineering, Mathematics, and Sciences in the period of 2011-2014. She was also awarded the Best Paper in Service Robotics at ICRA 2018, Sandia National Laboratories’ Doctoral Studies Program and New Mexico Space Grant fellowships, as well as the Outstanding Graduate Student in Computer Science award. Her work has been featured in the New York Times.

Pieter Abbeel (UC Berkeley & Covariant)

Pieter Abbeel is Professor and Director of the Robot Learning Lab at UC Berkeley [2008- ], Co-Director of the Berkeley AI Research (BAIR) Lab, Co-Founder of covariant.ai [2017- ], Co-Founder of Gradescope [2014- ], Advisor to OpenAI, Founding Faculty Partner of the AI@TheHouse venture fund, and Advisor to many AI/Robotics start-ups. He works in machine learning and robotics. In particular, his research focuses on how to make robots learn from people (apprenticeship learning), how to make robots learn through their own trial and error (reinforcement learning), and how to speed up skill acquisition through learning-to-learn (meta-learning). His robots have learned advanced helicopter aerobatics, knot-tying, basic assembly, organizing laundry, locomotion, and vision-based robotic manipulation. He has won numerous awards, including best paper awards at ICML, NIPS and ICRA, early career awards from NSF, DARPA, ONR, AFOSR, Sloan, TR35, IEEE, and the Presidential Early Career Award for Scientists and Engineers (PECASE). Pieter's work is frequently featured in the popular press, including the New York Times, BBC, Bloomberg, Wall Street Journal, Wired, Forbes, Tech Review, and NPR.

Ion Stoica (UC Berkeley)
