Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g. hours vs. days) because they parallelize better. However, many RL problems require directed exploration because they have reward functions that are sparse or deceptive (i.e. contain local optima), and it is unknown how to encourage such exploration with ES. Here we show that algorithms that have been invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability. Our experiments confirm that the resultant new algorithms, NS-ES and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encountered by ES to achieve higher performance on Atari and simulated robots learning to walk around a deceptive trap. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration. It also adds this new family of exploration algorithms to the RL toolbox and raises the interesting possibility that analogous algorithms with multiple simultaneous paths of exploration might also combine well with existing RL algorithms outside ES.
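The abstract does not say how novelty is measured or how it is combined with reward. As a rough illustration only, the sketch below assumes the common novelty-search formulation, in which an agent's novelty is the mean distance from its behavior descriptor to its k nearest neighbors in an archive of past behaviors, and it assumes a simple weighted blend of reward and novelty as a stand-in for the quality diversity variants. The function names, the 2-D behavior descriptors, and the parameters `k` and `weight` are hypothetical and are not taken from the paper.

```python
import math


def novelty(behavior, archive, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest archive entries.

    Assumption: behaviors are fixed-length numeric tuples (e.g. a final x, y position).
    """
    if not archive:
        return 0.0  # nothing to compare against yet
    distances = sorted(math.dist(behavior, other) for other in archive)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def blended_score(reward, novelty_value, weight=0.5):
    """Weighted mix of reward and novelty (hypothetical helper).

    weight=1.0 uses reward only; weight=0.0 uses novelty only.
    """
    return weight * reward + (1.0 - weight) * novelty_value


# Toy archive of previously seen behaviors.
archive = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

near = novelty((0.1, 0.1), archive, k=2)  # close to behaviors already seen
far = novelty((5.0, 5.0), archive, k=2)   # far from anything in the archive
```

In this toy setup, a behavior far from the archive scores higher novelty than one close to it, which is the pressure that pushes a population away from a deceptive local optimum. A fixed `weight` corresponds loosely to an even reward/novelty trade-off, while adjusting it during training would be one way to shift between exploration and exploitation.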
Edoardo Conti (Facebook AML)
Vashisht Madhavan (Uber)
Felipe Petroski Such (Uber AI Labs)
Joel Lehman (Uber AI Labs)
Kenneth Stanley (Uber AI Labs and University of Central Florida)
Jeff Clune (Uber AI Labs)
Jeff is a senior research scientist and founding member of Uber AI Labs. He is also the Loy and Edith Harris Associate Professor in Computer Science at the University of Wyoming, where he directs the Evolving AI Lab (http://EvolvingAI.org). He researches robotics and the creation of artificial intelligence in neural networks, via either deep learning or evolutionary algorithms.
More from the Same Authors
2019 Workshop: Retrospectives: A Venue for Self-Reflection in ML Research »
Ryan Lowe · Yoshua Bengio · Joelle Pineau · Michela Paganini · Jessica Forde · Shagun Sodhani · Abhishek Gupta · Joel Lehman · Peter Henderson · Kanika Madan · Koustuv Sinha · Xavier Bouthillier
2018 Poster: An intriguing failing of convolutional neural networks and the CoordConv solution »
Rosanne Liu · Joel Lehman · Piero Molino · Felipe Petroski Such · Eric Frank · Alex Sergeev · Jason Yosinski