Izzeddin Gur · Ofir Nachum · Aleksandra Faust
Event URL: https://openreview.net/forum?id=GBiTbPTOu9a

In reinforcement learning (RL) the use of simulators is ubiquitous, allowing cheaper and safer agent training than training directly in the real target environment. However, this approach relies on the simulator being a sufficiently accurate reflection of the target environment, which is difficult to achieve in practice. Accordingly, recent methods have proposed an alternative paradigm, utilizing offline datasets from the target environment to train an agent, avoiding online access to either the target or any simulated environment but leading to poor generalization outside the support of the offline data. Here, we propose to combine these two paradigms to leverage both offline datasets and synthetic simulators. We formalize our approach as offline targeted environment design (OTED), which automatically learns a distribution over simulator parameters to match a provided offline dataset, and then uses the learned simulator to train an RL agent in standard online fashion. We derive an objective for learning the simulator parameters which corresponds to minimizing a divergence between the target offline dataset and the state-action distribution induced by the simulator. We evaluate our method on standard offline RL benchmarks and show that it yields impressive results compared to existing approaches, thus successfully leveraging both offline datasets and simulators for better RL.
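The core idea of the abstract (fit simulator parameters so the simulator's induced state distribution matches the offline data, then train online in the learned simulator) can be illustrated with a toy sketch. This is not the authors' OTED implementation or objective; everything below (the 1-D environment, the behavior policy, the moment-matching proxy for the divergence, and the grid search over parameters) is a hypothetical stand-in chosen for brevity:

```python
import numpy as np

# Hypothetical "target environment": a 1-D linear system whose noise
# scale TRUE_THETA is unknown to us; the offline dataset is a bag of
# states visited under a fixed behavior policy in that environment.
TRUE_THETA = 0.5

def rollout_states(theta, n_steps=2000, seed=0):
    """Run the parameterized simulator and return the visited states."""
    rng = np.random.default_rng(seed)
    s, states = 0.0, []
    for _ in range(n_steps):
        a = -0.1 * s                              # fixed behavior policy
        s = 0.9 * s + a + theta * rng.normal()    # noisy dynamics
        states.append(s)
    return np.asarray(states)

# Stand-in for the provided offline dataset from the target environment.
offline_states = rollout_states(TRUE_THETA, seed=0)

def divergence(theta):
    """Crude moment-matching proxy for the divergence between the
    offline state distribution and the simulator's induced one."""
    sim = rollout_states(theta, seed=1)
    return ((sim.mean() - offline_states.mean()) ** 2
            + (sim.std() - offline_states.std()) ** 2)

# "Environment design": pick the simulator parameter that best matches
# the offline data, then an RL agent could be trained online in it.
candidates = np.linspace(0.1, 1.0, 19)
best_theta = min(candidates, key=divergence)
```

The grid search recovers a noise scale close to the true value; in the paper's setting this step is replaced by the derived divergence-minimization objective over a learned distribution of simulator parameters, and the resulting simulator is used for standard online RL training.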

Author Information

Izzeddin Gur (Google)
Ofir Nachum (Google Brain)
Aleksandra Faust (Google Brain)

Aleksandra Faust is a Senior Research Engineer at Google Brain, specializing in robot intelligence. Previously, Aleksandra led machine learning efforts for self-driving car planning and controls at Waymo and Google X, and was a researcher at Sandia National Laboratories, where she worked on satellites and other remote sensing applications. She earned a Ph.D. in Computer Science at the University of New Mexico (with distinction), a Master's in Computer Science from the University of Illinois at Urbana-Champaign, and a Bachelor's in Mathematics from the University of Belgrade, Serbia. Her research interests include reinforcement learning, adaptive motion planning, and machine learning for decision-making. Aleksandra won the Tom L. Popejoy Award for the best doctoral dissertation in Engineering, Mathematics, and Sciences at the University of New Mexico in the period 2011-2014. She was also awarded the Best Paper in Service Robotics award at ICRA 2018, Sandia National Laboratories' Doctoral Studies Program and New Mexico Space Grant fellowships, and the Outstanding Graduate Student in Computer Science award. Her work has been featured in the New York Times.