We formulate learning for control as an inverse problem: inverting a dynamical system to obtain the actions that yield desired behavior. The key challenge in this formulation is a distribution shift in the inputs to the function being inverted: the learning agent can only observe the forward mapping (its actions' consequences) on trajectories that it can execute, yet it must learn the inverse mapping for inputs and outputs that correspond to a different, desired behavior. We propose a general recipe for inverse problems with a distribution shift that we term $\textit{iterative inversion}$: learn the inverse mapping under the current input distribution (policy), apply it to the desired output samples to obtain a new input distribution, and repeat. As we show, iterative inversion can converge to the desired inverse mapping, but only under rather strict conditions on the mapping itself. We next apply iterative inversion to learn control. Our input is a set of demonstrations of desired behavior, given as video embeddings of trajectories (without actions), and our method iteratively learns to imitate trajectories generated by the current policy, perturbed by random exploration noise. We find that by constantly adding the demonstrated trajectory embeddings as input to the policy when generating trajectories to imitate, a la iterative inversion, we effectively steer the learning towards the desired trajectory distribution. To the best of our knowledge, this is the first exploration of learning control from the viewpoint of inverse problems, and the main advantage of our approach is simplicity: it does not require rewards, and it employs only supervised learning, which can easily scale to state-of-the-art trajectory embedding techniques and policy representations. Indeed, with a VQ-VAE embedding and a transformer-based policy, we demonstrate non-trivial continuous control on several tasks.
Further, we report improved performance on imitating diverse behaviors compared to reward-based methods.
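The iterative-inversion recipe from the abstract can be illustrated on a toy, exactly invertible scalar map. Everything below (the forward map `f`, the desired outputs `y_star`, and the use of a linear regressor as the learned inverse) is a hypothetical stand-in for the paper's dynamical system, video-embedded demonstrations, and policy; it is a minimal sketch, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Forward map (stand-in for the dynamical system: actions -> outcomes)."""
    return 2.0 * x + 1.0

y_star = np.array([5.0, 7.0, 9.0])   # desired outputs (the "demonstrations")
x = rng.normal(0.0, 1.0, size=200)   # initial input (policy) distribution

for _ in range(10):
    y = f(x)                                   # observe consequences of current inputs
    a, b = np.polyfit(y, x, 1)                 # supervised fit of inverse map g: y -> x
    y_d = rng.choice(y_star, size=x.shape)     # sample the desired outputs
    x = a * y_d + b                            # new input distribution = g(desired outputs)
    x += rng.normal(0.0, 0.05, size=x.shape)   # exploration noise

print(np.round(f(a * y_star + b), 2))  # -> [5. 7. 9.], i.e. g recovers f's inverse
```

With a noisier forward map or a misspecified regressor the iteration need not converge, which mirrors the abstract's caveat that convergence holds only under rather strict conditions on the mapping.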
Author Information
Gal Leibovich (Intel Labs)
Guy Jacob (Intel Labs)
Or Avner (Technion - Israel Institute of Technology)
Gal Novik (Intel Labs)
Aviv Tamar (Technion)
More from the Same Authors
- 2021 : Deep Variational Semi-Supervised Novelty Detection »
  Tal Daniel · Thanard Kurutach · Aviv Tamar
- 2021 : Validate on Sim, Detect on Real - Model Selection for Domain Randomization »
  Guy Jacob · Gal Leibovich · Shadi Endrawis · Gal Novik · Aviv Tamar
- 2022 : Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning »
  Dan Elbaz · Gal Novik · Oren Salzman
- 2022 Poster: Meta Reinforcement Learning with Finite Training Tasks - a Density Estimation Approach »
  Zohar Rimon · Aviv Tamar · Gilad Adler
- 2021 : Spotlights »
  Hager Radi · Krishan Rana · Yunzhu Li · Shuang Li · Gal Leibovich · Guy Jacob · Ruihan Yang
- 2021 Poster: Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies »
  Ron Dorfman · Idan Shenfeld · Aviv Tamar
- 2021 Poster: Iterative Causal Discovery in the Possible Presence of Latent Confounders and Selection Bias »
  Raanan Rohekar · Shami Nisimov · Yaniv Gurwicz · Gal Novik
- 2020 : Mini-panel discussion 1 - Bridging the gap between theory and practice »
  Aviv Tamar · Emma Brunskill · Jost Tobias Springenberg · Omer Gottesman · Daniel Mankowitz
- 2020 : Keynote: Aviv Tamar »
  Aviv Tamar
- 2019 : Poster Presentations »
  Rahul Mehta · Andrew Lampinen · Binghong Chen · Sergio Pascual-Diaz · Jordi Grau-Moya · Aldo Faisal · Jonathan Tompson · Yiren Lu · Khimya Khetarpal · Martin Klissarov · Pierre-Luc Bacon · Doina Precup · Thanard Kurutach · Aviv Tamar · Pieter Abbeel · Jinke He · Maximilian Igl · Shimon Whiteson · Wendelin Boehmer · Raphaël Marinier · Olivier Pietquin · Karol Hausman · Sergey Levine · Chelsea Finn · Tianhe Yu · Lisa Lee · Benjamin Eysenbach · Emilio Parisotto · Eric Xing · Ruslan Salakhutdinov · Hongyu Ren · Anima Anandkumar · Deepak Pathak · Christopher Lu · Trevor Darrell · Alexei Efros · Phillip Isola · Feng Liu · Bo Han · Gang Niu · Masashi Sugiyama · Saurabh Kumar · Janith Petangoda · Johan Ferret · James McClelland · Kara Liu · Animesh Garg · Robert Lange
- 2019 Poster: Modeling Uncertainty by Learning a Hierarchy of Deep Neural Connections »
  Raanan Rohekar · Yaniv Gurwicz · Shami Nisimov · Gal Novik
- 2018 Poster: Bayesian Structure Learning by Recursive Bootstrap »
  Raanan Y. Rohekar · Yaniv Gurwicz · Shami Nisimov · Guy Koren · Gal Novik
- 2018 Poster: Constructing Deep Neural Networks by Bayesian Network Structure Learning »
  Raanan Rohekar · Shami Nisimov · Yaniv Gurwicz · Guy Koren · Gal Novik
- 2017 Poster: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments »
  Ryan Lowe · Yi Wu · Aviv Tamar · Jean Harb · Pieter Abbeel · Igor Mordatch