Timezone: »
Offline reinforcement learning algorithms can train performant policies for hard tasks using previously-collected datasets. However, the quality of the offline dataset often limits the levels of performance possible. We consider the problem of improving offline policies through online fine-tuning. Offline RL requires a pessimistic training objective to mitigate distributional shift between the trained policy and the offline behavior policy, which will make the trained policy averse to picking novel actions. In contrast, online RL requires exploration, or optimism. Thus, fine-tuning online policies with the offline training objective is not ideal. Additionally, loosening the fine-tuning objective to allow for more exploration can potentially destroy the behaviors learned in the offline phase because of the sudden and significant change in the optimization objective. To mitigate this challenge, we propose a method to facilitate exploration during online fine-tuning that maintains the same training objective throughout both offline and online phases, while encouraging exploration. We accomplish this by changing the action-selection method to be more optimistic with respect to the Q-function. By choosing to take actions in the environment with higher expected Q-values, our method is able to explore and improve behaviors more efficiently, obtaining 56% more returns on average than the alternative approaches on several locomotion, navigation, and manipulation tasks.
Author Information
Max Sobol Mark (Stanford)
Ali Ghadirzadeh (Stanford University)
Xi Chen (Tsinghua University)
Chelsea Finn (Stanford)
More from the Same Authors
-
2021 Spotlight: Efficiently Identifying Task Groupings for Multi-Task Learning »
Chris Fifty · Ehsan Amid · Zhe Zhao · Tianhe Yu · Rohan Anil · Chelsea Finn -
2021 : MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance »
Michael Luo · Ashwin Balakrishna · Brijen Thananjeyan · Suraj Nair · Julian Ibarz · Jie Tan · Chelsea Finn · Ion Stoica · Ken Goldberg -
2021 : Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets »
Frederik Ebert · Yanlai Yang · Karl Schmeckpeper · Bernadette Bucher · Kostas Daniilidis · Chelsea Finn · Sergey Levine -
2021 : Lifelong Robotic Reinforcement Learning by Retaining Experiences »
Annie Xie · Chelsea Finn -
2021 : Correct-N-Contrast: A Contrastive Approach for Improving Robustness to Spurious Correlations »
Michael Zhang · Nimit Sohoni · Hongyang Zhang · Chelsea Finn · Christopher Ré -
2021 : Extending the WILDS Benchmark for Unsupervised Adaptation »
Shiori Sagawa · Pang Wei Koh · Tony Lee · Irena Gao · Sang Michael Xie · Kendrick Shen · Ananya Kumar · Weihua Hu · Michihiro Yasunaga · Henrik Marklund · Sara Beery · Ian Stavness · Jure Leskovec · Kate Saenko · Tatsunori Hashimoto · Sergey Levine · Chelsea Finn · Percy Liang -
2021 : Test Time Robustification of Deep Models via Adaptation and Augmentation »
Marvin Zhang · Sergey Levine · Chelsea Finn -
2021 : The Reflective Explorer: Online Meta-Exploration from Offline Data in Realistic Robotic Tasks »
Rafael Rafailov · · Tianhe Yu · Avi Singh · Mariano Phielipp · Chelsea Finn -
2021 : Data Sharing without Rewards in Multi-Task Offline Reinforcement Learning »
Tianhe Yu · Aviral Kumar · Yevgen Chebotar · Chelsea Finn · Sergey Levine · Karol Hausman -
2021 : CoMPS: Continual Meta Policy Search »
Glen Berseth · Zhiwei Zhang · Grace Zhang · Chelsea Finn · Sergey Levine -
2021 : Discriminator Augmented Model-Based Reinforcement Learning »
Allan Zhou · Archit Sharma · Chelsea Finn -
2021 : Curiosity with Chelsea Finn, Celeste Kidd, Timothy Verstynen »
Celeste Kidd · Chelsea Finn · Timothy Verstynen · Johnathan Flowers -
2021 : Example-Based Offline Reinforcement Learning without Rewards »
Kyle Hatch · Tianhe Yu · Rafael Rafailov · Chelsea Finn -
2021 : The Reflective Explorer: Online Meta-Exploration from Offline Data in Realistic Robotic Tasks »
Rafael Rafailov · · Tianhe Yu · Avi Singh · Mariano Phielipp · Chelsea Finn -
2022 Poster: LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning »
Xi Chen · Ali Ghadirzadeh · Tianhe Yu · Jianhao Wang · Alex Yuan Gao · Wenzhe Li · Liang Bin · Chelsea Finn · Chongjie Zhang -
2022 : You Only Live Once: Single-Life Reinforcement Learning »
Annie Chen · Archit Sharma · Sergey Levine · Chelsea Finn -
2022 : What Makes Certain Pre-Trained Visual Representations Better for Robotic Learning? »
Kyle Hsu · Tyler Lum · Ruohan Gao · Shixiang (Shane) Gu · Jiajun Wu · Chelsea Finn -
2022 : A Control-Centric Benchmark for Video Prediction »
Stephen Tian · Chelsea Finn · Jiajun Wu -
2022 : Pre-Training for Robots: Leveraging Diverse Multitask Data via Offline Reinforcement Learning »
Aviral Kumar · Anikait Singh · Frederik Ebert · Yanlai Yang · Chelsea Finn · Sergey Levine -
2022 : Train Offline, Test Online: A Real Robot Learning Benchmark »
Gaoyue Zhou · Victoria Dean · Mohan Kumar Srirama · Aravind Rajeswaran · Jyothish Pari · Kyle Hatch · Aryan Jain · Tianhe Yu · Pieter Abbeel · Lerrel Pinto · Chelsea Finn · Abhinav Gupta -
2022 : Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group Shifts »
Amrith Setlur · Don Dennis · Benjamin Eysenbach · Aditi Raghunathan · Chelsea Finn · Virginia Smith · Sergey Levine -
2022 : Train Offline, Test Online: A Real Robot Learning Benchmark »
Gaoyue Zhou · Victoria Dean · Mohan Kumar Srirama · Aravind Rajeswaran · Jyothish Pari · Kyle Hatch · Aryan Jain · Tianhe Yu · Pieter Abbeel · Lerrel Pinto · Chelsea Finn · Abhinav Gupta -
2022 : Multi-Domain Long-Tailed Learning by Augmenting Disentangled Representations »
Huaxiu Yao · Xinyu Yang · Allan Zhou · Chelsea Finn -
2022 : Surgical Fine-Tuning Improves Adaptation to Distribution Shifts »
Yoonho Lee · Annie Chen · Fahim Tajwar · Ananya Kumar · Huaxiu Yao · Percy Liang · Chelsea Finn -
2022 : Contrastive Example-Based Control »
Kyle Hatch · Sarthak J Shetty · Benjamin Eysenbach · Tianhe Yu · Rafael Rafailov · Russ Salakhutdinov · Sergey Levine · Chelsea Finn -
2022 : Train Offline, Test Online: A Real Robot Learning Benchmark »
Gaoyue Zhou · Victoria Dean · Mohan Kumar Srirama · Aravind Rajeswaran · Jyothish Pari · Kyle Hatch · Aryan Jain · Tianhe Yu · Pieter Abbeel · Lerrel Pinto · Chelsea Finn · Abhinav Gupta -
2022 : Relaxing the Kolmogorov Structure Function for Realistic Computational Constraints »
Yoonho Lee · Chelsea Finn · Stefano Ermon -
2022 : Recommendation for New Drugs with Limited Prescription Data »
Zhenbang Wu · Huaxiu Yao · Zhe Su · David Liebovitz · Lucas Glass · James Zou · Chelsea Finn · Jimeng Sun -
2022 : What Makes Certain Pre-Trained Visual Representations Better for Robotic Learning? »
Kyle Hsu · Tyler Lum · Ruohan Gao · Shixiang (Shane) Gu · Jiajun Wu · Chelsea Finn -
2022 : Train Offline, Test Online: A Real Robot Learning Benchmark »
Gaoyue Zhou · Victoria Dean · Mohan Kumar Srirama · Aravind Rajeswaran · Jyothish Pari · Kyle Hatch · Aryan Jain · Tianhe Yu · Pieter Abbeel · Lerrel Pinto · Chelsea Finn · Abhinav Gupta -
2022 : Pre-Training for Robots: Leveraging Diverse Multitask Data via Offline Reinforcement Learning »
Anikait Singh · Aviral Kumar · Frederik Ebert · Yanlai Yang · Chelsea Finn · Sergey Levine -
2022 : Policy Architectures for Compositional Generalization in Control »
Allan Zhou · Vikash Kumar · Chelsea Finn · Aravind Rajeswaran -
2022 : Contrastive Example-Based Control »
Kyle Hatch · Sarthak J Shetty · Benjamin Eysenbach · Tianhe Yu · Rafael Rafailov · Russ Salakhutdinov · Sergey Levine · Chelsea Finn -
2022 : Giving Robots a Hand: Broadening Generalization via Hand-Centric Human Video Demonstrations »
Moo J Kim · Jiajun Wu · Chelsea Finn -
2022 : Surgical Fine-Tuning Improves Adaptation to Distribution Shifts »
Yoonho Lee · Annie Chen · Fahim Tajwar · Ananya Kumar · Huaxiu Yao · Percy Liang · Chelsea Finn -
2022 : Train Offline, Test Online: A Real Robot Learning Benchmark »
Gaoyue Zhou · Victoria Dean · Mohan Kumar Srirama · Aravind Rajeswaran · Jyothish Pari · Kyle Hatch · Aryan Jain · Tianhe Yu · Pieter Abbeel · Lerrel Pinto · Chelsea Finn · Abhinav Gupta -
2022 : Train Offline, Test Online: A Real Robot Learning Benchmark »
Gaoyue Zhou · Victoria Dean · Mohan Kumar Srirama · Aravind Rajeswaran · Jyothish Pari · Kyle Hatch · Aryan Jain · Tianhe Yu · Pieter Abbeel · Lerrel Pinto · Chelsea Finn · Abhinav Gupta -
2022 : Train Offline, Test Online: A Real Robot Learning Benchmark »
Gaoyue Zhou · Victoria Dean · Mohan Kumar Srirama · Aravind Rajeswaran · Jyothish Pari · Kyle Hatch · Aryan Jain · Tianhe Yu · Pieter Abbeel · Lerrel Pinto · Chelsea Finn · Abhinav Gupta -
2022 Workshop: Workshop on Distribution Shifts: Connecting Methods and Applications »
Chelsea Finn · Fanny Yang · Hongseok Namkoong · Masashi Sugiyama · Jacob Eisenstein · Jonas Peters · Rebecca Roelofs · Shiori Sagawa · Pang Wei Koh · Yoonho Lee -
2022 Poster: MEMO: Test Time Robustness via Adaptation and Augmentation »
Marvin Zhang · Sergey Levine · Chelsea Finn -
2022 Poster: Learning Options via Compression »
Yiding Jiang · Evan Liu · Benjamin Eysenbach · J. Zico Kolter · Chelsea Finn -
2022 Poster: You Only Live Once: Single-Life Reinforcement Learning »
Annie Chen · Archit Sharma · Sergey Levine · Chelsea Finn -
2022 Poster: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time »
Huaxiu Yao · Caroline Choi · Bochuan Cao · Yoonho Lee · Pang Wei Koh · Chelsea Finn -
2022 Poster: When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning »
Annie Xie · Fahim Tajwar · Archit Sharma · Chelsea Finn -
2022 Poster: C-Mixup: Improving Generalization in Regression »
Huaxiu Yao · Yiping Wang · Linjun Zhang · James Zou · Chelsea Finn -
2022 Poster: Giving Feedback on Interactive Student Programs with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2021 : Lifelong Robotic Reinforcement Learning by Retaining Experiences »
Annie Xie · Chelsea Finn -
2021 : Discussion: Chelsea Finn, Masashi Sugiyama »
Chelsea Finn · Masashi Sugiyama -
2021 : Robustness through the Lens of Invariance »
Chelsea Finn -
2021 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · David Silver · Matthew Taylor · Martha White · Srijita Das · Yuqing Du · Andrew Patterson · Manan Tomar · Olivia Watkins -
2021 Poster: Visual Adversarial Imitation Learning using Variational Models »
Rafael Rafailov · Tianhe Yu · Aravind Rajeswaran · Chelsea Finn -
2021 Poster: Efficiently Identifying Task Groupings for Multi-Task Learning »
Chris Fifty · Ehsan Amid · Zhe Zhao · Tianhe Yu · Rohan Anil · Chelsea Finn -
2021 Poster: COMBO: Conservative Offline Model-Based Policy Optimization »
Tianhe Yu · Aviral Kumar · Rafael Rafailov · Aravind Rajeswaran · Sergey Levine · Chelsea Finn -
2021 Poster: Information is Power: Intrinsic Control via Information Capture »
Nicholas Rhinehart · Jenny Wang · Glen Berseth · John Co-Reyes · Danijar Hafner · Chelsea Finn · Sergey Levine -
2021 Poster: Conservative Data Sharing for Multi-Task Offline Reinforcement Learning »
Tianhe Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Sergey Levine · Chelsea Finn -
2021 Poster: Meta-learning with an Adaptive Task Scheduler »
Huaxiu Yao · Yu Wang · Ying Wei · Peilin Zhao · Mehrdad Mahdavi · Defu Lian · Chelsea Finn -
2021 Poster: Noether Networks: meta-learning useful conserved quantities »
Ferran Alet · Dylan Doblar · Allan Zhou · Josh Tenenbaum · Kenji Kawaguchi · Chelsea Finn -
2021 Poster: Differentiable Annealed Importance Sampling and the Perils of Gradient Noise »
Guodong Zhang · Kyle Hsu · Jianing Li · Chelsea Finn · Roger Grosse -
2021 Poster: Autonomous Reinforcement Learning via Subgoal Curricula »
Archit Sharma · Abhishek Gupta · Sergey Levine · Karol Hausman · Chelsea Finn -
2021 Poster: Adaptive Risk Minimization: Learning to Adapt to Domain Shift »
Marvin Zhang · Henrik Marklund · Nikita Dhawan · Abhishek Gupta · Sergey Levine · Chelsea Finn -
2020 : Panel Discussion & Closing »
Yejin Choi · Alexei Efros · Chelsea Finn · Kristen Grauman · Quoc V Le · Yann LeCun · Ruslan Salakhutdinov · Eric Xing -
2020 : QA: Chelsea Finn »
Chelsea Finn -
2020 : Mini-panel discussion 3 - Prioritizing Real World RL Challenges »
Chelsea Finn · Thomas Dietterich · Angela Schoellig · Anca Dragan · Anusha Nagabandi · Doina Precup -
2020 : Invited Talk: Chelsea Finn »
Chelsea Finn -
2020 : Keynote: Chelsea Finn »
Chelsea Finn -
2020 : Invited talk - Underfitting and Uncertainty in Self-Supervised Predictive Models »
Chelsea Finn -
2020 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Coline Devin · Misha Laskin · Kimin Lee · Janarthanan Rajendran · Vivek Veeriah -
2020 Poster: Weakly-Supervised Reinforcement Learning for Controllable Behavior »
Lisa Lee · Benjamin Eysenbach · Russ Salakhutdinov · Shixiang (Shane) Gu · Chelsea Finn -
2020 Poster: Continual Learning of Control Primitives : Skill Discovery via Reset-Games »
Kelvin Xu · Siddharth Verma · Chelsea Finn · Sergey Levine -
2020 Poster: Gradient Surgery for Multi-Task Learning »
Tianhe Yu · Saurabh Kumar · Abhishek Gupta · Sergey Levine · Karol Hausman · Chelsea Finn -
2020 Poster: Continuous Meta-Learning without Tasks »
James Harrison · Apoorva Sharma · Chelsea Finn · Marco Pavone -
2020 Poster: One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL »
Saurabh Kumar · Aviral Kumar · Sergey Levine · Chelsea Finn -
2020 Poster: Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors »
Karl Pertsch · Oleh Rybkin · Frederik Ebert · Shenghao Zhou · Dinesh Jayaraman · Chelsea Finn · Sergey Levine -
2020 Poster: MOPO: Model-based Offline Policy Optimization »
Tianhe Yu · Garrett Thomas · Lantao Yu · Stefano Ermon · James Zou · Sergey Levine · Chelsea Finn · Tengyu Ma -
2019 : Poster Presentations »
Rahul Mehta · Andrew Lampinen · Binghong Chen · Sergio Pascual-Diaz · Jordi Grau-Moya · Aldo Faisal · Jonathan Tompson · Yiren Lu · Khimya Khetarpal · Martin Klissarov · Pierre-Luc Bacon · Doina Precup · Thanard Kurutach · Aviv Tamar · Pieter Abbeel · Jinke He · Maximilian Igl · Shimon Whiteson · Wendelin Boehmer · Raphaël Marinier · Olivier Pietquin · Karol Hausman · Sergey Levine · Chelsea Finn · Tianhe Yu · Lisa Lee · Benjamin Eysenbach · Emilio Parisotto · Eric Xing · Ruslan Salakhutdinov · Hongyu Ren · Anima Anandkumar · Deepak Pathak · Christopher Lu · Trevor Darrell · Alexei Efros · Phillip Isola · Feng Liu · Bo Han · Gang Niu · Masashi Sugiyama · Saurabh Kumar · Janith Petangoda · Johan Ferret · James McClelland · Kara Liu · Animesh Garg · Robert Lange -
2019 Workshop: Learning with Rich Experience: Integration of Learning Paradigms »
Zhiting Hu · Andrew Wilson · Chelsea Finn · Lisa Lee · Taylor Berg-Kirkpatrick · Ruslan Salakhutdinov · Eric Xing -
2019 Poster: Language as an Abstraction for Hierarchical Deep Reinforcement Learning »
YiDing Jiang · Shixiang (Shane) Gu · Kevin Murphy · Chelsea Finn