Poster
Active Offline Policy Selection
Ksenia Konyushova · Yutian Chen · Thomas Paine · CAGLAR Gulcehre · Cosmin Paduraru · Daniel J Mankowitz · Misha Denil · Nando de Freitas

Tue Dec 07 08:30 AM -- 10:00 AM (PST) @ Virtual #None

This paper addresses the problem of policy selection in domains with abundant logged data but a restricted interaction budget. Solving this problem would enable safe evaluation and deployment of offline reinforcement learning policies in industry, robotics, and recommendation domains, among others. Several off-policy evaluation (OPE) techniques have been proposed to assess the value of policies using only logged data. However, there is still a large gap between OPE estimates and full online evaluation in the real environment, and large amounts of online interaction are often not possible in practice. To overcome this problem, we introduce active offline policy selection, a novel sequential decision approach that combines logged data with online interaction to identify the best policy. This approach uses OPE estimates to warm-start the online evaluation. Then, to use the limited environment interactions wisely, we decide which policy to evaluate next using a Bayesian optimization method with a kernel function that represents policy similarity. We use multiple benchmarks with a large number of candidate policies to show that the proposed approach improves upon state-of-the-art OPE estimates and pure online policy evaluation.
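
The loop described in the abstract (OPE warm start, then kernel-based Bayesian optimization over candidate policies) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Gaussian process over a finite set of policies whose prior mean is the OPE estimate, a policy-similarity kernel matrix that is already given, i.i.d. Gaussian return noise, and an upper-confidence-bound acquisition rule. The names `gp_posterior`, `select_policy`, `rollout`, `noise_var`, and `beta` are hypothetical.

```python
import numpy as np


def gp_posterior(prior_mean, kernel, counts, sums, noise_var):
    """Posterior mean and variance of each policy's value under a GP prior.

    counts[i] is the number of rollouts of policy i and sums[i] their total
    return. Each return is assumed to carry independent Gaussian noise.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    prior_var = np.diag(kernel).copy()
    seen = np.asarray(counts) > 0
    if not seen.any():
        return prior_mean.copy(), prior_var
    avg = np.asarray(sums)[seen] / np.asarray(counts)[seen]
    # The mean of n noisy returns has noise variance noise_var / n.
    gram = kernel[np.ix_(seen, seen)] + np.diag(noise_var / np.asarray(counts)[seen])
    cross = kernel[:, seen]  # covariance of all policies with evaluated ones
    alpha = np.linalg.solve(gram, avg - prior_mean[seen])
    v = np.linalg.solve(gram, cross.T)
    mean = prior_mean + cross @ alpha
    var = prior_var - np.sum(cross * v.T, axis=1)
    return mean, np.maximum(var, 0.0)


def select_policy(ope_estimates, kernel, rollout, budget, noise_var=1.0, beta=2.0):
    """Spend `budget` rollouts, then return (best policy index, posterior mean).

    ope_estimates warm-start the model as the GP prior mean (an assumption of
    this sketch). rollout(i) runs policy i once and returns a scalar return.
    """
    ope_estimates = np.asarray(ope_estimates, dtype=float)
    counts = np.zeros(len(ope_estimates))
    sums = np.zeros(len(ope_estimates))
    for _ in range(budget):
        mean, var = gp_posterior(ope_estimates, kernel, counts, sums, noise_var)
        # UCB acquisition (assumed here): favor high mean or high uncertainty.
        nxt = int(np.argmax(mean + beta * np.sqrt(var)))
        sums[nxt] += rollout(nxt)
        counts[nxt] += 1
    mean, _ = gp_posterior(ope_estimates, kernel, counts, sums, noise_var)
    return int(np.argmax(mean)), mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy setup: five policies placed on a 1-D embedding (hypothetical), with
    # an RBF kernel so that nearby policies are assumed to have similar values.
    embedding = np.linspace(0.0, 1.0, 5)
    dist = embedding[:, None] - embedding[None, :]
    kernel = np.exp(-0.5 * (dist / 0.3) ** 2)
    true_values = np.array([0.1, 0.4, 0.9, 0.5, 0.2])
    ope = true_values + rng.normal(scale=0.3, size=5)  # biased offline estimates

    def rollout(i):
        return true_values[i] + rng.normal(scale=0.2)

    best, mean = select_policy(ope, kernel, rollout, budget=20, noise_var=0.04)
    print("selected policy:", best)
    print("posterior means:", np.round(mean, 2))
```

Because the kernel couples policies, one rollout updates the value estimates of all similar policies, which is what lets a small interaction budget cover a large candidate set. The UCB rule and the RBF kernel on a synthetic embedding are stand-ins chosen for brevity.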

Author Information

Ksenia Konyushova (DeepMind)
Yutian Chen (DeepMind)
Thomas Paine (DeepMind)
CAGLAR Gulcehre (DeepMind)
Cosmin Paduraru (DeepMind)
Daniel J Mankowitz (Technion)
Misha Denil (DeepMind)
Nando de Freitas (UBC)

More from the Same Authors

  • 2021 : StarCraft II Unplugged: Large Scale Offline Reinforcement Learning »
    Michael Mathieu · Sherjil Ozair · Srivatsan Srinivasan · CAGLAR Gulcehre · Shangtong Zhang · Ray Jiang · Tom Paine · Konrad Żołna · Julian Schrittwieser · David Choi · Petko I Georgiev · Daniel Toyama · Roman Ring · Igor Babuschkin · Timo Ewalds · sergomezcol · Aaron van den Oord · Wojciech Czarnecki · Nando de Freitas · Oriol Vinyals
  • 2021 : Introducing Symmetries to Black Box Meta Reinforcement Learning »
    Louis Kirsch · Sebastian Flennerhag · Hado van Hasselt · Abe Friesen · Junhyuk Oh · Yutian Chen
  • 2020 Poster: Modular Meta-Learning with Shrinkage »
    Yutian Chen · Abe Friesen · Feryal Behbahani · Arnaud Doucet · David Budden · Matthew Hoffman · Nando de Freitas
  • 2020 Spotlight: Modular Meta-Learning with Shrinkage »
    Yutian Chen · Abe Friesen · Feryal Behbahani · Arnaud Doucet · David Budden · Matthew Hoffman · Nando de Freitas
  • 2020 Poster: RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning »
    CAGLAR Gulcehre · Ziyu Wang · Alexander Novikov · Thomas Paine · Sergio Gómez · Konrad Zolna · Rishabh Agarwal · Josh Merel · Daniel Mankowitz · Cosmin Paduraru · Gabriel Dulac-Arnold · Jerry Li · Mohammad Norouzi · Matthew Hoffman · Nicolas Heess · Nando de Freitas
  • 2019 : Poster Session »
    Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Leno Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nick Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · JFernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe (Kevin) Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joe Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · JD Co-Reyes · Sophia Sanborn
  • 2019 Workshop: Science meets Engineering of Deep Learning »
    Levent Sagun · CAGLAR Gulcehre · Adriana Romero Soriano · Negar Rostamzadeh · Nando de Freitas
  • 2018 : Poster Session 1 + Coffee »
    Tom Van de Wiele · Rui Zhao · JFernando Hernandez-Garcia · Fabio Pardo · Xian Yeow Lee · Xiaolin Andy Li · Marcin Andrychowicz · Jie Tang · Suraj Nair · Juhyeon Lee · Cédric Colas · Ali Eslami · Yen-Chen Wu · Stephen McAleer · Ryan Julian · Yang Xue · Matthia Sabatelli · Pranav Shyam · Alexandros Kalousis · Giovanni Montana · Emanuele Pesce · Felix Leibfried · Zhanpeng He · Chunxiao Liu · Yanjun Li · Yoshihide Sawada · Alexander Pashevich · Tejas Kulkarni · Keiran Paster · Luca Rigazio · Quan Vuong · Hyunggon Park · Minhae Kwon · Rivindu Weerasekera · Shamane Siriwardhanaa · Rui Wang · Ozsel Kilinc · Keith Ross · Yizhou Wang · Simon Schmitt · Thomas Anthony · Evan Cater · Forest Agostinelli · Tegg Sung · Shirou Maruyama · Alex Shmakov · Devin Schwab · Mohammad Firouzi · Glen Berseth · Denis Osipychev · Jesse Farebrother · Jianlan Luo · William Agnew · Peter Vrancx · Jonathan Heek · Catalin Ionescu Ionescu · Haiyan Yin · Megumi Miyashita · Nathan Jay · Noga H. Rotman · Sam Leroux · Shaileshh Bojja Venkatakrishnan · Henri Schmidt · Jack Terwilliger · Ishan Durugkar · Jonathan Sauder · David Kas · Arash Tavakoli · Alain-Sam Cohen · Philip Bontrager · Adam Lerer · Thomas Paine · Ahmed Khalifa · Rubén Rodriguez · Avi Singh · Yiming Zhang
  • 2018 Poster: Playing hard exploration games by watching YouTube »
    Yusuf Aytar · Tobias Pfaff · David Budden · Thomas Paine · Ziyu Wang · Nando de Freitas
  • 2018 Poster: Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning »
    Tom Zahavy · Matan Haroush · Nadav Merlis · Daniel J Mankowitz · Shie Mannor
  • 2018 Spotlight: Playing hard exploration games by watching YouTube »
    Yusuf Aytar · Tobias Pfaff · David Budden · Thomas Paine · Ziyu Wang · Nando de Freitas
  • 2017 : Invited talk: Learning to learn without gradient descent by gradient descent. »
    Yutian Chen
  • 2017 Poster: Shallow Updates for Deep Reinforcement Learning »
    Nir Levine · Tom Zahavy · Daniel J Mankowitz · Aviv Tamar · Shie Mannor
  • 2017 Poster: Plan, Attend, Generate: Planning for Sequence-to-Sequence Models »
    CAGLAR Gulcehre · Francis Dutil · Adam Trischler · Yoshua Bengio
  • 2016 Poster: Learning to learn by gradient descent by gradient descent »
    Marcin Andrychowicz · Misha Denil · Sergio Gómez · Matthew Hoffman · David Pfau · Tom Schaul · Nando de Freitas
  • 2016 Poster: Adaptive Skills Adaptive Partitions (ASAP) »
    Daniel J Mankowitz · Timothy A Mann · Shie Mannor
  • 2015 : Nando de Freitas »
    Nando de Freitas
  • 2013 Poster: Predicting Parameters in Deep Learning »
    Misha Denil · Babak Shakibi · Laurent Dinh · Marc'Aurelio Ranzato · Nando de Freitas