Poster
Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification
Takumi Tanabe · Rei Sato · Kazuto Fukuchi · Jun Sakuma · Youhei Akimoto

Wed Nov 30 09:00 AM -- 11:00 AM (PST) @ Hall J #402

In reinforcement learning, because policy training in the real world is costly and risky, policies are trained in a simulation environment and then transferred to the corresponding real-world environment. However, the simulation environment does not perfectly mimic the real-world environment, which leads to model misspecification. Multiple studies report significant deterioration of policy performance in a real-world environment. In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set. The aim is to optimize the worst-case performance on the uncertainty parameter set to guarantee the performance in the corresponding real-world environment. To obtain a policy for the optimization, we propose an off-policy actor-critic approach called the Max-Min Twin Delayed Deep Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min optimization problem using a simultaneous gradient ascent-descent approach. Experiments in multi-joint dynamics with contact (MuJoCo) environments show that the proposed method exhibited a worst-case performance superior to several baseline approaches.
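
As a rough illustration of the simultaneous gradient ascent-descent idea mentioned in the abstract, the sketch below applies it to a toy max-min problem. It is not M2TD3 and uses no actor, critic, or simulator. The payoff f(theta, omega) = -theta^2 + theta*omega + omega^2, the function names, the step size, and the starting point are all assumptions chosen for illustration: theta stands in for a policy parameter (maximized) and omega for an uncertainty parameter (minimized).

```python
def payoff_gradients(theta, omega):
    """Gradients of the assumed toy payoff f = -theta**2 + theta*omega + omega**2."""
    d_theta = -2.0 * theta + omega   # df/dtheta (f is concave in theta)
    d_omega = theta + 2.0 * omega    # df/domega (f is convex in omega)
    return d_theta, d_omega


def simultaneous_gda(theta, omega, lr=0.1, steps=200):
    """Simultaneous gradient ascent on theta and gradient descent on omega."""
    for _ in range(steps):
        d_theta, d_omega = payoff_gradients(theta, omega)
        # Both updates use gradients evaluated at the same (theta, omega).
        theta, omega = theta + lr * d_theta, omega - lr * d_omega
    return theta, omega


# Hypothetical starting point; the saddle point of this toy payoff is (0, 0).
theta, omega = simultaneous_gda(theta=1.0, omega=-1.0)
print(theta, omega)
```

For this particular payoff and step size the iterates contract toward the saddle point at (0, 0). In general, simultaneous updates are not guaranteed to converge on max-min problems, so this should be read only as a picture of the update rule.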

Author Information

Takumi Tanabe (University of Tsukuba)
Rei Sato (LINE Corp. / Univ. of Tsukuba)
Kazuto Fukuchi (University of Tsukuba)
Jun Sakuma (University of Tsukuba / RIKEN)
Youhei Akimoto (University of Tsukuba / RIKEN AIP)

More from the Same Authors

  • 2021 : Unsupervised Causal Binary Concepts Discovery with VAE for Black-box Model Explanation »
    Thien Tran · Kazuto Fukuchi · Youhei Akimoto · Jun Sakuma
  • 2022 : Minimax Optimal Fair Regression under Linear Model »
    Kazuto Fukuchi · Jun Sakuma
  • 2018 : Lunch »
    Hong Yu · Bhanu Pratap Singh Rawat · Arijit Ukil · Waheeda Saib · Jekaterina Novikova · John Hughes · Yuhui Zhang · Rahul V · Mi Jung Kim · Babak Taati · Hariharan Ravishankar · Harry Clifford · Hirofumi Kobayashi · Babak Taati · Keyang Xu · Yen-Chi Cheng · Timothy Cannings · Jayashree Kalpathy-Cramer · Jayashree Kalpathy-Cramer · Parinaz Sobhani · Kimis Perros · Wei-Hung Weng · Yordan Raykov · Lars Lorch · Mengqi Jin · Xue Teng · Michael Ferlaino · Marek Rei · Cédric Beaulac · Aman Verma · Sebastian Keller · Edmond Cunningham · Luc Evers · Victor Rodriguez · Vipul Satone · Dianbo Liu · Angeline Yasodhara · Geoff Tison · Ligin Solamen · Bryan He · Rahul Ladhania · Yipeng Shi · Md Nafiz Hamid · Pouria Mashouri · Woochan Hwang · Sejin Park · Xu Chen · Rachneet Kaur · Davis Blalock · Holly Wiberg · Parminder Bhatia · Kezi Yu · RUMENG LI · Jun Sakuma · Charles Ding · Aaron Babier · Yong Cai · A Pratap · Luke O'Connor · Allen Nie · Martin Kang · Ian Covert · Xun Wang · Zelun Luo · Serena Yeung · William Boag · Kazuki Tachikawa · Mary Saltz · Owen Lahav · Edward Lee · Eric Teasley · Michael Kamp · Nirmesh Patel · Vishwali Mhasawade · Maxim Samarin · Ryo Uchimido · Farzad Khalvati · Francisco Cruz · Laura Symul · Zaid Nabulsi · Mads Mihailescu · Rosalind Picard