We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic policies in contextual bandits with continuous action spaces. Our work is motivated by practical scenarios where the target policy must be deterministic due to domain requirements, such as prescribing treatment dosage and duration in medicine. Although importance sampling (IS) provides a basic principle for OPE, it is ill-posed for deterministic target policies with continuous actions. Our main idea is to relax the target policy and pose the problem as kernel-based estimation, where we learn the kernel metric to minimize the overall mean squared error (MSE). We present an analytic solution for the optimal metric, based on an analysis of bias and variance. Whereas prior work has been limited to scalar action spaces or kernel bandwidth selection, our work goes a step further by handling vector action spaces and optimizing the kernel metric. We show that our estimator is consistent and, through experiments on various domains, that it significantly reduces the MSE compared to baseline OPE methods.
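To make the relaxation idea concrete, here is a minimal sketch, assuming a Gaussian kernel whose metric is a fixed, hand-chosen symmetric positive-definite matrix A with bandwidth h. It replaces the ill-posed deterministic importance weight with K(π(x) − a) / μ(a | x), where μ is the behavior policy density at the logged action. The function names `gaussian_metric_kernel` and `kernel_ope_estimate` are hypothetical, and the paper's analytic optimal metric is not reproduced here; this only shows the shape of a kernel-based estimator under a given metric.

```python
import numpy as np


def gaussian_metric_kernel(u, A, h):
    """Gaussian kernel on action differences u, shape (n, d), under metric A.

    Equals the density of N(0, h^2 * inv(A)) at each row of u, so it
    integrates to 1 over the action space. A must be symmetric positive definite.
    """
    d = u.shape[1]
    sq_dist = np.einsum("ni,ij,nj->n", u, A, u)  # u^T A u for each row
    norm = np.sqrt(np.linalg.det(A)) / ((2.0 * np.pi) ** (d / 2) * h ** d)
    return norm * np.exp(-0.5 * sq_dist / h ** 2)


def kernel_ope_estimate(target_actions, logged_actions, behavior_density, rewards, A, h):
    """Kernel-relaxed importance sampling estimate of a deterministic policy's value.

    target_actions:   pi(x_i), actions the deterministic target policy would take, (n, d)
    logged_actions:   a_i, actions taken by the behavior policy, (n, d)
    behavior_density: mu(a_i | x_i), behavior policy density at the logged actions, (n,)
    rewards:          r_i, observed rewards, (n,)
    A, h:             assumed fixed metric matrix and bandwidth (not learned here)
    """
    u = target_actions - logged_actions
    weights = gaussian_metric_kernel(u, A, h) / behavior_density
    return float(np.mean(weights * rewards))


# Hypothetical synthetic check: context-free behavior policy N(0, I) in 2-D,
# deterministic target policy pi(x) = 0, and constant reward 1.
rng = np.random.default_rng(0)
n, d = 100_000, 2
logged = rng.standard_normal((n, d))
mu = np.exp(-0.5 * np.sum(logged ** 2, axis=1)) / (2.0 * np.pi) ** (d / 2)
target = np.zeros((n, d))
rewards = np.ones(n)
print(kernel_ope_estimate(target, logged, mu, rewards, A=np.eye(d), h=0.5))
```

With constant reward 1 and a behavior density that covers every action, the estimator's expectation equals the kernel's total mass, 1, so the printed value should land near 1. Note also that scaling A by a constant c is equivalent to dividing h by √c, which is one way to see that choosing a full metric generalizes bandwidth selection.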
Author Information
Haanvid Lee (KAIST)
Jongmin Lee (UC Berkeley)
Yunseon Choi (Korea Advanced Institute of Science & Technology)
Wonseok Jeon (Qualcomm AI Research)
Byung-Jun Lee (KAIST)
Yung-Kyun Noh (Hanyang University / Korea Institute for Advanced Study)
Kee-Eung Kim (KAIST)
More from the Same Authors
2022: Neural DAG Scheduling via One-Shot Priority Sampling
Wonseok Jeon · Mukul Gagrani · Burak Bartan · Weiliang Zeng · Harris Teague · Piero Zappi · Christopher Lott
2023 Poster: SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations
Youngsoo Jang · Geon-Hyeong Kim · Jongmin Lee · Sungryull Sohn · Byoungjip Kim · Honglak Lee · Moontae Lee
2023 Poster: Regularized Behavior Cloning for Blocking the Leakage of Past Action Information
Seokin Seo · HyeongJoo Hwang · Hongseok Yang · Kee-Eung Kim
2023 Poster: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation
Daiki E Matsunaga · Jongmin Lee · Jaeseok Yoon · Stefanos Leonardos · Pieter Abbeel · Kee-Eung Kim
2023 Poster: Variational Weighting for Kernel Density Ratios
Sangwoong Yoon · Frank Park · Gunsu YUN · Iljung Kim · Yung-Kyun Noh
2023 Poster: Adapt to Adapt: A Tempo-control Framework for Non-stationary Reinforcement Learning
Hyunin Lee · Yuhao Ding · Jongmin Lee · Ming Jin · Javad Lavaei · Somayeh Sojoudi
2023 Poster: Energy-Based Models for Anomaly Detection: A Manifold Diffusion Recovery Approach
Sangwoong Yoon · Young-Uk Jin · Yung-Kyun Noh · Frank Park
2022 Poster: LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation
Geon-Hyeong Kim · Jongmin Lee · Youngsoo Jang · Hongseok Yang · Kee-Eung Kim
2022 Poster: Neural Topological Ordering for Computation Graphs
Mukul Gagrani · Corrado Rainone · Yang Yang · Harris Teague · Wonseok Jeon · Roberto Bondesan · Herke van Hoof · Christopher Lott · Weiliang Zeng · Piero Zappi
2022 Poster: A Reparametrization-Invariant Sharpness Measure Based on Information Geometry
Cheongjae Jang · Sungyoon Lee · Frank Park · Yung-Kyun Noh
2021 Poster: Multi-View Representation Learning via Total Correlation Objective
HyeongJoo Hwang · Geon-Hyeong Kim · Seunghoon Hong · Kee-Eung Kim
2020 Poster: Variational Interaction Information Maximization for Cross-domain Disentanglement
HyeongJoo Hwang · Geon-Hyeong Kim · Seunghoon Hong · Kee-Eung Kim
2020 Poster: Reinforcement Learning for Control with Multiple Frequencies
Jongmin Lee · Byung-Jun Lee · Kee-Eung Kim
2019: Poster Session
Gergely Flamich · Shashanka Ubaru · Charles Zheng · Josip Djolonga · Kristoffer Wickstrøm · Diego Granziol · Konstantinos Pitas · Jun Li · Robert Williamson · Sangwoong Yoon · Kwot Sin Lee · Julian Zilly · Linda Petrini · Ian Fischer · Zhe Dong · Alexander Alemi · Bao-Ngoc Nguyen · Rob Brekelmans · Tailin Wu · Aditya Mahajan · Alexander Li · Kirankumar Shiragur · Yair Carmon · Linara Adilova · SHIYU LIU · Bang An · Sanjeeb Dash · Oktay Gunluk · Arya Mazumdar · Mehul Motani · Julia Rosenzweig · Michael Kamp · Marton Havasi · Leighton P Barnes · Zhengqing Zhou · Yi Hao · Dylan Foster · Yuval Benjamini · Nati Srebro · Michael Tschannen · Paul Rubenstein · Sylvain Gelly · John Duchi · Aaron Sidford · Robin Ru · Stefan Zohren · Murtaza Dalal · Michael A Osborne · Stephen J Roberts · Moses Charikar · Jayakumar Subramanian · Xiaodi Fan · Max Schwarzer · Nicholas Roberts · Simon Lacoste-Julien · Vinay Prabhu · Aram Galstyan · Greg Ver Steeg · Lalitha Sankar · Yung-Kyun Noh · Gautam Dasarathy · Frank Park · Ngai-Man (Man) Cheung · Ngoc-Trung Tran · Linxiao Yang · Ben Poole · Andrea Censi · Tristan Sylvain · R Devon Hjelm · Bangjie Liu · Jose Gallego-Posada · Tyler Sypherd · Kai Yang · Jan Nikolas Morshuis
2018 Poster: Monte-Carlo Tree Search for Constrained POMDPs
Jongmin Lee · Geon-Hyeong Kim · Pascal Poupart · Kee-Eung Kim
2017: Poster Session (encompasses coffee break)
Beidi Chen · Borja Balle · Daniel Lee · iuri frosio · Jitendra Malik · Jan Kautz · Ke Li · Masashi Sugiyama · Miguel A. Carreira-Perpinan · Ramin Raziperchikolaei · Theja Tulabandhula · Yung-Kyun Noh · Adams Wei Yu
2017 Poster: Generative Local Metric Learning for Kernel Regression
Yung-Kyun Noh · Masashi Sugiyama · Kee-Eung Kim · Frank Park · Daniel Lee
2012 Poster: Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification
Yung-Kyun Noh · Frank Park · Daniel Lee
2010 Poster: Generative Local Metric Learning for Nearest Neighbor Classification
Yung-Kyun Noh · Byoung-Tak Zhang · Daniel Lee