Reinforcement learning (RL) methods based on direct policy search (DPS) have been actively studied as an efficient approach to complicated Markov decision processes (MDPs). Although these methods have brought much progress in practical applications of RL, model selection for the policy remains an unsolved problem in DPS. In this paper, we propose a novel DPS method, weighted likelihood policy search (WLPS), in which a policy is efficiently learned through weighted likelihood estimation. WLPS naturally connects DPS to the statistical inference problem, so various sophisticated techniques from statistics can be applied to DPS problems directly. Following the idea of the information criterion, we develop a new measure for model comparison in DPS based on the weighted log-likelihood.
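To make the idea of weighted likelihood estimation concrete, the following is a minimal sketch and not the WLPS algorithm itself, whose details the abstract does not give. It assumes a hypothetical linear-Gaussian policy, nonnegative per-sample weights (for example, derived from rewards), and a generic AIC-style penalty as a stand-in for the paper's model comparison measure. All function names are illustrative.

```python
import numpy as np


def fit_weighted_gaussian_policy(features, actions, weights):
    """Weighted maximum likelihood for a ~ N(features @ theta, sigma2).

    Assumption: a linear-Gaussian policy with scalar actions; the weights
    are nonnegative and stand in for reward-derived sample weights.
    """
    X = np.asarray(features, dtype=float)
    a = np.asarray(actions, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Weighted least squares yields the weighted MLE of theta.
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], a * sw, rcond=None)
    resid = a - X @ theta
    sigma2 = float(np.sum(w * resid**2) / np.sum(w))
    return theta, sigma2


def weighted_log_likelihood(features, actions, weights, theta, sigma2):
    """Sum of per-sample Gaussian log-densities, each scaled by its weight."""
    X = np.asarray(features, dtype=float)
    a = np.asarray(actions, dtype=float)
    w = np.asarray(weights, dtype=float)
    resid = a - X @ theta
    logp = -0.5 * (np.log(2.0 * np.pi * sigma2) + resid**2 / sigma2)
    return float(np.sum(w * logp))


def aic_style_score(wll, n_params):
    """Generic AIC-style score (lower is better); not the paper's criterion."""
    return -2.0 * wll + 2.0 * n_params


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    a = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
    w = rng.uniform(0.1, 1.0, size=200)  # hypothetical weights
    theta, sigma2 = fit_weighted_gaussian_policy(X, a, w)
    wll = weighted_log_likelihood(X, a, w, theta, sigma2)
    # theta has 3 entries and sigma2 adds one more free parameter.
    print(theta, sigma2, aic_style_score(wll, n_params=4))
```

Under these assumptions, candidate policy models with different feature sets could be compared by fitting each one and preferring the lower score.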
Author Information
Tsuyoshi Ueno (Japan Science and Technology)
Yoshinobu Kawahara (Osaka University / RIKEN)
Kohei Hayashi (Preferred Networks)
Takashi Washio (Osaka University)
http://www.ar.sanken.osaka-u.ac.jp/~washio/washpreg.html
More from the Same Authors
- 2019 Poster: Einconv: Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks
  Kohei Hayashi · Taiki Yamaguchi · Yohei Sugawara · Shin-ichi Maeda
- 2018 Poster: Metric on Nonlinear Dynamical Systems with Perron-Frobenius Operators
  Isao Ishikawa · Keisuke Fujii · Masahiro Ikeda · Yuka Hashimoto · Yoshinobu Kawahara
- 2017 Poster: Fitting Low-Rank Tensors in Constant Time
  Kohei Hayashi · Yuichi Yoshida
- 2017 Spotlight: Fitting Low-Rank Tensors in Constant Time
  Kohei Hayashi · Yuichi Yoshida
- 2017 Poster: Learning Koopman Invariant Subspaces for Dynamic Mode Decomposition
  Naoya Takeishi · Yoshinobu Kawahara · Takehisa Yairi
- 2017 Poster: On Tensor Train Rank Minimization : Statistical Efficiency and Scalable Algorithm
  Masaaki Imaizumi · Takanori Maehara · Kohei Hayashi
- 2016 Poster: Minimizing Quadratic Functions in Constant Time
  Kohei Hayashi · Yuichi Yoshida
- 2016 Poster: Dynamic Mode Decomposition with Reproducing Kernels for Koopman Spectral Analysis
  Yoshinobu Kawahara
- 2013 Poster: Factorized Asymptotic Bayesian Inference for Latent Feature Models
  Kohei Hayashi · Ryohei Fujimaki
- 2011 Poster: Prismatic Algorithm for Discrete D.C. Programming Problem
  Yoshinobu Kawahara · Takashi Washio
- 2011 Poster: Statistical Performance of Convex Tensor Decomposition
  Ryota Tomioka · Taiji Suzuki · Kohei Hayashi · Hisashi Kashima
- 2010 Spotlight: Minimum Average Cost Clustering
  Kiyohito Nagano · Yoshinobu Kawahara · Satoru Iwata
- 2010 Poster: Minimum Average Cost Clustering
  Kiyohito Nagano · Yoshinobu Kawahara · Satoru Iwata
- 2009 Poster: Submodularity Cuts and Applications
  Yoshinobu Kawahara · Kiyohito Nagano · Koji Tsuda · Jeffrey A Bilmes
- 2009 Spotlight: Submodularity Cuts and Applications
  Yoshinobu Kawahara · Kiyohito Nagano · Koji Tsuda · Jeffrey A Bilmes
- 2006 Poster: A Kernel Subspace Method by Stochastic Realization for Learning Nonlinear Dynamical Systems
  Yoshinobu Kawahara · Takehisa Yairi · Kazuo Machida