Poster
Cold-Start Reinforcement Learning with Softmax Policy Gradient
Nan Ding · Radu Soricut
Policy-gradient approaches to reinforcement learning have two common and undesirable overhead procedures, namely warm-start training and sample variance reduction. In this paper, we describe a reinforcement learning method based on a softmax value function that requires neither of these procedures. Our method combines the advantages of policy-gradient methods with the efficiency and simplicity of maximum-likelihood approaches. We apply this new cold-start reinforcement learning method in training sequence generation models for structured output prediction problems. Empirical evidence validates this method on automatic summarization and image captioning tasks.
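For intuition only, the following is a minimal PyTorch-style sketch of one way a softmax weighting over sampled sequences can stand in for the usual baseline-subtracted reward in a policy-gradient loss for sequence generation. The function name, the per-sequence reward inputs, and the temperature parameter are illustrative assumptions; this is not the paper's exact softmax value function.

    import torch
    import torch.nn.functional as F

    def softmax_weighted_pg_loss(seq_log_probs: torch.Tensor,
                                 rewards: torch.Tensor,
                                 temperature: float = 1.0) -> torch.Tensor:
        """Softmax-weighted policy-gradient loss (illustrative sketch only).

        seq_log_probs: (num_samples,) total log-likelihood of each sampled
                       output sequence under the current model (requires grad).
        rewards:       (num_samples,) task reward for each sample, e.g. ROUGE
                       for summarization or CIDEr for captioning.
        """
        # Normalized softmax weights over rewards replace raw or
        # baseline-subtracted reward weights, so this sketch needs neither a
        # separate variance-reduction baseline nor warm-start training.
        weights = F.softmax(rewards / temperature, dim=0).detach()
        # Maximize the softmax-weighted log-likelihood of the sampled sequences.
        return -(weights * seq_log_probs).sum()

In this sketch, a high temperature pushes the weights toward uniform, so the loss behaves like maximum-likelihood averaging over the samples, while a low temperature concentrates the weight on the highest-reward samples, giving more reinforcement-learning-like behavior.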
Author Information
Nan Ding (Google)
Radu Soricut (Google)
More from the Same Authors
- 2021 Poster: Bridging the Gap Between Practice and PAC-Bayes Theory in Few-Shot Meta-Learning
  Nan Ding · Xi Chen · Tomer Levinboim · Sebastian Goodman · Radu Soricut
- 2016 Poster: Stochastic Gradient MCMC with Stale Gradients
  Changyou Chen · Nan Ding · Chunyuan Li · Yizhe Zhang · Lawrence Carin
- 2015 Poster: Embedding Inference for Structured Multilabel Prediction
  Farzaneh Mirzazadeh · Siamak Ravanbakhsh · Nan Ding · Dale Schuurmans
- 2014 Poster: Bayesian Sampling Using Stochastic Gradient Thermostats
  Nan Ding · Youhan Fang · Ryan Babbush · Changyou Chen · Robert D Skeel · Hartmut Neven
- 2011 Poster: t-divergence Based Approximate Inference
  Nan Ding · S.V.N. Vishwanathan · Yuan Qi
- 2010 Poster: t-logistic regression
  Nan Ding · S.V.N. Vishwanathan