Timezone: »

Adaptive Batch Size for Safe Policy Gradients
Matteo Papini · Matteo Pirotta · Marcello Restelli

Wed Dec 06 06:30 PM -- 10:30 PM (PST) @ Pacific Ballroom #13

Policy gradient methods are among the best Reinforcement Learning (RL) techniques to solve complex control problems. In real-world RL applications, it is common to have a good initial policy whose performance needs to be improved and it may not be acceptable to try bad policies during the learning process. Although several methods for choosing the step size exist, research paid less attention to determine the batch size, that is the number of samples used to estimate the gradient direction for each update of the policy parameters. In this paper, we propose a set of methods to jointly optimize the step and the batch sizes that guarantee (with high probability) to improve the policy performance after each update. Besides providing theoretical guarantees, we show numerical simulations to analyse the behaviour of our methods.

Author Information

Matteo Papini (Politecnico di Milano)

Matteo Papini was born in Sondrio, Italy, on 5th July 1993. In 2015 he obtained the Bachelor Degree in Ingegneria Informatica (Computer Engineering) cum laude at Politecnico di Milano. In 2017 he obtained the Master Degree in Computer Science and Engineering - Ingegneria Informatica cum laude at Politecnico di Milano. From November 2017 he is a Ph.D. student at Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB) at Politecnico di Milano. His research interests include artificial intelligence, robotics, and machine learning, with a focus on reinforcement learning.

Matteo Pirotta (Facebook AI Research)
Marcello Restelli (Politecnico di Milano)

More from the Same Authors