Timezone: »

Deep variational reinforcement learning by optimizing Hamiltonian equation
Zeliang Zhang · Xiao-Yang Liu
Deep variational reinforcement learning by optimizing Hamiltonian equation is a novel training method in reinforcement learning. Liu \cite{liu2020vrl} proposed to maximize the Hamiltonian equation to obtain the policy network. In this poster, we apply the massively parallel simulation to sample trajectories (collecting information of the reward tensor) and train the deep policy network by maximizing a partial Hamiltonian equation. On the FrozenLake $8\times8$ and GridWorld $10\times10$ examples, we verify the theory in \cite{liu2020vrl} by showing that deep Hamiltonian network (DHN) for variational reinforcement learning is more stable and efficient than DQN \cite{mnih2013playing}. Our codes are available at:\href{https://github.com/AI4Finance-Foundation/Quantum-Tensor-Networks-for-Variational-Reinforcement-Learning-NeurIPS-2020}.