
Structural Credit Assignment in Neural Networks using Reinforcement Learning
Dhawal Gupta · Gabor Mihucz · Matthew Schlegel · James Kostas · Philip Thomas · Martha White

Tue Dec 07 08:30 AM -- 10:00 AM (PST)

Structural credit assignment in neural networks is a long-standing problem, with a variety of alternatives to backpropagation proposed to allow for local training of nodes. One of the early strategies was to treat each node as an agent and use a reinforcement learning method called REINFORCE to update each node locally with only a global reward signal. In this work, we revisit this approach and investigate whether we can leverage other reinforcement learning approaches to improve learning. We first formalize training a neural network as a finite-horizon reinforcement learning problem and discuss how this facilitates using ideas from reinforcement learning, such as off-policy learning. We show that the standard on-policy REINFORCE approach, even with a variety of variance reduction approaches, learns suboptimal solutions. We introduce an off-policy approach that facilitates reasoning about the greedy action for other agents and helps overcome stochasticity in other agents. We conclude by showing that these networks of agents can be more robust to correlated samples when learning online.
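
The early strategy mentioned in the abstract, in which each node is an agent updated by REINFORCE from a single global reward, can be illustrated with a small sketch. This is not the authors' code and does not implement their off-policy approach; it is the classic on-policy update for Bernoulli-logistic units, written under several assumptions: the function names (`sample_layer`, `reinforce_step`, `train`), the toy OR task, the 0/1 reward, the running-average baseline, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_layer(W, x, rng):
    """Each row of W is one stochastic unit (agent) that sees input x."""
    p = sigmoid(W @ x)                            # firing probability per unit
    a = (rng.random(p.shape) < p).astype(float)   # sampled binary action per unit
    return a, p

def reinforce_step(W, x, a, p, reward, baseline, lr):
    """Local REINFORCE update for Bernoulli-logistic units.

    The gradient of log Bernoulli(a; p) with respect to W is outer(a - p, x),
    so each unit needs only its own input, its own action, and the global reward.
    """
    return W + lr * (reward - baseline) * np.outer(a - p, x)

def train(steps=5000, hidden=4, lr=0.1, seed=0):
    """Hypothetical toy setup: learn OR with a 2-layer network of agents."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)  # last column is a bias input
    Y = np.array([0, 1, 1, 1], dtype=float)
    W1 = rng.normal(scale=0.1, size=(hidden, 3))
    W2 = rng.normal(scale=0.1, size=(1, hidden + 1))
    baseline = 0.0                                # running average of reward (variance reduction)
    for _ in range(steps):
        i = rng.integers(len(X))
        h, p1 = sample_layer(W1, X[i], rng)
        h_in = np.append(h, 1.0)                  # append bias input for the output unit
        out, p2 = sample_layer(W2, h_in, rng)
        reward = 1.0 if out[0] == Y[i] else 0.0   # single global reward shared by all units
        W1 = reinforce_step(W1, X[i], h, p1, reward, baseline, lr)
        W2 = reinforce_step(W2, h_in, out, p2, reward, baseline, lr)
        baseline += 0.01 * (reward - baseline)
    return W1, W2
```

Note that no gradient is passed backward between layers: the hidden units and the output unit all receive the same scalar reward and update from their own local quantities. The abstract reports that this on-policy scheme learns suboptimal solutions even with variance reduction, which is the motivation it gives for the off-policy approach.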

Author Information

Dhawal Gupta (University of Alberta)
Gabor Mihucz (University of Alberta)
Matthew Schlegel (University of Alberta)

An AI and coffee enthusiast with research experience in RL and ML. Currently pursuing a PhD at the University of Alberta! Excited about off-policy policy evaluation, general value functions, understanding the behavior of artificial neural networks, and cognitive science (specifically cognitive neuroscience).

James Kostas (University of Massachusetts Amherst)
Philip Thomas (University of Massachusetts Amherst)
Martha White
