Timezone: »
Poster
Policy Gradient Coagent Networks
Philip Thomas
We present a novel class of actor-critic algorithms for actors consisting of sets of interacting modules. We present, analyze theoretically, and empirically evaluate an update rule for each module, which requires only local information: the module's input, output, and the TD error broadcast by a critic. Such updates are necessary when computation of compatible features becomes prohibitively difficult and are also desirable to increase the biological plausibility of reinforcement learning methods.
Author Information
Philip Thomas (University of Massachusetts Amherst)
More from the Same Authors
-
2022 : Optimization using Parallel Gradient Evaluations on Multiple Parameters »
Yash Chandak · Shiv Shankar · Venkata Gandikota · Philip Thomas · Arya Mazumdar -
2022 Poster: Off-Policy Evaluation for Action-Dependent Non-stationary Environments »
Yash Chandak · Shiv Shankar · Nathaniel Bastian · Bruno da Silva · Emma Brunskill · Philip Thomas -
2021 : Q&A for Philip Thomas »
Philip Thomas -
2021 : Advances in (High-Confidence) Off-Policy Evaluation »
Philip Thomas -
2021 : Invited Speaker Panel »
Sham Kakade · Minmin Chen · Philip Thomas · Angela Schoellig · Barbara Engelhardt · Doina Precup · George Tucker -
2021 Poster: SOPE: Spectrum of Off-Policy Estimators »
Christina Yuan · Yash Chandak · Stephen Giguere · Philip Thomas · Scott Niekum -
2021 Poster: Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs »
harsh satija · Philip Thomas · Joelle Pineau · Romain Laroche -
2021 Poster: Universal Off-Policy Evaluation »
Yash Chandak · Scott Niekum · Bruno da Silva · Erik Learned-Miller · Emma Brunskill · Philip Thomas -
2021 Poster: Structural Credit Assignment in Neural Networks using Reinforcement Learning »
Dhawal Gupta · Gabor Mihucz · Matthew Schlegel · James Kostas · Philip Thomas · Martha White -
2020 Poster: Towards Safe Policy Improvement for Non-Stationary MDPs »
Yash Chandak · Scott Jordan · Georgios Theocharous · Martha White · Philip Thomas -
2020 Spotlight: Towards Safe Policy Improvement for Non-Stationary MDPs »
Yash Chandak · Scott Jordan · Georgios Theocharous · Martha White · Philip Thomas -
2020 Poster: Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms »
Pinar Ozisik · Philip Thomas -
2019 Poster: Offline Contextual Bandits with High Probability Fairness Guarantees »
Blossom Metevier · Stephen Giguere · Sarah Brockman · Ari Kobren · Yuriy Brun · Emma Brunskill · Philip Thomas -
2019 Poster: A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning »
Francisco Garcia · Philip Thomas -
2015 Poster: Policy Evaluation Using the Ω-Return »
Philip Thomas · Scott Niekum · Georgios Theocharous · George Konidaris -
2013 Poster: Projected Natural Actor-Critic »
Philip Thomas · William C Dabney · Stephen Giguere · Sridhar Mahadevan -
2011 Poster: TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning »
George Konidaris · Scott Niekum · Philip Thomas