Reward and State Design: Towards Learning to Teach
Alex Lewandowski · Calarina Muslimani · Matthew Taylor · Jun Luo

A reinforcement learning agent that learns tabula rasa will make many missteps on its way to maximizing its return. To accelerate learning, we can introduce a teacher agent that learns by observing the student and acts by tuning the environment. In this paper, we provide a framework for learning to teach reinforcement learning agents by encoding the student's trajectory. We investigate different state representations and reward functions for the teacher. In tabular environments, we conjecture that the greedy policy induced by the learned action-values, rather than the action-values themselves, is an ideal state representation. We then propose three architectures that encode the student's trajectory to approximate the state representation provided by the greedy policy. Learning the teacher's policy offline, we find that the greedy-policy state representation is superior, but that the trajectory-based state representation is a close competitor. In addition, we design a new reward function for the teacher that enables the student to convey information about its learning progress. We show that the resulting teacher curriculum increases student learning efficiency compared to training the teacher with a minimal encoded reward function. These findings suggest that a general framework for reinforcement teaching can increase the sample efficiency of reinforcement learning.
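To make the framework concrete, here is a minimal, hypothetical sketch, not the paper's implementation: a tabular Q-learning student on a toy chain MDP, with a teacher whose state is the student's greedy policy, whose action tunes the environment by choosing each episode's start state, and whose reward is the change in the student's greedy evaluation return (a proxy for learning progress). The environment, function names, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

# Toy chain MDP (illustrative assumption, not the paper's environment):
# states 0..N-1, goal at the right end; actions 0 = left, 1 = right.
N_STATES, GOAL = 8, 7

def env_step(s, a):
    s2 = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    return s2, float(s2 == GOAL), s2 == GOAL  # next state, reward, done

def run_episode(Q, start, eps=0.1, alpha=0.5, gamma=0.9, max_steps=50):
    """One epsilon-greedy Q-learning episode for the student.
    With alpha=0 and eps=0 this is a pure greedy evaluation rollout."""
    s, ret = start, 0.0
    for _ in range(max_steps):
        a = np.random.randint(2) if np.random.rand() < eps else int(Q[s].argmax())
        s2, r, done = env_step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() * (not done) - Q[s, a])
        s, ret = s2, ret + r
        if done:
            break
    return ret

def greedy_policy(Q):
    # Teacher's state: the student's greedy policy, not the action-values.
    return tuple(int(a) for a in Q.argmax(axis=1))

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))
prev_eval = 0.0
for episode in range(200):
    teacher_state = greedy_policy(Q)       # teacher observes the student
    # Teacher action: choose the episode's start state. A learned teacher
    # would map teacher_state to this choice; we sample randomly for brevity.
    start = int(rng.integers(0, N_STATES - 1))
    run_episode(Q, start)                  # the student learns
    # Learning-progress reward: change in the student's greedy return from
    # a fixed initial state (alpha=0, eps=0 leaves Q untouched).
    eval_return = run_episode(Q, start=0, eps=0.0, alpha=0.0)
    teacher_reward = eval_return - prev_eval
    prev_eval = eval_return

print("final greedy policy:", greedy_policy(Q))
```

The sketch illustrates the abstract's central design choice: the teacher conditions on the greedy policy induced by the student's action-values rather than on the values themselves, and its reward tracks the student's progress rather than the student's per-step rewards.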

Author Information

Alex Lewandowski (University of Alberta)
Calarina Muslimani (University of Alberta)
Matthew Taylor (University of Alberta)
Jun Luo (Huawei Technologies Ltd.)

More from the Same Authors

  • 2022 Poster: Multiagent Q-learning with Sub-Team Coordination »
    Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng
  • 2022 : Build generally reusable agent-environment interaction models »
    Jun Jin · Hongming Zhang · Jun Luo
  • 2022 Spotlight: Lightning Talks 5A-3 »
    Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng
  • 2022 Spotlight: Multiagent Q-learning with Sub-Team Coordination »
    Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng
  • 2022 Poster: A Simple Decentralized Cross-Entropy Method »
    Zichen Zhang · Jun Jin · Martin Jagersand · Jun Luo · Dale Schuurmans