A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. State-of-the-art methods have approached this problem by mapping human information to reward and value signals to indicate preferences and then iterating over them to compute the necessary control policy. In this paper we argue for an alternative, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct labels on the policy. We compare Advise to state-of-the-art approaches and highlight scenarios where it outperforms them and, importantly, is robust to infrequent and inconsistent human feedback.
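To make "feedback as direct labels on the policy" concrete, here is a minimal Python sketch of how "right"/"wrong" labels for the actions in a single state might be turned into action probabilities and then combined with an agent's own policy. The abstract does not give a formula, so everything here is an assumption for illustration: the estimate C^Δ / (C^Δ + (1 − C)^Δ), the consistency parameter C, the count difference Δ (positive minus negative labels), and the function names `feedback_policy` and `combine_policies`. It is not presented as the authors' implementation.

```python
from typing import Dict, Hashable


def feedback_policy(delta: Dict[Hashable, int],
                    consistency: float = 0.8) -> Dict[Hashable, float]:
    """Convert per-action feedback counts for one state into a distribution.

    delta[a] is (# positive labels) - (# negative labels) for action a.
    consistency is the assumed probability that a single label is correct
    (a hypothetical parameter for this sketch).
    """
    if not 0.0 < consistency < 1.0:
        raise ValueError("consistency must lie strictly between 0 and 1")
    # Assumed estimate of the probability that each action is optimal.
    p_opt = {
        a: consistency ** d / (consistency ** d + (1.0 - consistency) ** d)
        for a, d in delta.items()
    }
    # Normalize so the values form a distribution over actions.
    total = sum(p_opt.values())
    return {a: p / total for a, p in p_opt.items()}


def combine_policies(p_agent: Dict[Hashable, float],
                     p_feedback: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Combine two action distributions by multiplying and renormalizing."""
    product = {a: p_agent[a] * p_feedback[a] for a in p_agent}
    total = sum(product.values())
    return {a: p / total for a, p in product.items()}


# Example: "left" was labeled right twice, "right" was labeled wrong once,
# and "stay" received no feedback.
labels = {"left": 2, "right": -1, "stay": 0}
shaped = feedback_policy(labels, consistency=0.8)
agent = {"left": 1 / 3, "right": 1 / 3, "stay": 1 / 3}
combined = combine_policies(agent, shaped)
```

In this sketch an action with no feedback keeps a neutral weight, so sparse labels leave the agent's own policy in control, and a consistency value close to 0.5 makes each label count for little, which is one way to tolerate inconsistent feedback.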
Author Information
Shane Griffith (Georgia Tech)
Kaushik Subramanian (Cogitai Inc.)
Jonathan Scholz (Georgia Tech)
Charles Isbell (Georgia Tech)

Dr. Charles Isbell received his bachelor's in Information and Computer Science from Georgia Tech, and his MS and PhD at MIT's AI Lab. Upon graduation, he worked at AT&T Labs/Research until 2002, when he returned to Georgia Tech to join the faculty as an Assistant Professor. He has served many roles since returning and is now The John P. Imlay Jr. Dean of the College of Computing. Charles’s research interests are varied but the unifying theme of his work has been using machine learning to build autonomous agents who engage directly with humans. His work has been featured in the popular press, congressional testimony, and in several technical collections. In parallel, Charles has also pursued reform in computing education. He was a chief architect of Threads, Georgia Tech’s structuring principle for computing curricula. Charles was also an architect for Georgia Tech’s first-of-its-kind MOOC-supported MS in Computer Science. Both efforts have received international attention and have been presented in the academic and popular press. In all his roles, he has continued to focus on issues of broadening participation in computing, and is the founding Executive Director for the Constellations Center for Equity in Computing. He is an AAAI Fellow and a Fellow of the ACM. Appropriately, his citation for ACM Fellow reads “for contributions to interactive machine learning; and for contributions to increasing access and diversity in computing”.
Andrea L Thomaz (Georgia Tech)
More from the Same Authors
- 2020 Invited Talk: You Can’t Escape Hyperparameters and Latent Variables: Machine Learning as a Software Engineering Enterprise
  Charles Isbell
- 2017 Poster: State Aware Imitation Learning
  Yannick Schroecker · Charles Isbell
- 2016 Workshop: The Future of Interactive Machine Learning
  Kory Mathewson @korymath · Kaushik Subramanian · Mark Ho · Robert Loftin · Joseph L Austerweil · Anna Harutyunyan · Doina Precup · Layla El Asri · Matthew Gombolay · Jerry Zhu · Sonia Chernova · Charles Isbell · Patrick M Pilarski · Weng-Keen Wong · Manuela Veloso · Julie A Shah · Matthew Taylor · Brenna Argall · Michael Littman
- 2013 Poster: Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs
  Liam MacDermed · Charles Isbell
- 2009 Poster: Solving Stochastic Games
  Liam MacDermed · Charles Isbell
- 2008 Poster: QUIC-SVD: Fast SVD Using Cosine Trees
  Michael Holmes · Alexander Gray · Charles Isbell
- 2007 Poster: Multi-Stage Monte Carlo Approximation for Fast Generalized Data Summations
  Michael Holmes · Alexander Gray · Charles Isbell