Today machine learning is largely about pattern discovery and function approximation. But as computing devices that interact with us in natural language become ubiquitous (e.g., Siri, Alexa, Google Now), and as computer perceptual abilities become more accurate, they open the exciting possibility of enabling end-users to teach machines in much the same way that humans teach one another. Natural language conversation, gesturing, demonstration, teleoperation, and other modes of communication offer a new paradigm for machine learning through instruction from humans. This paradigm builds on several existing ones (e.g., active learning, supervised learning, reinforcement learning), but also brings a new set of advantages and research challenges that lie at the intersection of several fields, including machine learning, natural language understanding, computer perception, and HCI.
The aim of this workshop is to engage researchers from these diverse fields to explore fundamental research questions in this new area, such as:
How do people interact with machines when teaching them new learning tasks and knowledge?
What novel machine learning models and algorithms are needed to learn from human instruction?
What are the practical considerations in building systems that can learn from instruction?
Sat 5:30 a.m. - 5:35 a.m. | Introduction (Welcome)
Sat 5:35 a.m. - 6:00 a.m. | Teaching Machines like we Teach People (Talk from Organizers)
Sat 6:00 a.m. - 6:30 a.m. | Mapping Navigation Instructions to Continuous Control (Invited Talk) | Yoav Artzi
Natural language understanding in grounded interactive scenarios is tightly coupled with the actions the system generates. The action space used determines much of the complexity of the problem and the type of reasoning required. In this talk, I will describe our approach to learning to map instructions and observations to continuous control of a realistic quadcopter drone. This scenario raises challenging new questions: how can we use demonstrations to learn to bridge the gap between the high-level concepts of language and low-level robot controls? And how do we design models that continuously observe, control, and react to a rapidly changing environment? This work uses a new publicly available evaluation benchmark.

Sat 6:30 a.m. - 7:00 a.m. | A Cognitive Architecture Approach to Interactive Task Learning (Invited Talk) | John Laird
Sat 7:00 a.m. - 7:15 a.m. | Compositional Imitation Learning: Explaining and executing one task at a time (Contributed Talk) | Thomas Kipf
We introduce a framework for Compositional Imitation Learning and Execution (CompILE) of hierarchically-structured behavior. CompILE learns reusable, variable-length segments of behavior from demonstration data using a novel unsupervised, fully-differentiable sequence segmentation module. These learned behaviors can then be re-composed and executed to perform new tasks. At training time, CompILE auto-encodes observed behavior into a sequence of latent codes, each corresponding to a variable-length segment in the input sequence. Once trained, our model generalizes to sequences of longer length and from environment instances not seen during training. We evaluate our model in a challenging 2D multi-task environment and show that CompILE can find correct task boundaries and event encodings in an unsupervised manner without requiring annotated demonstration data. We demonstrate that latent codes and associated behavior policies discovered by CompILE can be used by a hierarchical agent, where the high-level policy selects actions in the latent code space, and the low-level, task-specific policies are simply the learned decoders. We found that our agent could learn given only sparse rewards, where agents without task-specific policies struggle.

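To make the execution side of this concrete, here is a minimal, hypothetical sketch of the hierarchical agent described in the abstract: a high-level policy selects a latent code, and the corresponding low-level decoder policy acts until it signals termination. The placeholder policies and names are illustrative assumptions, not the CompILE implementation.

```python
import random

LATENT_CODES = [0, 1, 2]                       # discrete behavior codes (assumption)

def high_level_policy(state):
    return random.choice(LATENT_CODES)         # placeholder for the learned high-level selector

def decoder_policy(z, state):
    action = f"action-for-code-{z}"            # placeholder for the learned decoder pi(a | s, z)
    done = random.random() < 0.3               # placeholder termination predictor
    return action, done

state, t = "start", 0
while t < 20:                                  # run the hierarchical agent for 20 steps
    z = high_level_policy(state)               # choose which learned behavior to execute
    done = False
    while not done and t < 20:
        action, done = decoder_policy(z, state)
        state, t = f"state-{t + 1}", t + 1     # stand-in for an environment step
```
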
Sat 7:15 a.m. - 7:30 a.m. | Learning to Learn from Imperfect Demonstrations (Contributed Talk) | Ge Yang · Chelsea Finn
In the standard formulation of imitation learning, the agent starts from scratch without the means to take advantage of an informative prior. As a result, the expert's demonstrations have to either be optimal, or contain a known mode of sub-optimality that can be modeled. In this work, we consider instead the problem of imitation learning from imperfect demonstrations, where a small number of demonstrations containing unstructured imperfections is available. In particular, these demonstrations may contain large systematic biases or fail to complete the task in unspecified ways. Our Learning to Learn From Imperfect Demonstrations (LID) framework casts this problem as a meta-learning problem, where the agent meta-learns a robust imitation algorithm that is able to infer the correct policy despite these imperfections, by taking advantage of an informative prior. We demonstrate the robustness of this algorithm on 2D reaching tasks and on multi-task door-opening and picking tasks with a simulated robot arm, where the demonstration merely gestures toward the intended target. Despite not seeing a demonstration that completes the task, the agent is able to draw lessons from its prior experience, correctly inferring a policy that accomplishes the task where the demonstration fails to.

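As a rough illustration of the setup, the following sketch trains a demo-conditioned policy across a distribution of toy 2D reaching tasks in which the "demonstration" only gestures partway toward the goal; because the true goal is available at meta-training time, the policy learns to infer it from imperfect demonstrations. The toy task, network, and hyperparameters are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(2000):
    goal = torch.rand(32, 2) * 2 - 1                  # true goals, hidden from the agent at test time
    gesture = 0.3 * goal + 0.05 * torch.randn(32, 2)  # imperfect demo: only gestures toward the goal
    state = torch.zeros(32, 2)                        # agent always starts at the origin
    action = policy(torch.cat([state, gesture], -1))  # demo-conditioned policy
    loss = ((action - goal) ** 2).mean()              # meta-objective uses the true goal
    opt.zero_grad(); loss.backward(); opt.step()
```
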
Sat 8:00 a.m. - 8:30 a.m. | Natural Language Supervision (Invited Talk) | Percy Liang
Sat 8:30 a.m. - 9:00 a.m. | Control Algorithms for Imitation Learning from Observation (Invited Talk) | Peter Stone
Sat 9:00 a.m. - 9:15 a.m. | From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following (Contributed Talk) | Justin Fu
Reinforcement learning is a promising framework for solving control problems, but its use in practical situations is hampered by the fact that reward functions are often difficult to engineer. Specifying goals and tasks for autonomous machines, such as robots, is a significant challenge: conventionally, reward functions and goal states have been used to communicate objectives. But people can communicate objectives to each other simply by describing or demonstrating them. How can we build learning algorithms that will allow us to tell machines what we want them to do? In this work, we investigate the problem of grounding language commands as reward functions using inverse reinforcement learning, and argue that language-conditioned rewards are more transferable than language-conditioned policies to new environments. We propose language-conditioned reward learning (LC-RL), which grounds language commands as a reward function represented by a deep neural network. We demonstrate that our model learns rewards that transfer to novel tasks and environments on realistic, high-dimensional visual environments with natural language commands, whereas directly learning a language-conditioned policy leads to poor performance.

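A minimal sketch of what a language-conditioned reward function might look like, in the spirit of LC-RL: a network that scores an observation given an encoded command. The specific architecture below (GRU command encoder, MLP observation encoder, flat observation features) is an illustrative assumption rather than the authors' model.

```python
import torch
import torch.nn as nn

class LanguageConditionedReward(nn.Module):
    """Scores an observation under a natural-language command: r(s, l)."""
    def __init__(self, vocab_size=1000, embed_dim=64, obs_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lang_encoder = nn.GRU(embed_dim, 64, batch_first=True)
        self.obs_encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(64 + 128, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, obs, command_tokens):
        _, h = self.lang_encoder(self.embed(command_tokens))  # summarize the command
        feats = self.obs_encoder(obs)                         # encode the observation features
        return self.head(torch.cat([h[-1], feats], dim=-1))   # scalar reward per example

reward_fn = LanguageConditionedReward()
rewards = reward_fn(torch.randn(4, 128), torch.randint(0, 1000, (4, 12)))  # batch of 4
```
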
Sat 9:15 a.m. - 9:30 a.m. | Teaching Multiple Tasks to an RL Agent using LTL (Contributed Talk) | Rodrigo Toro Icarte · Sheila McIlraith
This paper examines the problem of how to teach multiple tasks to a Reinforcement Learning (RL) agent. To this end, we use Linear Temporal Logic (LTL) as a language for specifying multiple tasks in a manner that supports the composition of learned skills. We also propose a novel algorithm that exploits LTL progression and off-policy RL to speed up learning without compromising convergence guarantees, and show that our method outperforms the state-of-the-art.

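The key operation behind this approach is LTL progression: after each environment step, the task formula is rewritten to describe only what remains to be done. Below is a minimal sketch for a small LTL fragment, with formulas as nested tuples; the representation, fragment, and simplifier are illustrative, not the authors' code.

```python
def prog(phi, obs):
    """Progress formula phi through one step in which the propositions in `obs` hold."""
    if phi in (True, False):
        return phi
    if isinstance(phi, str):                            # atomic proposition
        return phi in obs
    op = phi[0]
    if op == 'not':
        return simplify(('not', prog(phi[1], obs)))
    if op == 'and':
        return simplify(('and', prog(phi[1], obs), prog(phi[2], obs)))
    if op == 'or':
        return simplify(('or', prog(phi[1], obs), prog(phi[2], obs)))
    if op == 'next':
        return phi[1]
    if op == 'until':                                   # phi1 U phi2
        return simplify(('or', prog(phi[2], obs),
                         simplify(('and', prog(phi[1], obs), phi))))
    if op == 'eventually':
        return simplify(('or', prog(phi[1], obs), phi))
    if op == 'always':
        return simplify(('and', prog(phi[1], obs), phi))
    raise ValueError(op)

def simplify(phi):
    """Collapse True/False so finished subtasks disappear from the formula."""
    if isinstance(phi, tuple):
        if phi[0] == 'not':
            return {True: False, False: True}.get(phi[1], phi)
        if phi[0] == 'and':
            if False in phi[1:]: return False
            args = [a for a in phi[1:] if a is not True]
            return True if not args else args[0] if len(args) == 1 else ('and', *args)
        if phi[0] == 'or':
            if True in phi[1:]: return True
            args = [a for a in phi[1:] if a is not False]
            return False if not args else args[0] if len(args) == 1 else ('or', *args)
    return phi

# Task: eventually get coffee and eventually get the mail.
task = ('and', ('eventually', 'coffee'), ('eventually', 'mail'))
task = prog(task, {'coffee'})   # -> ('eventually', 'mail'): only the mail subtask remains
```
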
Sat 10:30 a.m. - 11:00 a.m. | Meta-Learning to Follow Instructions, Examples, and Demonstrations (Invited Talk) | Sergey Levine
Sat 11:00 a.m. - 11:30 a.m. | Learning to Understand Natural Language Instructions through Human-Robot Dialog (Invited Talk) | Raymond Mooney
Sat 11:30 a.m. - 11:45 a.m. | The Implicit Preference Information in an Initial State (Contributed Talk) | Rohin Shah
Reinforcement learning (RL) agents optimize only the specified features and are indifferent to anything left out inadvertently. This means that we must tell a household robot not only what to do, but also what not to do, which is a much larger space. It is easy to forget these preferences, since we are so used to having them satisfied. Our key insight is that when a robot is deployed in an environment that humans act in, the state of the environment is already optimized for what humans want. We can therefore use this implicit information from the state to fill in the blanks. We develop an algorithm based on Maximum Causal Entropy IRL and use it to evaluate the idea in a suite of proof-of-concept environments designed to show its properties. We find that information from the initial state can be used to infer both side effects that should be avoided and preferences for how the environment should be organized.

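As a toy illustration of the insight (a deliberate simplification, not the paper's Maximum Causal Entropy algorithm), suppose the observed initial state was drawn from a Boltzmann distribution over reachable states, p(s) proportional to exp(w·phi(s)), because humans had already been optimizing the environment. Gradient ascent on the likelihood of the observed state then recovers reward weights under which states resembling it score highly. All quantities below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(50, 3))   # phi(s) for 50 hypothetical reachable states
observed = 7                          # index of the state the robot finds at deployment

w = np.zeros(3)                       # reward weights to infer
for _ in range(500):
    scores = features @ w
    p = np.exp(scores - scores.max())
    p /= p.sum()                                # Boltzmann distribution over states under w
    grad = features[observed] - p @ features    # gradient of log p(s_obs | w)
    w += 0.1 * grad
# States whose features resemble the observed state now score highly under w, so a
# planner using w is discouraged from side-effect actions that push the world
# toward low-scoring states.
```
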
Sat 11:45 a.m. - 12:00 p.m. | Modelling User's Theory of AI's Mind in Interactive Intelligent Systems (Contributed Talk) | Tomi Peltola
Many interactive intelligent systems, such as recommendation and information retrieval systems, treat users as a passive data source. Yet users form mental models of systems, and rather than passively providing feedback to the system's queries, they strategically plan their actions within the constraints of that mental model to steer the system and achieve their goals faster. We propose to explicitly account for the user's theory of the AI's mind in the user model: the intelligent system has a model of the user having a model of the intelligent system. We study a case where the system is a contextual bandit and the user model is a Markov decision process that plans based on a simpler model of the bandit. Inference in the model can be reduced to probabilistic inverse reinforcement learning, with the nested bandit model defining the transition dynamics, and is implemented using probabilistic programming. Our results show that improved performance is achieved if users can form accurate mental models that the system can capture, implying that the predictability of the interactive intelligent system is important not only for the user experience but also for the design of the system's statistical models.

Sat 12:30 p.m. - 1:15 p.m. | Poster Session | Carl Trimbach · Mennatullah Siam · Rodrigo Toro Icarte · Zhongtian Dai · Sheila McIlraith · Matthew Rahtz · Robert Sheline · Christopher MacLellan · Carolin Lawrence · Stefan Riezler · Dylan Hadfield-Menell · Fang-I Hsiao
Sat 1:15 p.m. - 1:30 p.m. | Assisted Inverse Reinforcement Learning (Contributed Talk) | Adish Singla · Rati Devidze
We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic question: how can a teacher provide an informative sequence of demonstrations to an IRL agent to speed up the learning process? We prove rigorous convergence guarantees for a new iterative teaching algorithm that adaptively chooses demonstrations based on the learner's current performance. Extensive experiments in a car-driving simulator environment show that learning progress can be sped up drastically compared to an uninformative teacher.

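A minimal sketch of such an adaptive teaching loop, under strong simplifying assumptions: a linear reward model, a fixed pool of candidate demonstrations summarized by their feature counts, and a MaxEnt-IRL-style learner update. The greedy selection criterion is illustrative, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([1.0, -0.5, 0.2])    # reward weights the teacher wants to convey
demos = rng.normal(size=(20, 3))       # feature counts of candidate demonstrations

def learner_update(w, demo_feats, lr=0.2):
    """MaxEnt-IRL-style step: move w toward the demonstrated features and away
    from the features the learner currently expects under its own soft policy."""
    scores = demos @ w
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return w + lr * (demo_feats - p @ demos)

w = np.zeros(3)
for round_ in range(10):
    # Teacher greedily picks the demonstration that, after one learner update,
    # brings the learner's reward estimate closest to the true reward.
    candidates = [learner_update(w, d) for d in demos]
    w = min(candidates, key=lambda w_new: float(np.linalg.norm(w_new - true_w)))
```
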
Sat 1:30 p.m. - 2:00 p.m. | Teaching through Dialogue and Games (Invited Talk) | Jason E Weston
Sat 2:00 p.m. - 2:45 p.m. | Panel Discussion (Discussion Panel)
Author Information
Shashank Srivastava (Microsoft Research)
Igor Labutov (Cornell University)
Bishan Yang (Cornell University)
Amos Azaria (Ariel University)
Tom Mitchell (Carnegie Mellon University)