Language is one of the most impressive human accomplishments and is believed to be at the core of our ability to learn, teach, reason and interact with others. Learning many complex tasks or skills would be significantly more challenging without relying on language to communicate, and language is believed to have a structuring impact on human thought. Written language has also given humans the ability to store information and insights about the world and pass them across generations and continents. Yet, the ability of current state-of-the-art reinforcement learning agents to understand natural language is limited.
Practically speaking, the ability to integrate and learn from language, in addition to rewards and demonstrations, has the potential to improve the generalization, scope and sample efficiency of agents. For example, agents capable of transferring domain knowledge from textual corpora might be able to explore a given environment much more efficiently, or to perform zero- or few-shot learning in novel environments. Furthermore, many real-world tasks, including personal assistants and general household robots, require agents to process language by design, whether to enable interaction with humans or simply to use existing interfaces.
To support this field of research, we are interested in fostering the discussion around:
- Methods that can effectively link language to actions and observations in the environment;
- Research into language roles beyond encoding goal states, such as structuring hierarchical policies, communicating domain knowledge, or reward shaping;
- Methods that can help identify and incorporate outside textual information about the task, or general-purpose semantics learned from outside corpora;
- Novel environments and benchmarks enabling such research and approaching the complexity of real-world problem settings.
The aim of the workshop on Language in Reinforcement Learning (LaReL) is to steer discussion and research on these problems by bringing together researchers from several communities, including reinforcement learning, robotics, natural language processing, computer vision and cognitive psychology.
Fri 6:30 a.m. - 6:40 a.m. | Opening remarks
Fri 6:40 a.m. - 7:20 a.m. | Invited Talk: Dorsa Sadigh | Dorsa Sadigh · Siddharth Karamcheti
Fri 7:20 a.m. - 8:00 a.m. | Invited Talk: Chen Yan | Chen Yan
Fri 8:00 a.m. - 8:15 a.m. | Morning Break + Posters
Fri 8:15 a.m. - 8:45 a.m. | Morning Poster Session
Fri 8:45 a.m. - 9:00 a.m. | Contributed Talk 1: ScriptWorld: A Scripts-based RL Environment
Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning algorithms. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld, a text-based environment for teaching agents about real-world daily chores and imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that uses data written by humans (script datasets) to create procedural games for daily real-world human activities. We provide gaming environments for 10 daily activities and perform a detailed analysis to capture the richness of the proposed environment. We also test the developed environment using human gameplay experiments and reinforcement learning algorithms as baselines. Our experiments show that the flexibility of the proposed environment makes it a suitable testbed for reinforcement learning algorithms to learn the underlying procedural knowledge in daily human chores.
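As a rough illustration of the interaction pattern such a text-based chores environment implies, the following minimal sketch shows a gym-style episode loop; the ToyScriptEnv class, its coffee-making script, and the random baseline agent are illustrative assumptions, not the actual ScriptWorld interface.

```python
# Minimal sketch of an agent acting in a text-based, script-driven environment.
# ToyScriptEnv is a made-up stand-in, not the real ScriptWorld API.
import random

class ToyScriptEnv:
    """Tiny text environment: choose the correct next step of a daily activity."""
    def __init__(self):
        self.script = ["take out a mug", "add coffee grounds", "pour hot water", "drink the coffee"]
        self.distractors = ["water the plants", "open the fridge", "turn on the TV"]
        self.step_idx = 0

    def reset(self):
        self.step_idx = 0
        return self._observation()

    def _observation(self):
        choices = [self.script[self.step_idx]] + random.sample(self.distractors, 2)
        random.shuffle(choices)
        return {"text": "You are making coffee. What do you do next?", "choices": choices}

    def step(self, action_text):
        reward = 1.0 if action_text == self.script[self.step_idx] else -1.0
        if reward > 0:
            self.step_idx += 1
        done = self.step_idx == len(self.script)
        return (None if done else self._observation()), reward, done

env = ToyScriptEnv()
obs, done, episode_return = env.reset(), False, 0.0
for _ in range(50):                       # cap episode length for the random baseline
    action = random.choice(obs["choices"])
    obs, reward, done = env.step(action)
    episode_return += reward
    if done:
        break
print("episode return:", episode_return)
```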
Fri 9:00 a.m. - 9:15 a.m. | Contributed Talk 2: How to talk so AI will learn: instructions, descriptions, and pragmatics
Humans intuitively use language to express our beliefs and desires, but today we lack computational models explaining such abstract language use. To address this challenge, we consider social learning in a linear bandit setting and ask how a human might communicate preferences over behaviors (i.e. the reward function). We study two distinct types of language: instructions, which specify partial policies, and descriptions, which provide information about the reward function. To explain how humans use such language, we suggest they reason about both known present and unknown future states: instructions optimize for the present, while descriptions optimize for the future. We formalize this choice by extending reward design to consider a distribution over states. We then define a pragmatic listener agent that infers the speaker's reward function by reasoning about how the speaker expresses themselves. Simulations suggest that (1) descriptions afford stronger learning than instructions; and (2) maintaining uncertainty over the speaker's pedagogical intent allows for robust reward inference. We hope these insights facilitate a shift from developing agents that obey language to agents that learn from it.
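To make the literal-versus-pragmatic distinction concrete, here is a small, self-contained sketch of how two listeners might narrow down reward hypotheses after hearing an instruction in a linear bandit; the binary feature space, the hypothesis grid, and the tie-breaking are illustrative assumptions, not the paper's actual model.

```python
# Toy sketch (not the authors' code) of literal vs. pragmatic reward inference
# from an instruction ("pick that arm") in a linear bandit.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_features = 3
true_w = np.array([1.0, -1.0, 0.5])                 # speaker's private reward weights
arms = rng.integers(0, 2, size=(4, n_features))     # current context: 4 arms with binary features

# The speaker's instruction: the index of the best arm in the current context.
instruction = int(np.argmax(arms @ true_w))

# Candidate reward hypotheses the listener entertains (weights in {-1, 0, 1}).
hypotheses = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=n_features)))

# Literal listener: only learns "that arm is good", i.e. keeps hypotheses under which
# the instructed arm has non-negative value.
literal = hypotheses[(arms[instruction] @ hypotheses.T) >= 0]

# Pragmatic listener: assumes the speaker named the arm that is *optimal* under their
# reward, so it keeps only hypotheses that make the instructed arm the argmax.
values = arms @ hypotheses.T                         # shape (n_arms, n_hypotheses)
pragmatic = hypotheses[values.argmax(axis=0) == instruction]

print("hypotheses kept by literal listener:  ", len(literal))
print("hypotheses kept by pragmatic listener:", len(pragmatic))
```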
Fri 9:15 a.m. - 9:55 a.m. | Invited Talk: Noah Goodman | Noah Goodman
Fri 9:55 a.m. - 10:00 a.m. | Best paper announcement
Fri 10:00 a.m. - 11:05 a.m. | Lunch
Fri 11:05 a.m. - 11:45 a.m. | Invited Talk: Stefanie Tellex | Stefanie Tellex
Fri 11:45 a.m. - 12:00 p.m. | Contributed Talk 3: Collaborating with language models for embodied reasoning
Reasoning in a complex and ambiguous embodied environment is a key goal for Reinforcement Learning (RL) agents. While some sophisticated RL agents can successfully solve difficult tasks, they require a large amount of training data and often struggle to generalize to new unseen environments and new tasks. On the other hand, Large Scale Language Models (LSLMs) have exhibited strong reasoning ability and the ability to adapt to new tasks through in-context learning. However, LSLMs do not inherently have the ability to interrogate or intervene on the environment. In this work, we investigate how to combine these complementary abilities in a single system consisting of three parts: a Planner, an Actor, and a Reporter. The Planner is a pre-trained language model that can issue commands to a simple embodied agent (the Actor), while the Reporter communicates with the Planner to inform its next command. We present a set of tasks that require reasoning, test this system's ability to generalize zero-shot and investigate failure cases, and demonstrate how components of this system can be trained with reinforcement learning to improve performance.
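The Planner-Actor-Reporter decomposition described above can be pictured as a simple message-passing loop; the sketch below uses placeholder functions for the language-model planner, the embodied actor, and the reporter, so every name and behavior here is an illustrative assumption rather than the paper's system.

```python
# Schematic sketch of a Planner-Actor-Reporter loop with stand-in components.
def planner_lm(task, dialogue):
    """Stand-in for a pretrained language model: returns the next command given the history."""
    return "explore" if not dialogue else "pick up the red block"

def actor_policy(command):
    """Stand-in for a low-level embodied agent executing one command in the environment."""
    return {"command": command, "success": True}

def describe_outcome(result):
    """Reporter: turns the environment outcome back into text for the planner."""
    return f"I tried to '{result['command']}' and it {'succeeded' if result['success'] else 'failed'}."

task = "Put the red block in the green box."
dialogue = []
for _ in range(3):                       # a few planner/actor/reporter rounds
    command = planner_lm(task, dialogue)
    result = actor_policy(command)
    report = describe_outcome(result)
    dialogue.append((command, report))
    print("Planner:", command, "| Reporter:", report)
```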
Fri 12:00 p.m. - 12:50 p.m. | Late-breaking results
Fri 12:00 p.m. - 12:25 p.m. | Late-breaking results 1: Cicero: Combining Language Models and Strategic Reasoning in the Game of Diplomacy
Fri 12:25 p.m. - 12:50 p.m. | Late-breaking results 2: VIMA: General Robot Manipulation with Multimodal Prompts
Fri 12:50 p.m. - 1:20 p.m. | Afternoon Poster Session
Fri 1:20 p.m. - 1:35 p.m. | Afternoon Break + Posters
Fri 1:35 p.m. - 2:15 p.m. | Invited Talk: Igor Mordatch | Igor Mordatch
Fri 2:15 p.m. - 2:55 p.m. | Invited Talk: James McClelland | James McClelland
Fri 2:55 p.m. - 3:00 p.m. | Closing Remarks
Poster: Toward Semantic History Compression for Reinforcement Learning | Fabian Paischer · Thomas Adler · Andreas Radler · Markus Hofmarcher · Sepp Hochreiter
Agents interacting under partial observability require access to past observations via a memory mechanism in order to approximate the true state of the environment. Recent work suggests that leveraging language as abstraction provides benefits for creating a representation of past events. History Compression via Language Models (HELM) leverages a pretrained Language Model (LM) for representing the past. It relies on a randomized attention mechanism to translate environment observations to token embeddings. In this work, we show that the representations resulting from this attention mechanism can collapse under certain conditions. This causes blindness of the agent to certain subtleties in the environment. We propose a solution to this problem consisting of two parts. First, we improve upon HELM by substituting the attention mechanism with a feature-wise centering-and-scaling operation. Second, we take a step toward semantic history compression by encoding the observations with a pretrained multimodal model such as CLIP, which further improves performance. With these improvements our model is able to solve the challenging MiniGrid-Memory environment. Surprisingly, however, our experiments suggest that this is not due to the semantic enrichment of the representation presented to the LM but only due to the discriminative power provided by CLIP.
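A feature-wise centering-and-scaling operation of the kind mentioned in this abstract can be sketched in a few lines; the vocabulary size, embedding dimension, and batch-statistics choice below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of mapping observation embeddings into the statistics of a
# frozen language model's token-embedding space via feature-wise centering-and-scaling.
import numpy as np

rng = np.random.default_rng(0)
token_embeddings = rng.normal(0.0, 0.02, size=(1000, 64))   # stand-in for the LM's vocabulary embeddings
obs_embeddings = rng.normal(5.0, 3.0, size=(16, 64))         # stand-in for CLIP features of 16 past observations

def center_and_scale(x, target):
    """Standardize x feature-wise, then rescale it to the target embedding statistics."""
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-6)
    return x * target.std(axis=0) + target.mean(axis=0)

pseudo_tokens = center_and_scale(obs_embeddings, token_embeddings)
print(pseudo_tokens.shape)   # (16, 64): one pseudo-token per observation, ready for the frozen LM
```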
Poster: Towards an Enhanced, Faithful, and Adaptable Web Interaction Environment | John Yang · Howard Chen · Karthik Narasimhan
We identify key areas of improvement for WebShop, an e-commerce shopping environment for training decision-making language agents. Specifically, shortcomings in 1) the faithfulness of the reward function to human evaluation, 2) the comprehensiveness of its content, and 3) the human participation required for generating instructions have hindered WebShop's promise to be a scalable real-world environment. To address these issues, we first incorporate greater faithfulness to human evaluation by designing a new reward function that captures lexical similarities and synonyms. Second, from surveying 75 respondents, we identify customer reviews, similar products, and customer FAQs as the missing semantic components that are most helpful to human execution of the task. Finally, we reformulate the attribute tagging problem as an extractive short-phrase prediction task to enhance scalability. Our V2 reward function closes the gap between the scores of WebShop's automated reward function (from 81.5% to 87.7%) and human evaluation (89.9%). Our attribute tagging approach achieves an accuracy of 72.2% with a t5-3b model fine-tuned on 2,000 training data points, showing potential to automate the instruction creation pipeline.
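A reward that credits lexical overlap and synonyms, in the spirit of the new reward function described above, could look roughly like the following; the synonym table, matching rule, and weighting are made-up assumptions, not the WebShop V2 definition.

```python
# Toy sketch of a lexical-similarity + synonym reward between goal attributes and a product.
SYNONYMS = {"sofa": {"couch"}, "sneakers": {"trainers", "running shoes"}}

def attribute_match(goal_attr, product_attrs):
    goal_attr = goal_attr.lower()
    for attr in (a.lower() for a in product_attrs):
        if attr == goal_attr or attr in SYNONYMS.get(goal_attr, set()) or goal_attr in SYNONYMS.get(attr, set()):
            return 1.0
    return 0.0

def reward(goal_attrs, product_attrs):
    """Fraction of goal attributes matched exactly or via a synonym."""
    if not goal_attrs:
        return 0.0
    return sum(attribute_match(g, product_attrs) for g in goal_attrs) / len(goal_attrs)

print(reward(["sofa", "grey"], ["couch", "gray fabric", "3-seater"]))  # 0.5: one synonym hit, no exact 'grey'
```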
Poster: Understanding Redundancy in Discrete Multi-Agent Communication | Jonathan Thomas · Raul Santos-Rodriguez · Robert Piechocki
By providing agents with the capacity to learn sample-efficient and generalisable communication protocols, we may enable them to cooperate more effectively in real-world tasks. In this paper, we consider this in the context of discrete decentralised multi-agent reinforcement learning to provide insights into the impact of the often overlooked size of the message set. Within a referential game, we find that over-provisioning the message set size leads to improved sample efficiency, but that these policies tend to maintain a high degree of redundancy, often utilising multiple messages to refer to each label in the dataset. We hypothesise that the additional redundancy within these converged policies may have implications for generalisation, and we experiment with methodologies to gradually reduce redundancy while maintaining sample efficiency. To this end, we propose a linearly-scheduled entropy regulariser which encourages an agent to initially maximise the utilisation of the available messages but, as training progresses, to minimise it. Through this mechanism, we achieve comparable sample efficiency whilst converging to a model with significantly reduced redundancy that generalises more effectively to previously unseen data.
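One way to realise a linearly scheduled entropy regulariser that first encourages and later discourages message-entropy is sketched below; the start/end coefficients, schedule length, and the form of the auxiliary loss are illustrative assumptions, not the paper's exact values.

```python
# Minimal sketch of an entropy coefficient that is annealed linearly from positive
# (reward high message entropy, i.e. use the whole message set) to negative
# (penalise entropy, i.e. squeeze out redundancy).
def entropy_coefficient(step, total_steps, start=0.01, end=-0.01):
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

# Example usage: the speaker's loss would include a term like
#   loss = policy_loss - entropy_coefficient(step, total_steps) * message_entropy
for step in [0, 25_000, 50_000, 75_000, 100_000]:
    print(step, round(entropy_coefficient(step, 100_000), 4))
```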
Poster: Language-Conditioned Reinforcement Learning to Solve Misunderstandings with Action Corrections | Frank Röder · Manfred Eppe
Human-to-human conversation is not just talking and listening. It is an incremental process where participants continually establish a common understanding to rule out misunderstandings. Current language understanding methods for intelligent robots do not consider this. There exist numerous approaches considering non-understandings, but they ignore the incremental process of resolving misunderstandings. In this article, we present a first formalization and experimental validation of incremental action-repair for robotic instruction-following based on reinforcement learning. To evaluate our approach, we propose a collection of benchmark environments for action correction in language-conditioned reinforcement learning, utilizing a synthetic instructor to generate language goals and their corresponding corrections. We show that a reinforcement learning agent can successfully learn to understand incremental corrections of misunderstood instructions.
Poster: Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems | Yihao Feng · Shentao Yang · Shujian Zhang · Jianguo Zhang · Caiming Xiong · Mingyuan Zhou · Huan Wang
When learning task-oriented dialogue (TOD) agents, one can naturally utilize reinforcement learning (RL) techniques to train conversational strategies to achieve user-specific goals. Existing works on training TOD agents mainly focus on developing advanced RL algorithms, while the mechanical design of reward functions is not well studied. This paper discusses how we can better learn and utilize reward functions for training TOD agents. Specifically, we propose two generalized objectives for reward function learning inspired by classical learning-to-rank losses. Further, to address the high-variance issue of policy gradient estimation with REINFORCE, we leverage the Gumbel-softmax trick to better estimate the gradient for TOD policies, which significantly improves training stability for policy learning. With the above techniques, we outperform the state-of-the-art results on the end-to-end dialogue task on the MultiWOZ 2.0 dataset.
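The Gumbel-softmax trick referenced above replaces REINFORCE-style discrete sampling with a reparameterized, differentiable sample; the following sketch shows the mechanics on a toy token distribution, with shapes and the stand-in reward being illustrative assumptions rather than the paper's training code.

```python
# Sketch of straight-through Gumbel-softmax sampling of a dialogue token, so that
# gradients from a downstream (learned) reward flow back into the policy logits.
import torch
import torch.nn.functional as F

vocab_size, hidden = 100, 32
logits = torch.randn(1, vocab_size, requires_grad=True)   # stand-in for a TOD policy's token logits
token_embeddings = torch.randn(vocab_size, hidden)          # stand-in embedding table

# hard=True: one-hot in the forward pass, differentiable softmax in the backward pass.
one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)
sampled_embedding = one_hot @ token_embeddings               # (1, hidden)

reward_estimate = sampled_embedding.sum()                    # stand-in for a learned reward model
reward_estimate.backward()
print(logits.grad.shape)                                     # (1, 100): gradients reach the logits
```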
Poster: ScriptWorld: A Scripts-based RL Environment | Abhinav Joshi · Areeb Ahmad · Umang Pandey · Ashutosh Modi
Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning algorithms. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld, a text-based environment for teaching agents about real-world daily chores and imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that uses data written by humans (script datasets) to create procedural games for daily real-world human activities. We provide gaming environments for 10 daily activities and perform a detailed analysis to capture the richness of the proposed environment. We also test the developed environment using human gameplay experiments and reinforcement learning algorithms as baselines. Our experiments show that the flexibility of the proposed environment makes it a suitable testbed for reinforcement learning algorithms to learn the underlying procedural knowledge in daily human chores.
Poster: $\ell$Gym: Natural Language Visual Reasoning with Reinforcement Learning | Anne Wu · Kianté Brantley · Noriyuki Kojima · Yoav Artzi
We present $\ell$Gym, a new benchmark for language-conditioned reinforcement learning in visual environments. $\ell$Gym is based on 2,661 human-written natural language statements grounded in an interactive visual environment, emphasizing compositionality and semantic diversity. We annotate all statements with Python programs representing their meaning. The programs are executable in an interactive visual environment to enable exact reward computation in every possible world state. Each statement is paired with multiple start states and reward functions to form thousands of distinct Contextual Markov Decision Processes of varying difficulty. We experiment with $\ell$Gym with different models and learning regimes. Our results and analysis show that while existing methods are able to achieve non-trivial performance, $\ell$Gym forms a challenging open problem.
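Pairing a statement with an executable program, as described above, makes reward computation exact for any world state; the example below is a made-up statement, program, and state schema meant only to illustrate the idea, not an actual $\ell$Gym annotation.

```python
# Illustrative sketch: a natural-language statement annotated with a Python program
# that evaluates it against a world state, yielding an exact reward.
statement = "There is a yellow circle above a black square."

def meaning(state):
    """Returns True iff the statement holds in the given world state."""
    return any(
        a["shape"] == "circle" and a["color"] == "yellow"
        and b["shape"] == "square" and b["color"] == "black"
        and a["y"] < b["y"]                      # smaller y = higher up in this toy layout
        for a in state["objects"] for b in state["objects"]
    )

state = {"objects": [
    {"shape": "circle", "color": "yellow", "x": 2, "y": 1},
    {"shape": "square", "color": "black",  "x": 2, "y": 3},
]}
reward = 1.0 if meaning(state) else 0.0
print(reward)   # 1.0: the statement holds in this state, so the agent is rewarded
```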
Poster: Hierarchical Agents by Combining Language Generation and Semantic Goal Directed RL | Bharat Prakash · Nicholas Waytowich · Tim Oates · Tinoosh Mohsenin
Learning to solve long horizon temporally extended tasks with reinforcement learning has been a challenge for several years now. We believe that it is important to leverage both the hierarchical structure of complex tasks and to use expert supervision whenever possible to solve such tasks. This work introduces an interpretable hierarchical agent framework by combining sub-goal generation using language and semantic goal directed reinforcement learning. We assume access to certain spatial and haptic predicates and construct a simple and powerful semantic goal space. These semantic goal representations act as an intermediate representation between language and raw states. We evaluate our framework on a robotic block manipulation task and show that it performs better than other methods, including both sparse and dense reward functions. We also suggest some next steps and discuss how this framework makes interaction and collaboration with humans easier.
Poster: ProgPrompt: Generating Situated Robot Task Plans using Large Language Models | Ishika Singh · Valts Blukis · Arsalan Mousavian · Ankit Goyal · Danfei Xu · Jonathan Tremblay · Dieter Fox · Jesse Thomason · Animesh Garg
Task planning can require defining myriad domain knowledge about the world in which a robot needs to act. To ameliorate that effort, large language models (LLMs) can be used to score potential next actions during task planning, and even to generate action sequences directly, given an instruction in natural language with no additional domain information. However, such methods either require enumerating all possible next steps for scoring, or generate free-form text that may contain actions not possible on a given robot in its current context. We present a programmatic LLM prompt structure that enables plan generation functional across situated environments, robot capabilities, and tasks. Our key insight is to prompt the LLM with program-like specifications of the available actions and objects in an environment, as well as with executable example programs. We make concrete recommendations about prompt structure and generation constraints through ablation experiments, demonstrate state-of-the-art success rates in VirtualHome household tasks, and deploy our method on a physical robot arm for tabletop tasks.
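A program-like prompt of the kind described above can be assembled from the scene's actions, objects, and one worked example plan; the action names, objects, and formatting below are illustrative assumptions and may differ from ProgPrompt's actual prompt format.

```python
# Rough sketch of building a programmatic prompt for an LLM planner.
actions = ["grab(obj)", "putin(obj, container)", "open(container)", "close(container)"]
objects = ["salmon", "fridge", "microwave", "plate"]

example_plan = (
    "def put_salmon_in_fridge():\n"
    "    open('fridge')\n"
    "    grab('salmon')\n"
    "    putin('salmon', 'fridge')\n"
    "    close('fridge')\n"
)

def build_prompt(task):
    header = "# Available actions: " + ", ".join(actions) + "\n"
    header += "# Objects in the scene: " + ", ".join(objects) + "\n\n"
    return header + example_plan + f"\ndef {task.replace(' ', '_')}():\n"

prompt = build_prompt("microwave the salmon")
print(prompt)   # this string would be sent to an LLM, which completes the function body
```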
Poster: Tackling AlfWorld with Action Attention and Common Sense from Language Models | Yue Wu · So Yeon Min · Yonatan Bisk · Russ Salakhutdinov · Shrimai Prabhumoye
Pre-trained language models (LMs) capture strong prior knowledge about the world. This commonsense knowledge can be used in control tasks. However, directly generating actions from LMs may result in a reasonable narrative that is not executable by a low-level agent. We propose instead to use the knowledge in LMs to simplify the control problem and to assist the training of the low-level actor. We implement a novel question-answering framework to simplify observations, and an agent that handles arbitrary roll-out lengths and action-space sizes based on action attention. On the ALFWorld benchmark for indoor instruction following, we achieve a significantly higher success rate (50% over the baseline) with our novel object-masking and action-attention method.
Poster: Collaborating with language models for embodied reasoning | Ishita Dasgupta · Christine Kaeser-Chen · Kenneth Marino · Arun Ahuja · Sheila Babayan · Felix Hill · Rob Fergus
Reasoning in a complex and ambiguous embodied environment is a key goal for Reinforcement Learning (RL) agents. While some sophisticated RL agents can successfully solve difficult tasks, they require a large amount of training data and often struggle to generalize to new unseen environments and new tasks. On the other hand, Large Scale Language Models (LSLMs) have exhibited strong reasoning ability and the ability to adapt to new tasks through in-context learning. However, LSLMs do not inherently have the ability to interrogate or intervene on the environment. In this work, we investigate how to combine these complementary abilities in a single system consisting of three parts: a Planner, an Actor, and a Reporter. The Planner is a pre-trained language model that can issue commands to a simple embodied agent (the Actor), while the Reporter communicates with the Planner to inform its next command. We present a set of tasks that require reasoning, test this system's ability to generalize zero-shot and investigate failure cases, and demonstrate how components of this system can be trained with reinforcement learning to improve performance.
Poster: How to talk so AI will learn: instructions, descriptions, and pragmatics | Theodore Sumers · Robert Hawkins · Mark Ho · Tom Griffiths · Dylan Hadfield-Menell
Humans intuitively use language to express our beliefs and desires, but today we lack computational models explaining such abstract language use. To address this challenge, we consider social learning in a linear bandit setting and ask how a human might communicate preferences over behaviors (i.e. the reward function). We study two distinct types of language: instructions, which specify partial policies, and descriptions, which provide information about the reward function. To explain how humans use such language, we suggest they reason about both known present and unknown future states: instructions optimize for the present, while descriptions optimize for the future. We formalize this choice by extending reward design to consider a distribution over states. We then define a pragmatic listener agent that infers the speaker's reward function by reasoning about how the speaker expresses themselves. Simulations suggest that (1) descriptions afford stronger learning than instructions; and (2) maintaining uncertainty over the speaker's pedagogical intent allows for robust reward inference. We hope these insights facilitate a shift from developing agents that obey language to agents that learn from it.
Poster: SCERL: A Benchmark for intersecting language and safe reinforcement learning | Lan Hoang · Shivam Ratnakar · Nicolas Galichet · Akifumi Wachi · Keerthiram Murugesan · Songtao Lu · Mattia Atzeni · Michael Katz · Subhajit Chaudhury
The issue of safety and robustness is a critical focus for AI research. Two lines of research have so far remained distinct: (i) safe reinforcement learning, where an agent needs to interact with the world under safety constraints, and (ii) textual reinforcement learning, where agents need to perform robust reasoning and modelling of the state of the environment. In this paper, we propose Safety-Constrained Environments for Reinforcement Learning (SCERL), a benchmark to bridge the gap between these two research directions. The benchmark contributes safety-relevant environments with (i) a sample set of 20 games built on new logical rules to represent physical safety issues; (ii) added monitoring of safety violations; and (iii) a mechanism to generate a more diverse set of games with safety constraints and their corresponding metrics of safety types and difficulties. This paper shows selected baseline results on the benchmark. Our aim is for the SCERL benchmark and its flexible framework to provide a set of tasks demonstrating language-based safety challenges, to inspire the research community to further explore safety applications in a text-based domain.
Poster: LAD: Language Augmented Diffusion for Reinforcement Learning | Edwin Zhang · Yujie Lu · William Yang Wang · Amy Zhang
Learning skills from language potentially provides a powerful avenue for generalization in RL, although it remains a challenging task as it requires agents to capture the complex interdependencies between language, actions and states, also known as language grounding. In this paper, we propose leveraging Language Augmented Diffusion models as a language-to-plan generator (LAD). We demonstrate comparable performance of LAD with the state of the art on the CALVIN benchmark with a much simpler architecture, and conduct an analysis of the properties of language-conditioned diffusion in reinforcement learning.
Poster: Meta-learning from demonstrations improves compositional generalization | Sam Spilsbury · Alexander Ilin
We study the problem of compositional generalization of language-instructed agents in gSCAN. gSCAN is a popular benchmark which requires an agent to generalize to instructions containing novel combinations of words that are not seen in the training data. We propose to improve the agent's generalization capabilities with an architecture inspired by the Meta-Sequence-to-Sequence learning approach (Lake, 2019). The agent receives as context a few examples of pairs of instructions and action trajectories in a given instance of the environment (a support set) and is tasked with predicting an action sequence for a query instruction in the same environment instance. The context is generated by an oracle and the instructions come from the same distribution as seen in the training data. In each training episode, we also shuffle the indices of the attributes of the observed environment states and the words of the instructions to make the agent figure out the relations between the attributes and the words from the context. Our predictive model has the standard transformer architecture. We show that the proposed architecture can significantly improve the generalization capabilities of the agent on one of the most difficult gSCAN splits: the "adverb-to-verb" split H.
Poster: Language-guided Task Adaptation for Imitation Learning | Prasoon Goyal · Raymond Mooney · Scott Niekum
We introduce a novel setting, wherein an agent needs to learn a task from a demonstration of a related task, with the difference between the tasks communicated in natural language. The proposed setting allows reusing demonstrations from other tasks by providing low-effort language descriptions, and can also be used to provide feedback to correct agent errors, which are both important desiderata for building intelligent agents that assist humans in daily tasks. To enable progress in this proposed setting, we create two benchmarks, Room Rearrangement and Room Navigation, that cover a diverse set of task adaptations. Further, we propose a framework that uses a transformer-based model to reason about the entities in the tasks and their relationships, in order to learn a policy for the target task.
Poster: Overcoming Referential Ambiguity in language-guided goal-conditioned Reinforcement Learning | Hugo Caselles-Dupré · Olivier Sigaud · Mohamed Chetouani
Teaching an agent to perform new tasks using natural language can easily be hindered by ambiguities in interpretation. When a teacher provides an instruction to a learner about an object by referring to its features, the learner can misunderstand the teacher's intentions, for instance if the instruction ambiguously refers to features of the object, a phenomenon called referential ambiguity. We study how two concepts derived from cognitive science can help resolve those referential ambiguities: pedagogy (selecting the right instructions) and pragmatism (learning the preferences of the other agents using inductive reasoning). We apply those ideas to a teacher/learner setup with two artificial agents on a simulated robotic task (block-stacking). We show that these concepts improve sample efficiency for training the learner.
Poster: On the Pitfalls of Visual Learning in Referential Games | Shresth Verma
This paper focuses on the effect of game design and visual representations of real-world entities on emergent languages in referential games. Strikingly, we find that the agents in such games can learn to communicate successfully even when provided with visual features from a randomly initialized neural network. Through a series of experiments, we highlight the agents' inability to effectively utilize high-level features. Using Gradient-weighted Class Activation Mapping, we verify that the agents often 'look' at regions not related to entities. Culminating with a positive result, we show how environmental pressure from an agent population can nudge the learners into effectively capturing high-level visual features.