
Wordplay: Reinforcement and Language Learning in Text-based Games
Adam Trischler · Angeliki Lazaridou · Yonatan Bisk · Wendy Tay · Nate Kushman · Marc-Alexandre Côté · Alessandro Sordoni · Daniel Ricks · Tom Zahavy · Hal Daumé III

Sat Dec 08 05:00 AM -- 03:30 PM (PST) @ Room 512 ABEF

Video games, via interactive learning environments like ALE [Bellemare et al., 2013], have been fundamental to the development of reinforcement learning algorithms that work on raw video inputs rather than featurized representations. Recent work has shown that text-based games may present a similar opportunity to develop RL algorithms for natural language inputs [Narasimhan et al., 2015, Haroush et al., 2018]. Drawing on insights from both the RL and NLP communities, this workshop will explore this opportunity, considering synergies between text-based and video games as learning environments as well as important differences and pitfalls.

Video games provide infinite worlds of interaction and grounding defined by simple, physics-like dynamics. While it is difficult, if not impossible, to simulate the full social dynamics of linguistic interaction (see, e.g., work on user simulation and dialogue [Georgila et al., 2006, El Asri et al., 2016]), text-based games nevertheless present complex, interactive simulations that ground language in world and action semantics. Games like Zork [Infocom, 1980] rose to prominence in the age before advanced computer graphics. They use simple language to describe the state of the environment and to report the effects of player actions. Players interact with the environment through text commands that respect a predefined grammar, which, though simplistic, must be discovered in each game. Through sequential decision making, language understanding, and language generation, players work toward goals that may or may not be specified explicitly, and earn rewards (points) at completion or along the way.

Text-based games present a broad spectrum of challenges for learning algorithms. In addition to language understanding, successful play generally requires long-term memory and planning, exploration/experimentation, affordance extraction [Fulda et al., 2017], and common sense. Text games also highlight major open challenges for RL: the action space (text) is combinatorial and compositional, while game states are partially observable, since text is often ambiguous or underspecific. Furthermore, in text games the set of actions that affect the state is not known in advance but must be learned through experimentation, typically informed by prior world/linguistic knowledge.
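To make the point about combinatorial, compositional action spaces concrete, the following minimal sketch enumerates the commands generated by a tiny template grammar and measures how few of them change the game state. Everything in it is an assumption made for illustration: the vocabulary, the two command templates, and the `EFFECTIVE` set are hypothetical and are not taken from Zork or any other game.

```python
from itertools import product

# Hypothetical toy vocabulary; not taken from any real game.
VERBS = ["take", "open", "eat", "go"]
NOUNS = ["lamp", "door", "apple", "north"]
PREPOSITIONS = ["with", "on", "in"]


def enumerate_commands(verbs, nouns, prepositions):
    """Return every command of the form 'verb noun' or 'verb noun prep noun'."""
    two_word = [f"{v} {n}" for v, n in product(verbs, nouns)]
    four_word = [
        f"{v} {n1} {p} {n2}"
        for v, n1, p, n2 in product(verbs, nouns, prepositions, nouns)
        if n1 != n2  # skip degenerate commands such as 'open door with door'
    ]
    return two_word + four_word


# Hypothetical set of commands that actually change the state in one room.
# An agent does not know this set in advance and has to discover it.
EFFECTIVE = {"take lamp", "open door", "eat apple", "go north"}

commands = enumerate_commands(VERBS, NOUNS, PREPOSITIONS)
effective_fraction = len(EFFECTIVE & set(commands)) / len(commands)

print(f"candidate commands: {len(commands)}")
print(f"fraction that affect the state: {effective_fraction:.3f}")
```

With only four verbs, four nouns, and three prepositions, the two templates already yield 160 candidate commands, of which 4 are effective in this toy setup. Real games have vocabularies of hundreds of words and longer commands, so the gap between syntactically possible and useful actions grows much wider.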

There has been a host of recent work towards solving text games [Narasimhan et al., 2015, Fulda et al., 2017, Kostka et al., 2017, Zhilin et al., 2017, Haroush et al., 2018]. Nevertheless, commercial games like Zork remain beyond the capabilities of existing approaches. We argue that addressing even a subset of the aforementioned challenges would represent important progress in machine learning. Agents that solve text-based games may further learn functional properties of language; however, it is unclear what limitations the constraints and simplifications of text games (e.g., on linguistic diversity) impose on agents trained to solve them.

This workshop will highlight research that investigates existing or novel RL techniques for text-based settings, what agents that solve text-based games (might) learn about language, and more generally whether text-based games provide a good testbed for research at the intersection of RL and NLP. The program will feature a collection of invited talks alongside contributed posters and spotlight talks, curated by a committee with broad coverage of the RL and NLP communities. Panel discussions will highlight perspectives of influential researchers from both fields and encourage open dialogue. We will also pose a text-based game challenge several months in advance of the workshop (a similar competition is held annually at the IEEE Conference on Computational Intelligence and Games). This optional component will enable participants to design, train, and test agents in a carefully constructed, interactive text environment. The best-performing agent(s) will be recognized and discussed at the workshop. In addition to the exchange of ideas and the initiation of collaboration, an expected outcome is that text-based games emerge more prominently as a benchmark task to bridge RL and NLP research.

Relevant topics to be addressed at the workshop include (but are not limited to):
- RL in compositional, combinatorial action spaces
- Open RL problems that are especially pernicious in text-based games, like (sub)goal identification and efficient experimentation
- Grounded language understanding
- Online language acquisition
- Affordance extraction (on the fly)
- Language generation and evaluation in goal-oriented settings
- Automatic or crowdsourcing methods for linguistic diversity in simulations
- Use of language to constrain or index RL policies [Andreas et al., 2017]

Sat 5:30 a.m. - 5:40 a.m.
Opening Remarks (Introduction)
Adam Trischler
Sat 5:40 a.m. - 6:20 a.m.

We describe new work that connects two separate threads of our previous research: (i) situated language learning in text adventure games such as (Bordes et al., AISTATS 2010) and (Weston et al., ICLR 2016); and (ii) non-situated dialogue agents such as in the recent PersonaChat dataset (Zhang et al., ACL 2018). The resulting approach aims to develop embodied agents with personas that can both act and speak, where the situated dialogue involves real language between models and humans that can be grounded within the game.

Jason Weston
Sat 6:20 a.m. - 6:40 a.m.

Text-based adventure games provide a platform on which to explore reinforcement learning in the context of a combinatorial action space, such as natural language. We present a deep reinforcement learning architecture that represents the game state as a knowledge graph which is learned during exploration. This graph is used to prune the action space, enabling more efficient exploration. The question of which action to take can be reduced to a question-answering task, a form of transfer learning that pre-trains certain parts of our architecture. In experiments using the TextWorld framework, we show that our proposed technique can learn a control policy faster than baseline alternatives.

Prithviraj Ammanabrolu
Sat 6:40 a.m. - 7:20 a.m.

As inherently linguistic creatures, we tend to think of text as a simple domain: After all, there are only twenty-six letters in the English language, and basic tasks like keyword recognition and part-of-speech tagging have been routinely applied in industry for more than a decade. But words are not a domain in and of themselves. Rather, they function as abstract representations for other types of input, resulting in daunting levels of complexity. This presentation discusses some of the challenges presented by language tasks in general and by text-based games in particular, including partially observable state spaces, compositional and combinatorial action spaces, word-sense disambiguation, consumable rewards, and goal-directed inference.

Sat 7:30 a.m. - 8:00 a.m.
Coffee Break 1 (Break)
Sat 8:00 a.m. - 8:20 a.m.

To solve a text-based game, an agent needs to formulate valid text commands for a given context and find the one that leads to success. Recent attempts at solving text-based games with deep reinforcement learning have focused on the latter, i.e., learning to act optimally when valid actions are known in advance. In this work, we propose to tackle the first task and train a model that generates the set of all valid commands for a given context. We try three generative models on a dataset generated with TextWorld (Côté et al., 2018). The best model can generate valid commands that were unseen during training and achieves a high F1 score on the test set.

David Tao
Sat 8:20 a.m. - 9:00 a.m.

AI-driven characters that learn directly from human input are rare in digital games, but recent advances in several fields of machine learning suggest that they may soon be much more feasible to create. This study explores the design space for interacting with such a character through natural language text dialogue. We conducted an observational study with 18 high school students, who played Minecraft alongside a Wizard of Oz prototype of a companion AI character that learned from their actions and inputs. In this paper, we report on an analysis of the 186 natural language messages that players sent to the character, and review key variations in syntax, function and writing style. We find that players' behaviour and language were differentiated by the extent to which they expressed an anthropomorphic view of the AI character and the level of interest that they showed in interacting with it.

Katja Hofmann
Sat 9:00 a.m. - 10:20 a.m.
Lunch (Break)
Sat 10:20 a.m. - 11:00 a.m.

Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts. Here, we introduce the BabyAI research platform to support investigations towards including humans in the loop for grounded language learning. The BabyAI platform comprises an extensible suite of 19 levels of increasing difficulty. The levels gradually lead the agent towards acquiring a combinatorially rich synthetic language which is a proper subset of English. The platform also provides a heuristic expert agent for the purpose of simulating a human teacher. We report baseline results and estimate the amount of human involvement that would be required to train a neural network-based agent on some of the BabyAI levels. We put forward strong evidence that current deep learning methods are not yet sufficiently sample efficient when it comes to learning a language with compositional properties.

Maxime Chevalier-Boisvert
Sat 11:00 a.m. - 11:20 a.m.

Interactive fiction (IF) games present very different challenges than the vision and control-based games that learning agents have previously excelled at. Solving IF games requires human-like language understanding, commonsense reasoning, planning, and deduction skills. This paper provides a testbed for rapid development of new agents that exhibit these skills by introducing Jericho, a fast, fully-featured interface to fifty-six popular and challenging IF games. We also present initial work towards solving these games in the form of an agent that won the 2018 Text-Based Adventure AI Competition. Finally, we conduct a comprehensive evaluation between NAIL, our agent, and several other IF agents in a richer set of text game environments, and point to directions in which agents can improve. We are optimistic that tools such as Jericho and NAIL will help the community make progress towards language-understanding agents.

Matthew Hausknecht, Charles Li Chen
Sat 11:20 a.m. - 12:00 p.m.

For most of the statistical ML era, the areas of computational linguistics and reinforcement learning (RL) have been studied separately. With the rise of deep learning, we now have tools that can leverage large amounts of data across multiple modalities. In this talk, I make the case for building holistic AI systems that learn by simultaneously utilizing signals from both language and environmental feedback. While RL has been used in recent work to help understand language, I will demonstrate that language can also help agents learn control policies that generalize over domains. Developing agents that can efficiently harness this synergy between language understanding and policy learning will be crucial for our progress towards stronger AI systems.

Karthik Narasimhan
Sat 12:00 p.m. - 12:30 p.m.
Coffee Break 2 (Break)
Sat 12:30 p.m. - 1:10 p.m.
On the role of text-based games for language learning and RL (Discussion Panel)
Sat 1:10 p.m. - 1:50 p.m.

As in many complex text-based scenarios, a conversation can often be decomposed into multiple parts, each handling a subtopic or subtask that contributes to the success of the whole dialogue. An example is a travel assistant, which converses with a user to handle subtasks such as hotel reservation and air ticket purchase. In this talk, we will show how hierarchical deep reinforcement learning can be a useful framework for managing such "composite-task dialogues" in two ways: (1) more efficient policy optimization when subtasks are given; and (2) unsupervised discovery of dialogue subtasks from a corpus.

Lihong Li
Sat 1:50 p.m. - 2:10 p.m.
Introducing "First TextWorld Problems": a text-based game competition (Demonstration)
Marc-Alexandre Côté
Sat 2:10 p.m. - 2:20 p.m.
Closing Remarks (Conclusion)

Author Information

Adam Trischler (Microsoft)
Angeliki Lazaridou (DeepMind)
Yonatan Bisk (University of Washington)
Wendy Tay (Microsoft)
Nate Kushman (Microsoft Research Cambridge)
Marc-Alexandre Côté (Microsoft Research)
Alessandro Sordoni (Microsoft Research Montreal)
Daniel Ricks (Brigham Young University)
Tom Zahavy (The Technion)
Hal Daumé III (Univ of Maryland / Microsoft Research)
