Interactive machine learning (IML) studies algorithms that learn from data collected through interaction with a computational or human agent in a shared environment, via feedback on model decisions. In contrast to the common paradigm of supervised learning, IML does not assume access to pre-collected labeled data, thereby decreasing data costs; instead, it allows systems to improve over time as non-expert users provide feedback. IML has seen wide success in areas such as video games and recommendation systems.
Although most downstream applications of NLP involve interaction with humans (e.g., via labels, demonstrations, corrections, or evaluations), common NLP models are not built to learn from or adapt to users through interaction. A large research gap must be closed to enable NLP systems that adapt on the fly, through interaction, to the changing needs of humans and to dynamic environments.
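To make the loop concrete, here is a toy, self-contained sketch (the model, the feedback simulator, and the 0.1 step size are all illustrative, not any particular IML algorithm): a one-parameter classifier that improves online from binary feedback on its decisions rather than from a pre-collected labeled set.

```python
import random

# Toy interactive learner (illustrative only): a 1-D threshold classifier
# that updates online from binary feedback on each decision it shows.
class ThresholdModel:
    def __init__(self):
        self.threshold = 0.5

    def predict(self, x):
        return int(x > self.threshold)

    def update(self, x, correct):
        # On a mistake, nudge the decision boundary toward the offending point.
        if not correct:
            self.threshold += 0.1 * (x - self.threshold)

def user_feedback(x, prediction, true_threshold=0.7):
    # Stand-in for a human judging the model's decision.
    return prediction == int(x > true_threshold)

model = ThresholdModel()
for _ in range(1000):
    x = random.random()
    pred = model.predict(x)
    model.update(x, user_feedback(x, pred))

print(f"learned threshold: {model.threshold:.2f}")  # approaches 0.7
```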
Sat 7:00 a.m. - 7:05 a.m. | Opening Remarks (Introduction)
Sat 7:05 a.m. - 7:35 a.m. | Karthik Narasimhan: Semantic Supervision for few-shot generalization and personalization (Invited Talk)
A desirable feature of interactive NLP systems is the ability to receive feedback from humans and to personalize to new users. Existing paradigms encounter challenges in acquiring new concepts due to their use of discrete labels and scalar rewards. As one solution, I will present our work on Semantic Supervision (SemSUP), which trains models to predict over multiple natural language descriptions of classes (or even structured ones, like JSON). SemSUP can seamlessly replace any standard supervised learning setup without sacrificing in-distribution accuracy, while providing generalization to unseen concepts and scalability to large label spaces.
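As a rough illustration of the idea (a sketch, not the authors' implementation; the encoder choice and cosine scoring are assumptions), a bi-encoder can score an input against natural-language label descriptions, so an unseen class can be added at inference time just by writing a new description:

```python
from sentence_transformers import SentenceTransformer, util

# Score inputs against class *descriptions* instead of discrete label ids.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder, not SemSUP's

class_descriptions = {
    "sports": "Articles about athletes, games, scores, and competitions.",
    "finance": "Articles about markets, stocks, earnings, and the economy.",
    "weather": "Articles about forecasts, storms, and temperatures.",  # unseen class: just add a description
}

def classify(text):
    text_emb = encoder.encode(text, convert_to_tensor=True)
    desc_embs = encoder.encode(list(class_descriptions.values()), convert_to_tensor=True)
    scores = util.cos_sim(text_emb, desc_embs)[0]
    return list(class_descriptions)[int(scores.argmax())]

print(classify("The index fell 2% after weak quarterly earnings."))  # -> finance
```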
Sat 7:35 a.m. - 8:05 a.m. | John Langford (Invited Talk)
Sat 8:05 a.m. - 8:35 a.m. | Coffee Break
Sat 8:35 a.m. - 8:50 a.m. | Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization (Contributed Talk)
We tackle the problem of aligning pre-trained large language models (LMs) with human preferences. If we view text generation as a sequential decision-making problem, reinforcement learning (RL) appears to be a natural conceptual framework. However, using RL for LM-based generation faces empirical challenges, including training instability due to the combinatorial action space and a lack of open-source libraries and benchmarks customized for LM alignment. Thus, a question arises in the research community: is RL a practical paradigm for NLP? To help answer this, we first introduce RL4LMs (Reinforcement Learning for Language Models), an open-source modular library for optimizing language generators with RL. The library consists of on-policy RL algorithms that can be used to train any encoder or encoder-decoder LM in the HuggingFace library (Wolf et al., 2020) with an arbitrary reward function. Next, we present GRUE (General Reinforced-language Understanding Evaluation), a benchmark of six language generation tasks supervised not by target strings but by reward functions that capture automated measures of human preference. GRUE is the first leaderboard-style evaluation of RL algorithms for NLP tasks. Finally, we introduce NLPO (Natural Language Policy Optimization), an easy-to-use, performant RL algorithm that learns to effectively reduce the combinatorial action space in language generation. We show (1) that RL techniques are generally better than supervised methods at aligning LMs to human preferences, and (2) that NLPO exhibits greater stability and performance than previous policy gradient methods (e.g., PPO; Schulman et al., 2017), based on both automatic and human evaluation.
Prithviraj Ammanabrolu
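To make the setup concrete, here is a minimal REINFORCE-style sketch of reward-driven LM fine-tuning (this toy loop, its reward, and all hyperparameters are assumptions; RL4LMs itself provides PPO- and NLPO-style on-policy training rather than this bare-bones update):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def reward_fn(text):
    # Stand-in for an automated measure of human preference
    # (GRUE-style tasks plug in task-specific reward functions here).
    return float("great" in text.lower())

prompt = tokenizer("The movie was", return_tensors="pt")
for _ in range(10):
    out = model.generate(**prompt, do_sample=True, max_new_tokens=12,
                         pad_token_id=tokenizer.eos_token_id)
    reward = reward_fn(tokenizer.decode(out[0]))
    # REINFORCE: scale the sequence log-likelihood by the scalar reward
    # (for simplicity this scores the prompt tokens too).
    logits = model(out).logits[:, :-1]
    logp = torch.log_softmax(logits, dim=-1)
    seq_logp = logp.gather(-1, out[:, 1:].unsqueeze(-1)).squeeze(-1).sum()
    loss = -reward * seq_logp
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```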
Sat 8:50 a.m. - 9:05 a.m. | WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents (Contributed Talk)
Existing benchmarks for grounding language in interactive environments either lack real-world linguistic elements or prove difficult to scale up due to substantial human involvement in the collection of data or feedback signals. To bridge this gap, we develop WebShop, a simulated e-commerce website environment with 1.18 million real-world products and 12,087 crowd-sourced text instructions. Given a text instruction specifying a product requirement, an agent needs to navigate multiple types of webpages and issue diverse actions to find, customize, and purchase an item. WebShop provides several challenges for language grounding, including understanding compositional instructions, query (re-)formulation, comprehending and acting on noisy text in webpages, and performing strategic exploration.
Shunyu Yao
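Schematically, the interaction looks like the loop below (the method and action names are illustrative, not the benchmark's actual API): the agent reads a page as text, chooses among search/click actions, and is rewarded at purchase time by how well the bought item matches the instruction.

```python
# Sketch of an episode in a WebShop-style environment (interfaces assumed).
def run_episode(env, agent, instruction, max_steps=20):
    observation = env.reset(instruction)   # e.g., the search page, as text
    for _ in range(max_steps):
        # actions look like: search[red running shoes], click[size 9], click[buy now]
        action = agent.act(instruction, observation)
        observation, reward, done = env.step(action)
        if done:          # episode ends at purchase
            return reward  # how well the bought item matches the instruction
    return 0.0
```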
Sat 9:05 a.m. - 9:35 a.m. | Dan Weld: From Advice Taking to Active Learning (Invited Talk)
Sat 9:35 a.m. - 10:05 a.m. | Qian Yang (Invited Talk)
Sat 10:05 a.m. - 11:05 a.m. | Lunch Break
Sat 11:05 a.m. - 12:05 p.m. | Poster Sessions
Sat 12:05 p.m. - 12:20 p.m. | InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions (Contributed Talk)
Debiasing methods in NLP models traditionally focus on isolating information related to a sensitive attribute (such as gender or race). We argue instead that a favorable debiasing method should use sensitive information 'fairly', with explanations, rather than blindly eliminating it. This fair balance is often subjective and can be challenging to achieve algorithmically. We show that an interactive setup in which users can provide feedback achieves a better and fairer balance between task performance and bias mitigation, supported by faithful explanations.
Zexue He
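As a bare illustration of the trade-off being negotiated (the function names and update rule here are hypothetical, not the InterFair method), user feedback can be thought of as steering the weight on a bias penalty relative to the task objective:

```python
# Hypothetical sketch: user feedback adjusts how strongly sensitive-feature
# reliance is penalized, trading bias mitigation against task performance.
def combined_loss(task_loss, bias_score, fairness_weight):
    return task_loss + fairness_weight * bias_score

def update_weight(fairness_weight, user_says_too_biased, step=0.1):
    # "Too biased" -> penalize sensitive information more;
    # otherwise relax the constraint to recover task performance.
    if user_says_too_biased:
        return fairness_weight + step
    return max(0.0, fairness_weight - step)
```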
Sat 12:20 p.m. - 12:35 p.m. | Error Detection for Interactive Text-to-SQL Semantic Parsing (Contributed Talk)
Despite remarkable progress in Text-to-SQL semantic parsing, the performance of state-of-the-art parsers is still far from perfect. At the same time, modern deep-learning-based Text-to-SQL parsers are often over-confident, casting doubt on their trustworthiness when used in an interactive setting. In this paper, we propose to train parser-agnostic error detectors for Text-to-SQL semantic parsers. We test our proposed approach with SmBoP and show that our model outperforms parser-dependent uncertainty measures in simulated interactive evaluations. As a result, when used to trigger answers or interactions in interactive semantic parsing systems, our model can effectively improve the usability of the base parser.
Shijie Chen
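A rough sketch of how such a parser-agnostic detector could be wired into an interactive system (the checkpoint path, input format, label scheme, and threshold below are placeholders, not the paper's setup): score the (question, predicted SQL) pair and ask the user before executing a likely-wrong query.

```python
from transformers import pipeline

# Placeholder checkpoint: any binary classifier over (question, SQL) pairs.
detector = pipeline("text-classification", model="path/to/error-detector")

def maybe_trigger_interaction(question, predicted_sql, threshold=0.5):
    result = detector(f"{question} [SEP] {predicted_sql}")[0]
    # Assumed label scheme: "ERROR" vs. "OK".
    error_prob = result["score"] if result["label"] == "ERROR" else 1 - result["score"]
    # Ask the user to confirm or repair instead of executing a dubious query.
    return error_prob > threshold
```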
Sat 12:35 p.m. - 1:05 p.m. | Anca Dragan: Learning human preferences from language (Invited Talk)
In classic instruction following, language like "I'd like the JetBlue flight" maps to actions (e.g., selecting that flight). However, language also conveys information about a user's underlying reward function (e.g., a general preference for JetBlue), which can allow a model to carry out desirable actions in new contexts. In this talk, I'll share a model that infers rewards from language pragmatically: reasoning about how speakers choose utterances not only to elicit desired actions, but also to reveal information about their preferences.
Anca Dragan
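A toy Bayesian rendering of this inference (the probabilities are illustrative, not the talk's model): treat the utterance as evidence about the reward, P(reward | utterance) ∝ P(utterance | reward) P(reward), with a pragmatic-speaker likelihood under which a user who prefers JetBlue is far more likely to mention it.

```python
# Toy pragmatic reward inference for "I'd like the JetBlue flight".
priors = {"prefers_jetblue": 0.5, "indifferent": 0.5}

likelihood = {  # P(utterance | reward type), illustrative numbers
    "prefers_jetblue": 0.9,  # a JetBlue-preferring speaker usually names it
    "indifferent": 0.2,      # an indifferent speaker rarely singles it out
}

unnormalized = {r: priors[r] * likelihood[r] for r in priors}
z = sum(unnormalized.values())
posterior = {r: v / z for r, v in unnormalized.items()}
print(posterior)  # {'prefers_jetblue': 0.818..., 'indifferent': 0.181...}
```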
Sat 1:05 p.m. - 1:35 p.m. | Coffee Break
Sat 1:35 p.m. - 2:05 p.m. | Aida Nematzadeh: On Evaluating Neural Representations (Invited Talk)
There has been increased interest in developing general-purpose pretrained models across different domains, such as language, vision, and multimodal learning. This approach is appealing because we can pretrain models on large datasets once, then adapt them to various tasks using smaller supervised datasets. Moreover, these models achieve impressive results on a range of benchmarks, often performing better than task-specific models. Finally, this pretraining approach processes data passively and does not rely on actively interacting with humans. In this talk, I will first discuss what aspects of language children can learn passively, and to what extent interacting with others might require developing a theory of mind. Next, I will discuss the need for better evaluation pipelines to understand the shortcomings and strengths of pretrained models. In particular, I will talk about (1) the necessity of directly measuring real-world performance (as opposed to relying on benchmark performance), (2) the importance of strong baselines, and (3) how to design probing datasets that measure specific capabilities of our models. I will focus on commonsense reasoning, verb understanding, and theory of mind as challenging domains for existing pretrained models.
Aida Nematzadeh
Sat 2:05 p.m. - 2:50 p.m. | Panel Discussion (Panel)
Sat 2:50 p.m. - 2:55 p.m. | Closing Remarks (Closing)
Author Information
Kianté Brantley (University of Maryland, College Park)
Soham Dan (University of Pennsylvania)
PhD student at the University of Pennsylvania, advised by Prof. Dan Roth, working on machine learning and natural language processing.
Ji-Ung Lee (UKP, TU Darmstadt)
Khanh Nguyen (Princeton University)
Edwin Simpson (University of Bristol)

I am a lecturer (assistant professor) at the University of Bristol, specialising in interactive machine learning for NLP and learning from crowdsourced data. Previously, I was a postdoc at TU Darmstadt, Germany, and completed my doctorate at the University of Oxford. Talk to me about uncertainty and Bayesian methods in natural language processing, learning from instructions, combining classifications, and text summarisation.
Alane Suhr (AI2)
Yoav Artzi (Cornell University)