Timezone: »
Despite recent advancements in language models (LMs), their application to dialogue management (DM) problems and ability to carry on rich conversations remain a challenge. We use reinforcement learning (RL) to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction. Most existing RL approaches to DM train the agent at the word-level, and thus, have to deal with a combinatorially complex action space even for a medium-size vocabulary. As a result, they struggle to produce a successful and engaging dialogue even if they are warm-started with a pre-trained LM. To address this issue, we develop a RL-based DM using a novel mixture of expert language model (MoE-LM) that consists of (i) a LM capable of learning diverse semantics for conversation histories, (ii) a number of specialized LMs (or experts) capable of generating utterances corresponding to a particular attribute or personality, and (iii) a RL-based DM that performs dialogue planning with the utterances generated by the experts. Our MoE approach provides greater flexibility to generate sensible utterances with different intents and allows RL to focus on conversational-level DM. We compare it with SOTA baselines on open-domain dialogues and demonstrate its effectiveness both in terms of the diversity and sensibility of the generated utterances and the overall DM performance.
Author Information
Yinlam Chow (Google Research)
Azamat Tulepbergenov (Google)
I am a Research Software Engineer at Google AI. I am interested in applying Reinforcement Learning to help solve useful problems. My current research interests are language, health and urban engineering.
Ofir Nachum (Google Brain)
Dhawal Gupta (University of Massachusetts Amherst)
Moonkyung Ryu (Google)
Mohammad Ghavamzadeh (Google Research)
Craig Boutilier (Google)
More from the Same Authors
-
2021 Spotlight: Safe Reinforcement Learning with Natural Language Constraints »
Tsung-Yen Yang · Michael Y Hu · Yinlam Chow · Peter J. Ramadge · Karthik Narasimhan -
2021 : Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions »
Bogdan Mazoure · Ilya Kostrikov · Ofir Nachum · Jonathan Tompson -
2021 : TRAIL: Near-Optimal Imitation Learning with Suboptimal Data »
Mengjiao (Sherry) Yang · Sergey Levine · Ofir Nachum -
2021 : Why so pessimistic? Estimating uncertainties for offline rl through ensembles, and why their independence matters »
Kamyar Ghasemipour · Shixiang (Shane) Gu · Ofir Nachum -
2022 : Multi-Environment Pretraining Enables Transfer to Action Limited Datasets »
David Venuto · Mengjiao (Sherry) Yang · Pieter Abbeel · Doina Precup · Igor Mordatch · Ofir Nachum -
2022 : Contrastive Value Learning: Implicit Models for Simple Offline RL »
Bogdan Mazoure · Benjamin Eysenbach · Ofir Nachum · Jonathan Tompson -
2022 Workshop: Foundation Models for Decision Making »
Mengjiao (Sherry) Yang · Yilun Du · Jack Parker-Holder · Siddharth Karamcheti · Igor Mordatch · Shixiang (Shane) Gu · Ofir Nachum -
2022 Poster: Private and Communication-Efficient Algorithms for Entropy Estimation »
Gecia Bravo-Hermsdorff · Róbert Busa-Fekete · Mohammad Ghavamzadeh · Andres Munoz Medina · Umar Syed -
2022 Poster: Oracle Inequalities for Model Selection in Offline Reinforcement Learning »
Jonathan N Lee · George Tucker · Ofir Nachum · Bo Dai · Emma Brunskill -
2022 Poster: Chain of Thought Imitation with Procedure Cloning »
Mengjiao (Sherry) Yang · Dale Schuurmans · Pieter Abbeel · Ofir Nachum -
2022 Poster: Multi-Game Decision Transformers »
Kuang-Huei Lee · Ofir Nachum · Mengjiao (Sherry) Yang · Lisa Lee · Daniel Freeman · Sergio Guadarrama · Ian Fischer · Winnie Xu · Eric Jang · Henryk Michalewski · Igor Mordatch -
2022 Poster: Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters »
Kamyar Ghasemipour · Shixiang (Shane) Gu · Ofir Nachum -
2022 Poster: Robust Reinforcement Learning using Offline Data »
Kishan Panaganti · Zaiyan Xu · Dileep Kalathil · Mohammad Ghavamzadeh -
2022 Poster: Operator Splitting Value Iteration »
Amin Rakhsha · Andrew Wang · Mohammad Ghavamzadeh · Amir-massoud Farahmand -
2022 Poster: Improving Zero-Shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions »
Bogdan Mazoure · Ilya Kostrikov · Ofir Nachum · Jonathan Tompson -
2022 Poster: Efficient Risk-Averse Reinforcement Learning »
Ido Greenberg · Yinlam Chow · Mohammad Ghavamzadeh · Shie Mannor -
2021 Poster: Adaptive Sampling for Minimax Fair Classification »
Shubhanshu Shekhar · Greg Fields · Mohammad Ghavamzadeh · Tara Javidi -
2021 Poster: Safe Reinforcement Learning with Natural Language Constraints »
Tsung-Yen Yang · Michael Y Hu · Yinlam Chow · Peter J. Ramadge · Karthik Narasimhan -
2021 Poster: Structural Credit Assignment in Neural Networks using Reinforcement Learning »
Dhawal Gupta · Gabor Mihucz · Matthew Schlegel · James Kostas · Philip Thomas · Martha White -
2020 Poster: Differentiable Meta-Learning of Bandit Policies »
Craig Boutilier · Chih-wei Hsu · Branislav Kveton · Martin Mladenov · Csaba Szepesvari · Manzil Zaheer -
2020 Poster: Latent Bandits Revisited »
Joey Hong · Branislav Kveton · Manzil Zaheer · Yinlam Chow · Amr Ahmed · Craig Boutilier -
2020 Poster: Online Planning with Lookahead Policies »
Yonathan Efroni · Mohammad Ghavamzadeh · Shie Mannor -
2020 Session: Orals & Spotlights Track 09: Reinforcement Learning »
Pulkit Agrawal · Mohammad Ghavamzadeh -
2020 Poster: CoinDICE: Off-Policy Confidence Interval Estimation »
Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Spotlight: CoinDICE: Off-Policy Confidence Interval Estimation »
Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Session »
Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 : Contributed Talks »
Kevin Lu · Matthew Hausknecht · Ofir Nachum -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Workshop: Safety and Robustness in Decision-making »
Mohammad Ghavamzadeh · Shie Mannor · Yisong Yue · Marek Petrik · Yinlam Chow -
2019 Poster: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections »
Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li -
2019 Spotlight: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections »
Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li -
2018 Poster: Non-delusional Q-learning and value-iteration »
Tyler Lu · Dale Schuurmans · Craig Boutilier -
2018 Oral: Non-delusional Q-learning and value-iteration »
Tyler Lu · Dale Schuurmans · Craig Boutilier -
2018 Poster: A Lyapunov-based Approach to Safe Reinforcement Learning »
Yinlam Chow · Ofir Nachum · Edgar Duenez-Guzman · Mohammad Ghavamzadeh -
2018 Poster: A Block Coordinate Ascent Algorithm for Mean-Variance Optimization »
Tengyang Xie · Bo Liu · Yangyang Xu · Mohammad Ghavamzadeh · Yinlam Chow · Daoming Lyu · Daesub Yoon -
2018 Poster: Data center cooling using model-predictive control »
Nevena Lazic · Craig Boutilier · Tyler Lu · Eehern Wong · Binz Roy · Moonkyung Ryu · Greg Imwalle -
2017 Poster: Bridging the Gap Between Value and Policy Based Reinforcement Learning »
Ofir Nachum · Mohammad Norouzi · Kelvin Xu · Dale Schuurmans