Building an open-domain conversational agent is a challenging problem. Current evaluation methods, mostly post-hoc judgments of static conversations, do not capture conversation quality in a realistic interactive context. In this paper, we investigate interactive human evaluation and provide evidence for its necessity; we then introduce a novel, model-agnostic, and dataset-agnostic method to approximate it. In particular, we propose a self-play scenario in which the dialog system talks to itself, and we calculate a combination of proxies, such as sentiment and semantic coherence, on the conversation trajectory. We show that this metric captures the human-rated quality of a dialog model better than any automated metric known to date, achieving a significant Pearson correlation (r > .7, p < .05). To investigate the strengths of this novel metric and of interactive evaluation in comparison to state-of-the-art metrics and human evaluation of static conversations, we perform extended experiments with a set of models, including several that make novel improvements to recent hierarchical dialog generation architectures through sentiment and semantic knowledge distillation at the utterance level. Finally, we open-source the interactive evaluation platform we built and the dataset we collected to allow researchers to efficiently deploy and evaluate dialog models.
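As a rough illustration of the self-play idea described in the abstract, the sketch below lets a stand-in dialog model talk to itself, scores the resulting trajectory with two toy proxies, and shows how such scores could be compared against human ratings with a Pearson correlation. Everything in it is an assumption made for illustration, not the authors' implementation: the word lists, the bag-of-words coherence measure, the equal weights, and the names `echo_bot`, `self_play`, and `self_play_score` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; a real system would likely use a trained sentiment model.
POSITIVE = {"good", "great", "love", "happy", "nice"}
NEGATIVE = {"bad", "hate", "sad", "awful", "boring"}


def sentiment(utterance):
    """Toy sentiment proxy in [-1, 1]: (positive - negative) / token count."""
    tokens = utterance.lower().split()
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def coherence(prev, curr):
    """Toy coherence proxy: cosine similarity of bag-of-words counts."""
    a, b = Counter(prev.lower().split()), Counter(curr.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def self_play(respond, seed, turns):
    """Let the model talk to itself: each reply is appended to the history."""
    trajectory = [seed]
    for _ in range(turns):
        trajectory.append(respond(trajectory))
    return trajectory


def self_play_score(trajectory, w_sentiment=0.5, w_coherence=0.5):
    """Weighted sum of trajectory-averaged proxies (weights are assumptions)."""
    avg_sent = sum(sentiment(u) for u in trajectory) / len(trajectory)
    pairs = list(zip(trajectory, trajectory[1:]))
    avg_coh = sum(coherence(p, c) for p, c in pairs) / len(pairs) if pairs else 0.0
    return w_sentiment * avg_sent + w_coherence * avg_coh


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def echo_bot(history):
    """Hypothetical stand-in for a dialog model: echoes the last turn plus 'nice'."""
    return history[-1] + " nice"


trajectory = self_play(echo_bot, "i love music", turns=2)
score = self_play_score(trajectory)
```

In practice, one such score would be computed per dialog model and `pearson` applied to the list of model scores and the matching list of human ratings; no ratings are included here because the sketch does not reproduce any data from the paper.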
Author Information
Asma Ghandeharioun (MIT)
Judy Hanwen Shen (Microsoft)
Natasha Jaques (MIT)
Craig Ferguson (MIT)
Noah Jones (MIT)
Agata Lapedriza (Fundació per a la Universitat Oberta de Catalunya)
Rosalind Picard (MIT Media Lab)
More from the Same Authors
- 2023: Improving Domain Generalization in Contrastive Learning via Domain-Aware Temperature Control
  Robert Lewis · Katie Matton · Rosalind Picard · John Guttag
- 2023: Improving Domain Generalization in Contrastive Learning Using Adaptive Temperature Control
  Katie Matton · Robert Lewis · Rosalind Picard · John Guttag
- 2022: Rosalind Picard
  Rosalind Picard
- 2022: Contrastive Learning of Electrodermal Activity Representations for Stress Detection
  Katie Matton · Robert Lewis · John Guttag · Rosalind Picard
- 2021: Context in Automated Affect Recognition
  Matt Groh · Rosalind Picard
- 2021 Poster: Environment Generation for Zero-Shot Compositional Reinforcement Learning
  Izzeddin Gur · Natasha Jaques · Yingjie Miao · Jongwook Choi · Manoj Tiwari · Honglak Lee · Aleksandra Faust
- 2020: Panel: Kate Larson (DeepMind) [moderator], Natasha Jaques (Google), Jeffrey Rosenschein (The Hebrew University of Jerusalem), Michael Wooldridge (University of Oxford)
  Kate Larson · Natasha Jaques · Jeffrey S Rosenschein · Michael Wooldridge
- 2020: Q&A: James Fearon (Stanford University): Cooperation Inside and Over the Rules of the Game, with Natasha Jaques (Google) [moderator]
  James Fearon · Natasha Jaques
- 2020: Q&A: Sarit Kraus (Bar-Ilan University): Agent-Human Collaboration and Learning for Improving Human Satisfaction, with Natasha Jaques (Google) [moderator]
  Sarit Kraus · Natasha Jaques
- 2020: Q&A: Peter Stone (The University of Texas at Austin): Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination, with Natasha Jaques (Google) [moderator]
  Peter Stone · Natasha Jaques
- 2020: Q&A: William Isaac (DeepMind): Can Cooperative Make AI (and Society) Fairer?, with Natasha Jaques (Google) [moderator]
  William Isaac · Natasha Jaques
- 2020: Q&A: Gillian Hadfield (University of Toronto): The Normative Infrastructure of Cooperation, with Natasha Jaques (Google) [moderator]
  Gillian Hadfield · Natasha Jaques
- 2020: Q&A: Open Problems in Cooperative AI with Thore Graepel (DeepMind), Allan Dafoe (University of Oxford), Yoram Bachrach (DeepMind), and Natasha Jaques (Google) [moderator]
  Thore Graepel · Yoram Bachrach · Allan Dafoe · Natasha Jaques
- 2020 Poster: Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
  Michael Dennis · Natasha Jaques · Eugene Vinitsky · Alexandre Bayen · Stuart Russell · Andrew Critch · Sergey Levine
- 2020 Oral: Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
  Michael Dennis · Natasha Jaques · Eugene Vinitsky · Alexandre Bayen · Stuart Russell · Andrew Critch · Sergey Levine
- 2019 Workshop: Emergent Communication: Towards Natural Language
  Abhinav Gupta · Michael Noukhovitch · Cinjon Resnick · Natasha Jaques · Angelos Filos · Marie Ossenkopf · Angeliki Lazaridou · Jakob Foerster · Ryan Lowe · Douwe Kiela · Kyunghyun Cho
- 2018: Lunch
  Hong Yu · Bhanu Pratap Singh Rawat · Arijit Ukil · Waheeda Saib · Jekaterina Novikova · John Hughes · Yuhui Zhang · Rahul V · Mi Jung Kim · Babak Taati · Hariharan Ravishankar · Harry Clifford · Hirofumi Kobayashi · Babak Taati · Keyang Xu · Yen-Chi Cheng · Timothy Cannings · Jayashree Kalpathy-Cramer · Jayashree Kalpathy-Cramer · Parinaz Sobhani · Kimis Perros · Wei-Hung Weng · Yordan Raykov · Lars Lorch · Mengqi Jin · Xue Teng · Michael Ferlaino · Marek Rei · Cédric Beaulac · Aman Verma · Sebastian Keller · Edmond Cunningham · Luc Evers · Victor Rodriguez · Vipul Satone · Dianbo Liu · Angeline Yasodhara · Geoff Tison · Ligin Solamen · Bryan He · Rahul Ladhania · Yipeng Shi · Md Nafiz Hamid · Pouria Mashouri · Woochan Hwang · Sejin Park · Xu Chen · Rachneet Kaur · Davis Blalock · Holly Wiberg · Parminder Bhatia · Kezi Yu · RUMENG LI · Jun Sakuma · Charles Ding · Aaron Babier · Yong Cai · A Pratap · Luke O'Connor · Allen Nie · Martin Kang · Ian Covert · Xun Wang · Zelun Luo · Serena Yeung · William Boag · Kazuki Tachikawa · Mary Saltz · Owen Lahav · Edward Lee · Eric Teasley · Michael Kamp · Nirmesh Patel · Vishwali Mhasawade · Maxim Samarin · Ryo Uchimido · Farzad Khalvati · Francisco Cruz · Laura Symul · Zaid Nabulsi · Mads Mihailescu · Rosalind Picard
- 2016 Demonstration: Interactive musical improvisation with Magenta
  Adam Roberts · Jesse Engel · Curtis Hawthorne · Ian Simon · Elliot Waite · Sageev Oore · Natasha Jaques · Cinjon Resnick · Douglas Eck