Precisely assessing the progress in natural language generation (NLG) tasks is challenging, and human evaluation is often necessary. However, human evaluation is usually costly, difficult to reproduce, and non-reusable. In this paper, we propose a new and simple automatic evaluation method for NLG called Near-Negative Distinction (NND) that repurposes prior human annotations into NND tests. To pass an NND test, an NLG model must place a higher likelihood on a high-quality output candidate than on a near-negative candidate with a known error. Model performance is established by the number of NND tests a model passes, as well as the distribution over task-specific errors the model fails on. Through experiments on three NLG tasks (question generation, question answering, and summarization), we show that NND achieves a higher correlation with human judgments than standard NLG evaluation metrics. We invite the community to adopt NND as a generic method for NLG evaluation and contribute new NND test collections.
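The pass/fail rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `NNDTest` record, the `run_nnd` function, and the `log_likelihood` callable are hypothetical names introduced here, and the sketch assumes that a tie counts as a failure and that likelihoods are compared without any length normalization. In practice, `log_likelihood` would wrap a real NLG model's scoring of a candidate given its input.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class NNDTest:
    """One hypothetical NND test built from prior human annotations."""
    context: str        # model input (e.g. passage, question, document)
    positive: str       # candidate annotated as high quality
    near_negative: str  # candidate annotated with a known error
    error_type: str     # task-specific error label of the near-negative


def run_nnd(
    tests: Iterable[NNDTest],
    log_likelihood: Callable[[str, str], float],
) -> Tuple[float, Dict[str, int]]:
    """Return (pass rate, count of failed tests per error type).

    A test passes only if the model scores the high-quality candidate
    strictly higher than the near-negative (assumption: ties fail).
    """
    tests = list(tests)
    failures: Counter = Counter()
    passed = 0
    for t in tests:
        pos = log_likelihood(t.context, t.positive)
        neg = log_likelihood(t.context, t.near_negative)
        if pos > neg:
            passed += 1
        else:
            failures[t.error_type] += 1
    pass_rate = passed / len(tests) if tests else 0.0
    return pass_rate, dict(failures)


# Toy stand-in for a model: a fixed lookup of made-up log-likelihoods.
TOY_SCORES = {
    "good question": -1.0, "ungrammatical question": -3.0,
    "good summary": -2.0, "unfaithful summary": -1.5,
    "good answer": -0.5, "wrong answer": -4.0,
}


def toy_log_likelihood(context: str, candidate: str) -> float:
    return TOY_SCORES[candidate]


toy_tests = [
    NNDTest("ctx1", "good question", "ungrammatical question", "disfluent"),
    NNDTest("ctx2", "good summary", "unfaithful summary", "unfaithful"),
    NNDTest("ctx3", "good answer", "wrong answer", "incorrect"),
]

pass_rate, failure_counts = run_nnd(toy_tests, toy_log_likelihood)
print(pass_rate, failure_counts)
```

Reporting the failure counts per error type alongside the pass rate mirrors the abstract's point that NND yields both an overall score and a distribution over the errors a model fails on.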
Author Information
Philippe Laban (Salesforce.com)

Philippe is a Research Scientist at Salesforce Research, New York. Previously, he completed his Ph.D. in Computer Science at UC Berkeley, advised by Marti Hearst and John Canny.
Chien-Sheng Wu (Salesforce Research)
Wenhao Liu (Salesforce inc.)
Caiming Xiong (Salesforce Research)
More from the Same Authors
-
2022 : Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems »
Yihao Feng · Shentao Yang · Shujian Zhang · Jianguo Zhang · Caiming Xiong · Mingyuan Zhou · Huan Wang -
2022 Spotlight: Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization »
Devansh Arpit · Huan Wang · Yingbo Zhou · Caiming Xiong -
2022 Spotlight: Lightning Talks 5B-1 »
Devansh Arpit · Xiaojun Xu · Zifan Shi · Ivan Skorokhodov · Shayan Shekarforoush · Zhan Tong · Yiqun Wang · Shichong Peng · Linyi Li · Ivan Skorokhodov · Huan Wang · Yibing Song · David Lindell · Yinghao Xu · Seyed Alireza Moazenipourasil · Sergey Tulyakov · Peter Wonka · Yiqun Wang · Ke Li · David Fleet · Yujun Shen · Yingbo Zhou · Bo Li · Jue Wang · Peter Wonka · Marcus Brubaker · Caiming Xiong · Limin Wang · Deli Zhao · Qifeng Chen · Dit-Yan Yeung -
2022 Poster: Policy Optimization for Markov Games: Unified Framework and Faster Convergence »
Runyu Zhang · Qinghua Liu · Huan Wang · Caiming Xiong · Na Li · Yu Bai -
2022 Poster: Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization »
Devansh Arpit · Huan Wang · Yingbo Zhou · Caiming Xiong