Information-Theoretic Evaluation of Free-Text Rationales with Conditional $\mathcal{V}$-Information
Hanjie Chen · Faeze Brahman · Xiang Ren · Yangfeng Ji · Yejin Choi · Swabha Swayamdipta
Event URL: https://openreview.net/forum?id=60s0lDM0cPF
Free-text rationales are a promising step towards explainable AI, yet their evaluation remains an open research problem. While existing metrics have mostly focused on measuring the direct association between the rationale and a given label, we argue that an ideal metric should also be able to focus on the new information uniquely provided in the rationale that is otherwise not provided in the input or the label. We investigate this research problem from an information-theoretic perspective using the conditional $\mathcal{V}$-information \citep{hewitt-etal-2021-conditional}. More concretely, we propose a metric called REV (Rationale Evaluation with conditional $\mathcal{V}$-information) that can quantify the new information in a rationale supporting a given label beyond the information already available in the input or the label. Experiments on reasoning tasks across four benchmarks, including few-shot prompting with GPT-3, demonstrate the effectiveness of REV in evaluating different types of rationale-label pairs, compared to existing metrics. Through several quantitative comparisons, we demonstrate the capability of REV in providing more sensitive measurements of new information in free-text rationales with respect to a label. Furthermore, REV is consistent with human judgments on rationale evaluations. Overall, when used alongside traditional performance metrics, REV provides deeper insights into a model's reasoning and prediction processes.
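As a rough illustration of the idea in the abstract, the sketch below scores how much a rationale adds by comparing the probability a model assigns to the label with and without the rationale, in the spirit of pointwise conditional $\mathcal{V}$-information. It is not the authors' implementation: the function names, the example probabilities, and the choice of base-2 logarithms are all assumptions made for illustration, and the abstract does not specify how the two probabilities are obtained.

```python
import math


def pointwise_information_gain(p_with: float, p_without: float) -> float:
    """Illustrative score in bits: log2 p_with - log2 p_without.

    p_with    -- hypothetical probability of the label from a model that
                 sees the rationale in addition to the baseline information
    p_without -- hypothetical probability of the label from a model that
                 sees only the baseline information
    A positive value means the rationale made the label more predictable.
    """
    if not (0.0 < p_with <= 1.0 and 0.0 < p_without <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.log2(p_with) - math.log2(p_without)


def mean_information_gain(pairs):
    """Average the pointwise score over (p_with, p_without) pairs."""
    scores = [pointwise_information_gain(pw, po) for pw, po in pairs]
    return sum(scores) / len(scores)


# Made-up probabilities for three rationale-label pairs.
examples = [(0.8, 0.4), (0.5, 0.5), (0.2, 0.4)]
per_example = [pointwise_information_gain(pw, po) for pw, po in examples]
average = mean_information_gain(examples)
```

In this toy data the first rationale doubles the probability of the label (a gain of about one bit), the second adds nothing, and the third halves it, so a score of this shape can separate informative, vacuous, and misleading rationales.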
Author Information
Hanjie Chen (University of Virginia)
Faeze Brahman (Allen Institute for AI)
Xiang Ren (University of Southern California)
Yangfeng Ji (University of Virginia)
Yejin Choi (University of Washington)
Swabha Swayamdipta (University of Southern California)
More from the Same Authors
-
2020 : Poster #2 »
Xiang Ren -
2021 : CommonsenseQA 2.0: Exposing the Limits of AI through Gamification »
Alon Talmor · Ori Yoran · Ronan Le Bras · Chandra Bhagavatula · Yoav Goldberg · Yejin Choi · Jonathan Berant -
2021 : NaturalProofs: Mathematical Theorem Proving in Natural Language »
Sean Welleck · Jiacheng Liu · Ronan Le Bras · Hanna Hajishirzi · Yejin Choi · Kyunghyun Cho -
2021 Spotlight: Refining Language Models with Compositional Explanations »
Huihan Yao · Ying Chen · Qinyuan Ye · Xisen Jin · Xiang Ren -
2021 : Adversarial Training for Improving Model Robustness? Look at Both Prediction and Interpretation »
Hanjie Chen · Yangfeng Ji -
2021 : Towards Grounded Natural Language Proof Generation »
Sean Welleck · Jiacheng Liu · Yejin Choi -
2022 : PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales »
Peifeng Wang · Aaron Chan · Filip Ilievski · Muhao Chen · Xiang Ren -
2022 : Explaining Predictive Uncertainty by Looking Back at Model Explanations »
Hanjie Chen · Wanyu Du · Yangfeng Ji -
2022 : Adaptive Pre-training of Language Models for Better Logical Reasoning »
Soumya Sanyal · Yichong Xu · Shuohang Wang · Ziyi Yang · Reid Pryzant · Wenhao Yu · Chenguang Zhu · Xiang Ren -
2022 : SPRINT: Scalable Semantic Policy Pre-training via Language Instruction Relabeling »
Jesse Zhang · Karl Pertsch · Jiahui Zhang · Taewook Nam · Sung Ju Hwang · Xiang Ren · Joseph Lim -
2023 : Resource-rational moral judgment »
Sarah Wu · Xiang Ren · Sydney Levine -
2023 : Instruction-following Evaluation through Verbalizer Manipulation »
Shiyang Li · Jun Yan · Hai Wang · Zheng Tang · Xiang Ren · Vijay Srinivasan · Hongxia Jin -
2023 : URIAL: Tuning-Free Instruction Learning and Alignment for Untuned LLMs »
Bill Yuchen Lin · Abhilasha Ravichander · Ximing Lu · Nouha Dziri · Melanie Sclar · Khyathi Chandu · Chandra Bhagavatula · Yejin Choi -
2023 : Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection »
Jun Yan · Vikas Yadav · Shiyang Li · Lichang Chen · Zheng Tang · Hai Wang · Vijay Srinivasan · Xiang Ren · Hongxia Jin -
2023 Workshop: AI meets Moral Philosophy and Moral Psychology: An Interdisciplinary Dialogue about Computational Ethics »
Sydney Levine · Liwei Jiang · Jared Moore · Zhijing Jin · Yejin Choi -
2023 Poster: RealTime QA: What's the Answer Right Now? »
Jungo Kasai · Keisuke Sakaguchi · yoichi takahashi · Ronan Le Bras · Akari Asai · Xinyan Yu · Dragomir Radev · Noah Smith · Yejin Choi · Kentaro Inui -
2023 Poster: Localized Symbolic Knowledge Distillation for Visual Commonsense Models »
Jae Sung Park · Jack Hessel · Khyathi Chandu · Paul Pu Liang · Ximing Lu · Peter West · Youngjae Yu · Qiuyuan Huang · Jianfeng Gao · Ali Farhadi · Yejin Choi -
2023 Poster: SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks »
Bill Yuchen Lin · Yicheng Fu · Karina Yang · Faeze Brahman · Shiyu Huang · Chandra Bhagavatula · Prithviraj (Raj) Ammanabrolu · Yejin Choi · Xiang Ren -
2023 Poster: Faith and Fate: Limits of Transformers on Compositionality »
Nouha Dziri · Ximing Lu · Melanie Sclar · Xiang (Lorraine) Li · Liwei Jiang · Bill Yuchen Lin · Sean Welleck · Peter West · Chandra Bhagavatula · Ronan Le Bras · Jena Hwang · Soumya Sanyal · Xiang Ren · Allyson Ettinger · Zaid Harchaoui · Yejin Choi -
2023 Poster: Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text »
Wanrong Zhu · Jack Hessel · Anas Awadalla · Samir Yitzhak Gadre · Jesse Dodge · Alex Fang · Youngjae Yu · Ludwig Schmidt · William Yang Wang · Yejin Choi -
2023 Tutorial: Data Contribution Estimation for Machine Learning »
Stephanie Schoch · Ruoxi Jia · Yangfeng Ji -
2022 Poster: COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics »
Lianhui Qin · Sean Welleck · Daniel Khashabi · Yejin Choi -
2022 Poster: NS3: Neuro-symbolic Semantic Code Search »
Shushan Arakelyan · Anna Hakhverdyan · Miltiadis Allamanis · Luis Garcia · Christophe Hauser · Xiang Ren -
2022 Poster: CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification »
Stephanie Schoch · Haifeng Xu · Yangfeng Ji -
2022 Poster: Unsupervised Cross-Task Generalization via Retrieval Augmentation »
Bill Yuchen Lin · Kangmin Tan · Chris Miller · Beiwen Tian · Xiang Ren -
2022 Poster: QUARK: Controllable Text Generation with Reinforced Unlearning »
Ximing Lu · Sean Welleck · Jack Hessel · Liwei Jiang · Lianhui Qin · Peter West · Prithviraj Ammanabrolu · Yejin Choi -
2022 Poster: NaturalProver: Grounded Mathematical Proof Generation with Language Models »
Sean Welleck · Jiacheng Liu · Ximing Lu · Hannaneh Hajishirzi · Yejin Choi -
2021 : Panel Discussion »
Pascal Poupart · Ali Ghodsi · Luke Zettlemoyer · Sameer Singh · Kevin Duh · Yejin Choi · Lu Hou -
2021 : Battling with Larger Models through Grounding and Searching »
Yejin Choi -
2021 Oral: MERLOT: Multimodal Neural Script Knowledge Models »
Rowan Zellers · Ximing Lu · Jack Hessel · Youngjae Yu · Jae Sung Park · Jize Cao · Ali Farhadi · Yejin Choi -
2021 : NaturalProofs: Mathematical Theorem Proving in Natural Language »
Sean Welleck · Jiacheng Liu · Ronan Le Bras · Hanna Hajishirzi · Yejin Choi · Kyunghyun Cho -
2021 Poster: Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals »
Lang Liu · Krishna Pillutla · Sean Welleck · Sewoong Oh · Yejin Choi · Zaid Harchaoui -
2021 Poster: SalKG: Learning From Knowledge Graph Explanations for Commonsense Reasoning »
Aaron Chan · Jiashu Xu · Boyuan Long · Soumya Sanyal · Tanishq Gupta · Xiang Ren -
2021 Poster: MERLOT: Multimodal Neural Script Knowledge Models »
Rowan Zellers · Ximing Lu · Jack Hessel · Youngjae Yu · Jae Sung Park · Jize Cao · Ali Farhadi · Yejin Choi -
2021 Poster: Gradient-based Editing of Memory Examples for Online Task-free Continual Learning »
Xisen Jin · Arka Sadhu · Junyi Du · Xiang Ren -
2021 Poster: Refining Language Models with Compositional Explanations »
Huihan Yao · Ying Chen · Qinyuan Ye · Xisen Jin · Xiang Ren -
2021 Poster: MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers »
Krishna Pillutla · Swabha Swayamdipta · Rowan Zellers · John Thickstun · Sean Welleck · Yejin Choi · Zaid Harchaoui -
2021 : CommonsenseQA 2.0: Exposing the Limits of AI through Gamification »
Alon Talmor · Ori Yoran · Ronan Le Bras · Chandra Bhagavatula · Yoav Goldberg · Yejin Choi · Jonathan Berant -
2021 Oral: MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers »
Krishna Pillutla · Swabha Swayamdipta · Rowan Zellers · John Thickstun · Sean Welleck · Yejin Choi · Zaid Harchaoui -
2020 : Panel Discussion & Closing »
Yejin Choi · Alexei Efros · Chelsea Finn · Kristen Grauman · Quoc V Le · Yann LeCun · Ruslan Salakhutdinov · Eric Xing -
2020 : QA: Yejin Choi »
Yejin Choi -
2020 : Invited Talk: Yejin Choi »
Yejin Choi -
2020 : Adversarial, Socially Aware, and Commonsensical Data »
Yejin Choi -
2019 : Invited Talk (Yejin Choi) »
Yejin Choi -
2019 : Poster Session »
Nathalie Baracaldo · Seth Neel · Tuyen Le · Dan Philps · Suheng Tao · Sotirios Chatzis · Toyo Suzumura · Wei Wang · WENHANG BAO · Solon Barocas · Manish Raghavan · Samuel Maina · Reginald Bryant · Kush Varshney · Skyler D. Speakman · Navdeep Gill · Nicholas Schmidt · Kevin Compher · Naveen Sundar Govindarajulu · Vivek Sharma · Praneeth Vepakomma · Tristan Swedish · Jayashree Kalpathy-Cramer · Ramesh Raskar · Shihao Zheng · Mykola Pechenizkiy · Marco Schreyer · Li Ling · Chirag Nagpal · Robert Tillman · Manuela Veloso · Hanjie Chen · Xintong Wang · Michael Wellman · Matthew van Adelsberg · Ben Wood · Hans Buehler · Mahmoud Mahfouz · Antonios Alexos · Megan Shearer · Antigoni Polychroniadou · Lucia Larise Stavarache · Dmitry Efimov · Johnston P Hall · Yukun Zhang · Emily Diana · Sumitra Ganesh · Vineeth Ravi · Swetasudha Panda · Xavier Renard · Matthew Jagielski · Yonadav Shavit · Joshua Williams · Haoran Wei · Shuang (Sophie) Zhai · Xinyi Li · Hongda Shen · Daiki Matsunaga · Jaesik Choi · Alexis Laignelet · Batuhan Guler · Jacobo Roa Vicens · Ajit Desai · Jonathan Aigrain · Robert Samoilescu -
2019 : Yejin Choi »
Yejin Choi -
2019 Poster: Defending Against Neural Fake News »
Rowan Zellers · Ari Holtzman · Hannah Rashkin · Yonatan Bisk · Ali Farhadi · Franziska Roesner · Yejin Choi -
2018 Poster: Hierarchical Graph Representation Learning with Differentiable Pooling »
Zhitao Ying · Jiaxuan You · Christopher Morris · Xiang Ren · Will Hamilton · Jure Leskovec -
2018 Spotlight: Hierarchical Graph Representation Learning with Differentiable Pooling »
Zhitao Ying · Jiaxuan You · Christopher Morris · Xiang Ren · Will Hamilton · Jure Leskovec