NeurIPS Poster MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations

Ruosen Li · Zimu Wang · Son Tran · Lei Xia · Xinya Du

West Ballroom A-D #5409

Abstract:

Existing benchmarks for multi-hop question answering (QA) primarily evaluate models on their ability to reason about entities and the relationships between them. However, little is known about how these models perform when reasoning over both events and entities. In this paper, we introduce a novel semi-automatic question generation strategy that composes event structures from information extraction (IE) datasets, and we present the first Multi-hop Event-centric Question Answering (MEQA) benchmark. It contains (1) 2,243 challenging questions that require a diverse range of complex reasoning over entity-entity, entity-event, and event-event relations; and (2) for each question, a corresponding multi-step, QA-format event reasoning chain (explanation) that leads to the answer. We also introduce two metrics for evaluating explanations: completeness and logical consistency. We conduct comprehensive benchmarking and analysis, which show that MEQA is challenging for the latest state-of-the-art models, including large language models (LLMs), and that these models fall short of providing faithful explanations of the event-centric reasoning process.
