T-FIX: Text-Based Explanations with Features Interpretable to eXperts
Shreya Havaldar ⋅ Helen Jin ⋅ Chaehyeon Kim ⋅ Anton Xue ⋅ Weiqiu You ⋅ Gary Weissman ⋅ Rajat Deo ⋅ Sameed Khatana ⋅ Helen Qu ⋅ Marco Gatti ⋅ Daniel Hashimoto ⋅ Amin Madani ⋅ Masao Sako ⋅ Bhuvnesh Jain ⋅ Lyle Ungar ⋅ Eric Wong
Abstract
As LLMs are deployed in knowledge-intensive settings, professionals need confidence that a model’s reasoning matches domain expertise. Current explanation evaluations focus on plausibility or internal faithfulness, often overlooking alignment with expert intuition. We define expert alignment as a key criterion for evaluating explanations and introduce T-FIX, a benchmark designed to evaluate how well LLM explanations align with expert judgment across seven knowledge-intensive fields.