Tutorial
Evaluating Large Language Models - Principles, Approaches, and Applications
Bo Li · Irina Sigler · Yuan Xue
West Exhibition Hall C
This tutorial delves into the critical and complex domain of evaluating large language models (LLMs), focusing on the unique challenges presented when assessing generative outputs. Despite the difficulty in assigning precise quality scores to such outputs, our tutorial emphasizes the necessity of rigorous evaluation throughout the development process of LLMs. This tutorial will provide an extensive presentation of evaluation scopes, from task-specific metrics to broader performance indicators such as safety and fairness. Participants will be introduced to a range of methodological approaches, including both computation and model-based assessments. The session includes hands-on coding demonstrations, providing the tools and knowledge needed to refine model selection, prompt engineering, and inference configurations. By the end of this tutorial, attendees will gain a comprehensive understanding of LLM evaluation frameworks, contributing to more informed decision-making and ensuring the responsible deployment of these models in real-world applications.
Live content is unavailable. Log in and register to view live content