In-person presentation
in
Competition: NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day

Invited Speaker: Leshem Choshen (IBM Research) - Efficient Evaluation for Efficient Training

Abstract:

Evaluation must balance two competing forces that are often ignored: reliability and efficiency. The talk explains the basics of open evaluation (HELM) and the analysis used to turn an extremely slow evaluation (4K GPU hours for a single model) into one hundreds of times faster whose scores you can still rely on. In short: maximize the variability of the data (more datasets and prompts, fewer examples and repetitions) and devote more resources to the cases you care about (e.g., drop the bottom models after a fast first pass). The analysis, and more details on how to make smart evaluation decisions, appear in "Efficient Benchmarking (of Language Models)":

https://arxiv.org/abs/2308.11696v3
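The two-stage idea described in the abstract, a fast pass over a small but diverse subsample followed by full evaluation of only the surviving models, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function and parameter names (`efficient_eval`, `n_sub`, `keep_frac`, `score_fn`) are assumptions for the sketch:

```python
import random

def efficient_eval(models, datasets, score_fn, n_sub=16, keep_frac=0.5, seed=0):
    """Sketch of a two-stage evaluation: a cheap pass on a small,
    diverse subsample, then full evaluation only for the top models.

    models:   list of model identifiers
    datasets: dict mapping dataset name -> list of examples
    score_fn: callable (model, example) -> float
    """
    rng = random.Random(seed)
    # Cheap pass: a few examples from *every* dataset, so data
    # variability stays high even though the sample is small.
    sub = {name: rng.sample(exs, min(n_sub, len(exs)))
           for name, exs in datasets.items()}
    cheap = {m: sum(score_fn(m, x) for exs in sub.values() for x in exs)
             for m in models}
    # Keep only the top fraction of models for the expensive full pass.
    k = max(1, int(len(models) * keep_frac))
    survivors = sorted(models, key=cheap.get, reverse=True)[:k]
    # Full evaluation, but only on the models that survived.
    return {m: sum(score_fn(m, x) for exs in datasets.values() for x in exs)
            for m in survivors}
```

The full pass here is exhaustive for simplicity; the talk's point is that most of the compute budget should go to the models and cases that still matter after the cheap filter.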
