Competition

LLM Merging: Building LLMs Efficiently through Merging

Margaret Li · Jiacheng Zhu · Rickard Brüel Gabrielsson · Derek Tam · Mikhail Yurochkin · Colin Raffel · Leshem Choshen

2024 Competition

Abstract

Training high-performing large language models (LLMs) from scratch is a notoriously expensive and difficult task, costing hundreds of millions of dollars in compute alone. These pretrained LLMs, however, can be cheaply and easily adapted to new tasks via fine-tuning, leading to a proliferation of models that suit specific use cases. Recent work has shown that specialized fine-tuned models can be rapidly merged to combine capabilities and generalize to new skills. This raises the question: given a new suite of desired skills and design parameters, is it necessary to fine-tune an existing LLM or train yet another one from scratch, or can similar existing models be repurposed for the new task with the right selection or merging procedure? The LLM Merging challenge aims to spur the development and evaluation of methods for merging and reusing existing models to form stronger new models without additional training. Specifically, the competition focuses on merging existing publicly released expert models from Hugging Face, using only minimal compute and few additional parameters. The goal is to develop merged models that outperform both existing models and existing merging baselines. Submissions will be judged on their efficiency and on their average accuracy across a set of held-out multiple-choice evaluation tasks. To make the competition as accessible as possible and to ensure that the merging procedures are more efficient than fine-tuning, we will enforce a compute budget and focus on merging models with fewer than 8B parameters. A starter kit with all necessary materials (baseline implementations, requirements, the evaluation script, etc.) will be released on May 1st.
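As a rough illustration of what merging without additional training can mean, the sketch below averages the parameters of several fine-tuned models that share one architecture. It is only an assumption-laden toy: the function name `merge_by_averaging`, the `StateDict` type, and the use of plain Python lists in place of real tensors are all hypothetical, and this is not the competition's baseline or starter-kit code.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for a real checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def merge_by_averaging(
    models: Sequence[StateDict],
    weights: Optional[Sequence[float]] = None,
) -> StateDict:
    """Merge same-architecture models by a weighted average of their parameters."""
    if not models:
        raise ValueError("need at least one model to merge")
    if weights is None:
        weights = [1.0] * len(models)  # uniform average by default
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    # Every model must expose the same parameter names.
    names = set(models[0])
    for m in models[1:]:
        if set(m) != names:
            raise ValueError("models do not share the same parameter names")

    merged: StateDict = {}
    for name in models[0]:
        size = len(models[0][name])
        if any(len(m[name]) != size for m in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across models; no gradient steps involved.
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm, models))
            for i in range(size)
        ]
    return merged


if __name__ == "__main__":
    expert_a = {"layer.weight": [1.0, 3.0]}
    expert_b = {"layer.weight": [3.0, 5.0]}
    print(merge_by_averaging([expert_a, expert_b]))
```

Uniform averaging is only the simplest possible merging rule. It is shown here because it needs no training data and no extra parameters, which matches the constraints described in the abstract.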

Schedule

Timezone: America/Los_Angeles

9:05 AM

Welcome

Margaret Li

9:10 AM

Writeup Winners talk

Siddharth Gupta

9:20 AM

Efficiency Winners talk

Yang Ding

9:30 AM

Invited Talk: Modular Deep Learning (Jonas Pfeiffer)

Jonas Pfeiffer

10:15 AM

Break

Margaret Li

10:30 AM

3rd place talk

Zixiang Di

10:40 AM

2nd place talk

Yinuo Zhang

10:50 AM

1st place talk

Jisheng Fang

11:00 AM

Invited Talk: Decoding-time experts for language model adaptation (Alisa Liu)

Alisa Liu

11:45 AM

Closing remarks

Margaret Li
