Continual Pre-training of MoEs: How robust is your router?

Benjamin Thérien ⋅ Charles-Étienne Joseph ⋅ Zain Sarwar ⋅ Ashwinee Panda ⋅ Anirban Das ⋅ Shi-Xiong Zhang ⋅ Stephen Rawls ⋅ Sambit Sahu ⋅ Eugene Belilovsky ⋅ Irina Rish

Abstract
