Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks

Jinze Zhao ⋅ Junjie Yang ⋅ Peihao Wang ⋅ Yingbin Liang ⋅ Zhangyang "Atlas" Wang

Abstract
