StructMoE: Structured Mixture of Experts Using Low Rank Experts
Zain Sarwar ⋅ Ashwinee Panda ⋅ Benjamin Thérien ⋅ Stephen Rawls ⋅ Anirban Das ⋅ Kartik Balasubramaniam ⋅ Berkcan Kapusuzoglu ⋅ Shixiong Zhang ⋅ Sambit Sahu ⋅ Milind Naphade ⋅ Supriyo Chakraborty
Keywords:
Efficient Architectures
Abstract
We introduce StructMoE, a method for scaling MoE architectures by augmenting experts with dynamic capacity via structured matrices we call Low Rank Experts (LoREs). LoREs are selected on a per-expert, per-token basis by a secondary router specific to each expert, and their outputs are entangled with the main expert's up-projection before the activation function. Empirically, we find that this approach outperforms an MoE baseline in terms of loss on a held-out validation set.
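The abstract sketches the expert mechanism but does not give an exact formulation. Below is a minimal PyTorch sketch of one possible reading: each expert carries its own secondary router that selects one LoRE per token, and the LoRE's low-rank output is added into the expert's up-projection before the activation. The class and parameter names (StructMoEExpert, lore_router), the dimensions, the top-1 selection, and the additive form of the "entanglement" are all assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructMoEExpert(nn.Module):
    """One expert with per-token low-rank experts (LoREs). Hypothetical sketch."""

    def __init__(self, d_model=512, d_ff=2048, n_lores=4, rank=8):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)     # main expert up-projection
        self.down = nn.Linear(d_ff, d_model)   # main expert down-projection
        # Secondary router, specific to this expert, scoring LoREs per token.
        self.lore_router = nn.Linear(d_model, n_lores)
        # Each LoRE is a low-rank factor pair: A (d_model x rank), B (rank x d_ff).
        self.A = nn.Parameter(torch.randn(n_lores, d_model, rank) * 0.02)
        self.B = nn.Parameter(torch.randn(n_lores, rank, d_ff) * 0.02)

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.lore_router(x)            # (tokens, n_lores)
        idx = scores.argmax(dim=-1)             # top-1 LoRE per token (assumed)
        gate = F.softmax(scores, dim=-1).gather(-1, idx[:, None])  # (tokens, 1)
        # Low-rank path per token: x @ A[idx] @ B[idx].
        low = torch.einsum('td,tdr->tr', x, self.A[idx])
        low = torch.einsum('tr,trf->tf', low, self.B[idx])
        # Entangle the LoRE output with the up-projection before the activation.
        h = F.gelu(self.up(x) + gate * low)
        return self.down(h)
```

In a full StructMoE layer, a standard top-k MoE router would first dispatch tokens to experts, and each dispatched expert would apply this forward pass; the LoRE routing happens inside the expert, independently of the primary routing decision.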