NeurIPS Poster Spiking Transformer with Experts Mixture

Poster

Spiking Transformer with Experts Mixture

Zhaokun Zhou · Yijie Lu · Yanhao Jia · Kaiwei Che · Jun Niu · Liwei Huang · Xinyu Shi · Yuesheng Zhu · Guoqi Li · Zhaofei Yu · Li Yuan

West Ballroom A-D #7102

[ Abstract ]

[ Paper] [ OpenReview]

Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

Spiking Neural Networks (SNNs) provide a sparse spike-driven mechanism which is believed to be critical for energy-efficient deep learning. Mixture-of-Experts (MoE), on the other side, aligns with the brain mechanism of distributed and sparse processing, resulting in an efficient way of enhancing model capacity and conditional computation. In this work, we consider how to incorporate SNNs’ spike-driven and MoE’s conditional computation into a unified framework. However, MoE uses softmax to get the dense conditional weights for each expert and TopK to hard-sparsify the network, which does not fit the properties of SNNs. To address this issue, we reformulate MoE in SNNs and introduce the Spiking Experts Mixture Mechanism (SEMM) from the perspective of sparse spiking activation. Both the experts and the router output spiking sequences, and their element-wise operation makes SEMM computation spike-driven and dynamic sparse-conditional. By developing SEMM into Spiking Transformer, the Experts Mixture Spiking Attention (EMSA) and the Experts Mixture Spiking Perceptron (EMSP) are proposed, which performs routing allocation for head-wise and channel-wise spiking experts, respectively. Experiments show that SEMM realizes sparse conditional computation and obtains a stable improvement on neuromorphic and static datasets with approximate computational overhead based on the Spiking Transformer baselines.

Chat is not available.