NeurIPS Poster Cluster-Learngene: Inheriting Adaptive Clusters for Vision Transformers

Qiufeng Wang · Xu Yang · Fu Feng · Jing Wang · Xin Geng

East Exhibit Hall A-C #2005

Abstract:

In recent years, the combination of vast datasets and powerful computational resources has led to the emergence of large pre-trained models in deep learning. However, common practice often overgeneralizes the applicability of these models and overlooks task-specific resource constraints. To mitigate this issue, we propose Cluster-Learngene, which effectively clusters critical internal modules from a large ancestry model and then inherits them to initialize descendant models of elastic scales. Specifically, based on the density characteristics of attention heads, our method adaptively clusters the attention heads of each layer and the position-wise feed-forward networks (FFNs) in the ancestry model as the learngene. Moreover, we introduce priority weight-sharing and learnable parameter transformations that expand the learngene to initialize descendant models of elastic scales. Through extensive experiments, we demonstrate that Cluster-Learngene is not only more efficient than other initialization methods but also customizes models of elastic scales according to downstream task resources.
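
The clustering and expansion steps described in the abstract can be illustrated with a rough sketch. It is not the authors' implementation. It assumes each attention head in a layer is summarized by a small feature vector (hypothetical here), groups heads with a generic DBSCAN-style density rule, averages each group into a centroid, and fills a descendant layer by reusing centroids, with larger clusters first. The function names, the `eps` and `min_pts` parameters, and the round-robin reading of "priority weight-sharing" are all assumptions made for illustration; the learnable parameter transformations are not covered.

```python
from math import dist


def cluster_heads(features, eps, min_pts=2):
    """DBSCAN-style density clustering over per-head feature vectors.

    `features` is a list of equal-length tuples, one per attention head
    (what goes into the tuple is an assumption of this sketch).
    Returns a list of clusters, each a sorted list of head indices.
    Heads that belong to no dense region are treated as noise and omitted.
    """
    n = len(features)
    # Each head's neighborhood includes the head itself.
    neighbors = [
        [j for j in range(n) if dist(features[i], features[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    clusters = []
    for i in range(n):
        # Start a new cluster only from an unlabeled core point.
        if labels[i] is not None or len(neighbors[i]) < min_pts:
            continue
        cid = len(clusters)
        labels[i] = cid
        members, frontier = [i], [i]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] is None:
                    labels[q] = cid
                    members.append(q)
                    # Only core points keep expanding the cluster.
                    if len(neighbors[q]) >= min_pts:
                        frontier.append(q)
        clusters.append(sorted(members))
    return clusters


def centroids_by_priority(features, clusters):
    """Average each cluster into a centroid, ordered by cluster size (largest first)."""
    ordered = sorted(clusters, key=len, reverse=True)
    dim = len(features[0])
    return [
        [sum(features[i][d] for i in c) / len(c) for d in range(dim)]
        for c in ordered
    ]


def expand_heads(centroids, num_heads):
    """Fill a descendant layer with `num_heads` heads by reusing centroids.

    Centroids are shared round-robin, so those from larger clusters are
    reused first (an assumed interpretation of priority sharing).
    """
    return [centroids[k % len(centroids)] for k in range(num_heads)]


if __name__ == "__main__":
    # Six hypothetical heads, each described by one scalar feature.
    feats = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (20.0,)]
    groups = cluster_heads(feats, eps=1.5, min_pts=2)
    cents = centroids_by_priority(feats, groups)
    print(groups)
    print(expand_heads(cents, num_heads=5))
```

In this sketch the number of clusters is not fixed in advance; it follows from the density of the head features, which is one way to read "adaptively clusters". A descendant layer may request fewer or more heads than there are centroids, which mirrors the idea of initializing models of elastic scales.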
