

Poster
in
Workshop: Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III): Towards the Future of Large Language Models and their Emerging Descendants

SortedNet, a Place for Every Network and Every Network in its Place

Mojtaba Valipour · Mehdi Rezagholizadeh · Hossein Rajabzadeh · Marzieh Tahaei · Boxing Chen · Ali Ghodsi


Abstract:

As the size of deep learning models continues to grow, finding optimal models under memory and computation constraints becomes increasingly important. Although the architecture and constituent building blocks of neural networks usually allow them to be used modularly (i.e., using the sub-networks of a given network after training), their training process is unaware of this modularity. Consequently, conventional neural network training lacks the flexibility to adapt the computational load of the model during inference. This paper proposes SortedNet, a generalized and scalable solution that harnesses the inherent modularity of deep neural networks across various dimensions (e.g., width, depth, blocks) for efficient dynamic inference. Our training considers a nested architecture for the sub-models with shared parameters and trains all models simultaneously to obtain many-in-one sorted models. We utilize a novel updating scheme during training that combines random sub-model sampling with gradient accumulation to improve training efficiency. Furthermore, the sorted nature of our training leads to search-free sub-model selection at inference time, and the nested architecture of the resulting sub-models leads to minimal storage requirements and efficient switching between sub-models at inference. Our general dynamic training approach is demonstrated across various architectures and tasks, including BERT on language understanding and ResNet on image classification. Experimental results show the efficacy of the proposed method in achieving efficient sub-models while outperforming state-of-the-art dynamic training approaches.
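The updating scheme described above (nested sub-models with shared parameters, trained by sampling a random sub-model per step and accumulating gradients before each update) can be illustrated with a minimal sketch. This is not the authors' implementation; it is a toy reconstruction under stated assumptions: the "network" is a single linear model whose width-k sub-model reuses the first k shared weights, `WIDTHS`, `train`, and `predict` are hypothetical names, and the loss is plain squared error.

```python
import random

FULL_WIDTH = 4
WIDTHS = [1, 2, 3, 4]  # nested sub-model widths: sub-model k shares weights w[:k]

def predict(w, x, k):
    """Forward pass of the width-k sub-model (first k shared weights)."""
    return sum(w[i] * x[i] for i in range(k))

def train(data, steps=500, lr=0.05, accum=2, seed=0):
    """Sorted training sketch: sample random nested sub-models and
    accumulate their gradients before applying one shared update."""
    rng = random.Random(seed)
    w = [0.0] * FULL_WIDTH
    for _ in range(steps):
        grad = [0.0] * FULL_WIDTH            # gradient accumulator
        for _ in range(accum):               # accumulate over sampled sub-models
            k = rng.choice(WIDTHS)           # random sub-model (its width)
            x, y = rng.choice(data)
            err = predict(w, x, k) - y       # squared-error residual
            for i in range(k):               # only the sampled sub-model's
                grad[i] += 2 * err * x[i]    # shared weights receive gradient
        for i in range(FULL_WIDTH):          # one update after accumulation
            w[i] -= lr * grad[i] / accum
    return w
```

Because the sub-models are nested and trained largest-capacity-agnostic, inference-time selection is search-free in the same spirit as the paper: pick the largest width `k` that fits the compute budget and evaluate `predict(w, x, k)` using the same shared weight vector, with no per-sub-model storage.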
