LayerMerge: Modality-Agnostic Depth Pruning for Efficient Foundation Model Deployment
Arjun Choudhry · Chang Liu · Nina Żukowska · Yifu Cai · Mononito Goswami · Artur Dubrawski
Abstract
Large foundation models face deployment challenges in resource-constrained environments. Although width pruning typically outperforms depth pruning, we introduce LayerMerge, a simple, modality-agnostic depth pruning technique that closes this performance gap while delivering inference-time and memory savings that scale linearly with the number of layers removed. Extensive benchmarks show that LayerMerge preserves emergent abilities under aggressive compression, retaining most of the original performance while reducing model depth by up to 90%.
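The abstract does not describe LayerMerge's layer-selection or merging procedure, so the following is only a generic illustration of what depth pruning looks like in practice: a minimal PyTorch sketch that keeps an evenly spaced subset of a model's layer stack. The `prune_depth` function, the even-spacing heuristic, and the toy 32-block model are all assumptions for illustration, not the paper's algorithm.

```python
import torch
import torch.nn as nn

def prune_depth(layers: nn.ModuleList, keep_ratio: float) -> nn.ModuleList:
    """Return an evenly spaced subset of `layers`.

    Generic depth-pruning sketch (hypothetical): the actual LayerMerge
    selection/merging criterion is not specified in the abstract.
    """
    n = len(layers)
    n_keep = max(1, round(n * keep_ratio))
    if n_keep == 1:
        idx = {0}
    else:
        # Evenly spaced indices across the original depth,
        # always retaining the first and last layers.
        idx = {round(i * (n - 1) / (n_keep - 1)) for i in range(n_keep)}
    return nn.ModuleList(layers[i] for i in sorted(idx))

# Toy example: prune a 32-block stack to 10% of its depth,
# matching the abstract's "up to 90%" depth reduction.
blocks = nn.ModuleList(nn.Linear(64, 64) for _ in range(32))
pruned = prune_depth(blocks, keep_ratio=0.10)
print(len(blocks), "->", len(pruned))  # 32 -> 3

x = torch.randn(1, 64)
for layer in pruned:
    x = torch.relu(layer(x))
```

Because each removed layer eliminates its full forward-pass cost, this kind of pruning reduces both latency and parameter memory roughly in proportion to the layers dropped, which is the linear-savings property the abstract highlights.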