LayerMerge: Modality-Agnostic Depth Pruning for Efficient Foundation Model Deployment
Arjun Choudhry · Chang Liu · Nina Żukowska · Yifu Cai · Mononito Goswami · Artur Dubrawski
Abstract
Large foundation models face deployment challenges in resource-constrained environments. Although width pruning typically outperforms depth pruning, we introduce LayerMerge, a simple, modality-agnostic depth pruning technique that closes this performance gap while delivering inference-time and memory savings that scale linearly with the number of layers removed. Extensive benchmarks show that LayerMerge preserves emergent abilities under aggressive compression, retaining most of the original performance while reducing model depth by up to 90%.
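The abstract does not describe LayerMerge's layer-selection or merging procedure, so the following is only a generic illustration of what depth pruning looks like in practice: a minimal PyTorch sketch that keeps an evenly spaced subset of a model's layer stack. The `prune_depth` function, the even-spacing heuristic, and the toy 32-block model are all assumptions for illustration, not the paper's algorithm.

```python
import torch
import torch.nn as nn

def prune_depth(layers: nn.ModuleList, keep_ratio: float) -> nn.ModuleList:
    """Return an evenly spaced subset of `layers`.

    Generic depth-pruning sketch (hypothetical): the actual LayerMerge
    selection/merging criterion is not specified in the abstract.
    """
    n = len(layers)
    n_keep = max(1, round(n * keep_ratio))
    if n_keep == 1:
        idx = {0}
    else:
        # Evenly spaced indices across the original depth,
        # always retaining the first and last layers.
        idx = {round(i * (n - 1) / (n_keep - 1)) for i in range(n_keep)}
    return nn.ModuleList(layers[i] for i in sorted(idx))

# Toy example: prune a 32-block stack to 10% of its depth,
# matching the abstract's "up to 90%" depth reduction.
blocks = nn.ModuleList(nn.Linear(64, 64) for _ in range(32))
pruned = prune_depth(blocks, keep_ratio=0.10)
print(len(blocks), "->", len(pruned))  # 32 -> 3

x = torch.randn(1, 64)
for layer in pruned:
    x = torch.relu(layer(x))
```

Because each removed layer eliminates its full forward-pass cost, this kind of pruning reduces both latency and parameter memory roughly in proportion to the layers dropped, which is the linear-savings property the abstract highlights.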