COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
Eugene Kwek · Wenpeng Yin
Abstract
This work presents COMPACT, a training-free, deployment-friendly pruning method that jointly removes rare vocabulary tokens and prunes FFN intermediate channels using common-token-weighted activations. COMPACT is both scale-aware and structure-agnostic across the Qwen and LLaMA families. In experiments spanning 0.5B–70B parameters, COMPACT delivers strong downstream performance at 35% fewer parameters, with competitive pruning time and substantial memory savings.
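To make the core idea concrete, the sketch below is a rough, hypothetical illustration (not the authors' implementation) of common-token-weighted channel scoring on a toy, LLaMA-style gated FFN. The module names (gate_proj, up_proj, down_proj, embed), the random calibration tokens, and the keep ratios (65% of channels, 90% of the vocabulary) are all illustrative assumptions; the paper's exact scoring and selection procedure may differ.

# Hypothetical sketch of common-token-weighted pruning; toy model, not the authors' code.
import torch
import torch.nn as nn
from collections import Counter

torch.manual_seed(0)

# Toy gated FFN block (LLaMA-style gate/up/down projections) and embedding table.
d_model, d_ff, vocab = 64, 256, 1000
gate_proj = nn.Linear(d_model, d_ff, bias=False)
up_proj   = nn.Linear(d_model, d_ff, bias=False)
down_proj = nn.Linear(d_ff, d_model, bias=False)
embed     = nn.Embedding(vocab, d_model)

def ffn_intermediate(x):
    # Intermediate activations whose channels are scored for pruning.
    return torch.nn.functional.silu(gate_proj(x)) * up_proj(x)

# Calibration token ids; here random, in practice drawn from a calibration corpus.
calib_ids = torch.randint(0, vocab, (32, 128))  # (batch, seq)

# Token "commonness" weights from corpus frequency (assumption: normalized counts).
counts = Counter(calib_ids.flatten().tolist())
freq = torch.zeros(vocab)
for tok, c in counts.items():
    freq[tok] = c
token_weight = freq / freq.sum()

# Score FFN channels by activation magnitude, weighted by each token's commonness.
with torch.no_grad():
    h = embed(calib_ids)                       # (B, T, d_model)
    a = ffn_intermediate(h).abs()              # (B, T, d_ff)
    w = token_weight[calib_ids].unsqueeze(-1)  # weight each position by its token's frequency
    channel_score = (w * a).sum(dim=(0, 1))    # (d_ff,)

# Keep the top 65% of channels (illustrative ratio), i.e., prune ~35% of the FFN width.
keep = channel_score.topk(int(0.65 * d_ff)).indices.sort().values
gate_proj.weight.data = gate_proj.weight.data[keep]
up_proj.weight.data   = up_proj.weight.data[keep]
down_proj.weight.data = down_proj.weight.data[:, keep]

# Vocabulary pruning: drop the rarest tokens from the embedding table.
# In a full model the LM-head (unembedding) rows would be pruned correspondingly.
keep_tokens = freq.topk(int(0.9 * vocab)).indices.sort().values
embed.weight.data = embed.weight.data[keep_tokens]

print(f"FFN width: {d_ff} -> {len(keep)}, vocab: {vocab} -> {len(keep_tokens)}")

Weighting activations by token frequency biases the channel ranking toward behavior on the inputs the model actually sees most often, which is the intuition behind using common tokens for both the vocabulary and channel pruning decisions.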