Skip to yearly menu bar Skip to main content


Progressive Ensemble Distillation: Building Ensembles for Efficient Inference

Don Dennis · Abhishek Shetty · Anish Prasad Sevekari · Kazuhito Koishida · Virginia Smith

Great Hall & Hall B1+B2 (level 1) #2013
[ ]
[ Paper [ Poster [ OpenReview
Thu 14 Dec 3 p.m. PST — 5 p.m. PST


Knowledge distillation is commonly used to compress an ensemble of models into a single model. In this work we study the problem of progressive ensemble distillation: Given a large, pretrained teacher model , we seek to decompose the model into an ensemble of smaller, low-inference cost student models . The resulting ensemble allows for flexibly tuning accuracy vs. inference cost, which can be useful for a multitude of applications in efficient inference. Our method, B-DISTIL, uses a boosting procedure that allows function composition based aggregation rules to construct expressive ensembles with similar performance as using much smaller student models. We demonstrate the effectiveness of B-DISTIL by decomposing pretrained models across a variety of image, speech, and sensor datasets. Our method comes with strong theoretical guarantees in terms of convergence as well as generalization.

Chat is not available.