Skip to yearly menu bar Skip to main content

Workshop: Challenges in Deploying and Monitoring Machine Learning Systems

Post-Training Neural Network Compression With Variational Bayesian Quantization

Zipei Tan · Robert Bamler


Neural network compression can open up new deployment schemes for deep learning models by making it feasible to ship deep neural networks with millions of parameters directly within a mobile or web app rather than running them on a remote data center, thus reducing server costs, network usage, latency, and privacy concerns. In this paper, we propose and empirically evaluate a simple and generic compression method for trained neural networks that builds on variational inference and on the Variational Bayesian Quantization algorithm [Yang et al., 2020]. We find that the proposed method achieves significantly lower bit rates than existing post-training compression methods at comparable model performance. The proposed method demonstrates a new use case of Bayesian neural networks (BNNs), and we analyze how compression performance depends on the temperature of a BNN.

Chat is not available.