Contributed talk
in
Workshop: Challenges in Deploying and Monitoring Machine Learning Systems

Post-Training Neural Network Compression With Variational Bayesian Quantization

Zipei Tan ⋅ Robert Bamler

Abstract

Neural network compression can open up new deployment schemes for deep learning models by making it feasible to ship deep neural networks with millions of parameters directly within a mobile or web app rather than running them on a remote data center, thus reducing server costs, network usage, latency, and privacy concerns. In this paper, we propose and empirically evaluate a simple and generic compression method for trained neural networks that builds on variational inference and on the Variational Bayesian Quantization algorithm [Yang et al., 2020]. We find that the proposed method achieves significantly lower bit rates than existing post-training compression methods at comparable model performance. The proposed method demonstrates a new use case of Bayesian neural networks (BNNs), and we analyze how compression performance depends on the temperature of a BNN.

Video

Chat is not available.