Efficient Computation of Deep Convolutional Neural Networks: A Quantization Perspective
Max Welling

Abstract: Neural network compression has become an important research area because of its impact on deploying large models on resource-constrained devices. In this talk, we will introduce two novel techniques that enable differentiable sparsification and quantization of deep neural networks; both are achieved by appropriately smoothing the overall objective. As a result, architectures can be trained directly to be highly compressed and hardware-friendly using off-the-shelf stochastic gradient descent optimizers.
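To show why smoothing makes quantization trainable by gradient descent, here is a minimal sketch of one generic smoothed quantizer. It is an illustrative assumption, not the method presented in this talk. Hard rounding to a fixed grid has zero gradient almost everywhere. This sketch replaces it with a softmax-weighted average of grid points, using a temperature that controls how sharp the assignment is. The function names `hard_quantize`, `soft_quantize` and `soft_quantize_grad`, and the parameters `grid` and `temperature`, are hypothetical.

```python
import math
from typing import List, Sequence

# Illustrative sketch only (hypothetical names); not the talk's exact method.


def _soft_assignments(w: float, grid: Sequence[float], temperature: float) -> List[float]:
    # Softmax over negative squared distances from w to each grid point.
    logits = [-((w - g) ** 2) / temperature for g in grid]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def hard_quantize(w: float, grid: Sequence[float]) -> float:
    # Non-differentiable reference: round w to the nearest grid point.
    return min(grid, key=lambda g: abs(w - g))


def soft_quantize(w: float, grid: Sequence[float], temperature: float) -> float:
    # Smoothed quantizer: expected grid value under the soft assignments.
    # As temperature -> 0 this approaches hard_quantize(w, grid).
    p = _soft_assignments(w, grid, temperature)
    return sum(pi * g for pi, g in zip(p, grid))


def soft_quantize_grad(w: float, grid: Sequence[float], temperature: float) -> float:
    # Analytic derivative of soft_quantize with respect to w.
    # Differentiating the softmax gives 2 * Var_p[g] / temperature, which is
    # nonnegative and shrinks toward zero as the assignments become one-hot.
    p = _soft_assignments(w, grid, temperature)
    mean = sum(pi * g for pi, g in zip(p, grid))
    var = sum(pi * (g - mean) ** 2 for pi, g in zip(p, grid))
    return 2.0 * var / temperature


if __name__ == "__main__":
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    print(hard_quantize(0.3, grid))  # 0.5
    for t in (1.0, 0.1, 0.001):
        print(t, soft_quantize(0.3, grid, t), soft_quantize_grad(0.3, grid, t))
```

At a high temperature, the surrogate gives a usable gradient with respect to the weight. At a low temperature it behaves almost exactly like hard rounding. A common way to use such a relaxation is to lower the temperature gradually during training. This is a standard trick, not a claim about the talk's specific schedule.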

Author Information

Max Welling (University of Amsterdam / Qualcomm AI Research)
