Today’s state of deep neural network inference can be summed up with two words: complex and inefficient. The quest for accuracy has led to overparameterized deep neural networks that require heavy compute resources to solve tasks at hand, and as such we are “rapidly approaching outrageous computational, economic, and environmental costs to gain incrementally smaller improvements in model performance” (State of AI Report 2020). There is no lack of research on achieving high levels of unstructured sparsity, but putting that research into practice remains a challenge. As a result, data scientists and machine learning engineers are often forced to make tradeoffs between model performance, accuracy, and inference costs.
There is a better way.
After years of research at MIT, the team at Neural Magic concluded that throwing teraflops at dense models is not sustainable. So we've taken the best of known research on model compression (unstructured pruning and quantization, in particular) and efficient sparse execution to build a software solution that delivers efficient deep neural network inference on everyday CPUs, without the need for specialized hardware.
Join Neural Magic ML experts to learn how we applied published research on model compression and efficient sparse execution to build software that compresses and optimizes deep learning models for efficient inference with ease.
You’ll walk away with: an overview of SOTA model compression techniques; a demo of the first-ever general-purpose inference engine that translates high sparsity levels into significant speedup; and next steps on using the Neural Magic Inference engine and ML tools to make your inference efficient, with less complexity.
Mark J Kurtz (Neural Magic)
Dan Alistarh (IST Austria & Neural Magic Inc.)
Saša Zelenović (Neural Magic)
More from the Same Authors
2020 Poster: Scalable Belief Propagation via Relaxed Scheduling
Vitalii Aksenov · Dan Alistarh · Janne H. Korhonen
2020 Poster: Adaptive Gradient Quantization for Data-Parallel SGD
Fartash Faghri · Iman Tabrizian · Ilia Markov · Dan Alistarh · Daniel Roy · Ali Ramezani-Kebrya
2020 Poster: WoodFisher: Efficient Second-Order Approximation for Neural Network Compression
Sidak Pal Singh · Dan Alistarh
2018 Poster: The Convergence of Sparsified Gradient Methods
Dan Alistarh · Torsten Hoefler · Mikael Johansson · Nikola Konstantinov · Sarit Khirirat · Cedric Renggli
2018 Poster: Byzantine Stochastic Gradient Descent
Dan Alistarh · Zeyuan Allen-Zhu · Jerry Li