Timezone: »
Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during the training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. Our adaptive methods are also significantly more robust to the choice of hyperparameters.
Author Information
Fartash Faghri (University of Toronto)
Iman Tabrizian (University of Toronto)
Ilia Markov (IST Austria)
Dan Alistarh (IST Austria & Neural Magic Inc.)
Daniel Roy (Univ of Toronto & Vector)
Ali Ramezani-Kebrya (University of Toronto and Vector Institute)
More from the Same Authors
-
2021 Spotlight: Towards a Unified Information-Theoretic Framework for Generalization »
Mahdi Haghifam · Gintare Karolina Dziugaite · Shay Moran · Dan Roy -
2021 Poster: The future is log-Gaussian: ResNets and their infinite-depth-and-width limit at initialization »
Mufan Li · Mihai Nica · Dan Roy -
2021 Poster: Minimax Optimal Quantile and Semi-Adversarial Regret via Root-Logarithmic Regularizers »
Jeffrey Negrea · Blair Bilodeau · Nicolò Campolongo · Francesco Orabona · Dan Roy -
2021 Poster: Towards a Unified Information-Theoretic Framework for Generalization »
Mahdi Haghifam · Gintare Karolina Dziugaite · Shay Moran · Dan Roy -
2020 Poster: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel »
Stanislav Fort · Gintare Karolina Dziugaite · Mansheej Paul · Sepideh Kharaghani · Daniel Roy · Surya Ganguli -
2020 Poster: Scalable Belief Propagation via Relaxed Scheduling »
Vitalii Aksenov · Dan Alistarh · Janne H. Korhonen -
2020 Poster: Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms »
Mahdi Haghifam · Jeffrey Negrea · Ashish Khisti · Daniel Roy · Gintare Karolina Dziugaite -
2020 Poster: In search of robust measures of generalization »
Gintare Karolina Dziugaite · Alexandre Drouin · Brady Neal · Nitarshan Rajkumar · Ethan Caballero · Linbo Wang · Ioannis Mitliagkas · Daniel Roy -
2020 Poster: WoodFisher: Efficient Second-Order Approximation for Neural Network Compression »
Sidak Pal Singh · Dan Alistarh -
2020 Expo Demonstration: Using Sparse Quantization for Efficient Inference on Deep Neural Networks »
Mark J Kurtz · Dan Alistarh · Saša Zelenović -
2019 : Lunch break & Poster session »
Breandan Considine · Michael Innes · Du Phan · Dougal Maclaurin · Robin Manhaeve · Alexey Radul · Shashi Gowda · Ekansh Sharma · Eli Sennesh · Maxim Kochurov · Gordon Plotkin · Thomas Wiecki · Navjot Kukreja · Chung-chieh Shan · Matthew Johnson · Dan Belov · Neeraj Pradhan · Wannes Meert · Angelika Kimmig · Luc De Raedt · Brian Patton · Matthew Hoffman · Rif A. Saurous · Daniel Roy · Eli Bingham · Martin Jankowiak · Colin Carroll · Junpeng Lao · Liam Paull · Martin Abadi · Angel Rojas Jimenez · JP Chen -
2019 : Lunch Break and Posters »
Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Keun Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu -
2019 Workshop: Machine Learning with Guarantees »
Ben London · Gintare Karolina Dziugaite · Daniel Roy · Thorsten Joachims · Aleksander Madry · John Shawe-Taylor -
2019 Poster: Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates »
Jeffrey Negrea · Mahdi Haghifam · Gintare Karolina Dziugaite · Ashish Khisti · Daniel Roy -
2019 Poster: Fast-rate PAC-Bayes Generalization Bounds via Shifted Rademacher Processes »
Jun Yang · Shengyang Sun · Daniel Roy -
2018 Poster: The Convergence of Sparsified Gradient Methods »
Dan Alistarh · Torsten Hoefler · Mikael Johansson · Nikola Konstantinov · Sarit Khirirat · Cedric Renggli -
2018 Poster: Byzantine Stochastic Gradient Descent »
Dan Alistarh · Zeyuan Allen-Zhu · Jerry Li -
2018 Poster: Data-dependent PAC-Bayes priors via differential privacy »
Gintare Karolina Dziugaite · Daniel Roy -
2017 : Daniel Roy - Deep Neural Networks: From Flat Minima to Numerically Nonvacuous Generalization Bounds via PAC-Bayes »
Daniel Roy -
2016 Poster: Measuring the reliability of MCMC inference with bidirectional Monte Carlo »
Roger Grosse · Siddharth Ancha · Daniel Roy -
2014 Workshop: 3rd NIPS Workshop on Probabilistic Programming »
Daniel Roy · Josh Tenenbaum · Thomas Dietterich · Stuart J Russell · YI WU · Ulrik R Beierholm · Alp Kucukelbir · Zenna Tavares · Yura Perov · Daniel Lee · Brian Ruttenberg · Sameer Singh · Michael Hughes · Marco Gaboardi · Alexey Radul · Vikash Mansinghka · Frank Wood · Sebastian Riedel · Prakash Panangaden -
2014 Poster: Gibbs-type Indian Buffet Processes »
Creighton Heaukulani · Daniel Roy -
2014 Poster: Mondrian Forests: Efficient Online Random Forests »
Balaji Lakshminarayanan · Daniel Roy · Yee Whye Teh -
2013 Session: Session Chair »
Daniel Roy -
2013 Session: Tutorial Session B »
Daniel Roy -
2012 Workshop: Probabilistic Programming: Foundations and Applications (2 day) »
Vikash Mansinghka · Daniel Roy · Noah Goodman -
2012 Workshop: Probabilistic Programming: Foundations and Applications (2 day) »
Vikash Mansinghka · Daniel Roy · Noah Goodman -
2012 Poster: Random function priors for exchangeable graphs and arrays »
James R Lloyd · Daniel Roy · Peter Orbanz · Zoubin Ghahramani -
2011 Poster: Complexity of Inference in Latent Dirichlet Allocation »
David Sontag · Daniel Roy -
2011 Spotlight: Complexity of Inference in Latent Dirichlet Allocation »
David Sontag · Daniel Roy -
2008 Workshop: Probabilistic Programming: Universal Languages, Systems and Applications »
Daniel Roy · John Winn · David A McAllester · Vikash Mansinghka · Josh Tenenbaum -
2008 Oral: The Mondrian Process »
Daniel Roy · Yee Whye Teh -
2008 Poster: The Mondrian Process »
Daniel Roy · Yee Whye Teh -
2007 Poster: Bayesian Agglomerative Clustering with Coalescents »
Yee Whye Teh · Hal Daumé III · Daniel Roy -
2007 Oral: Bayesian Agglomerative Clustering with Coalescents »
Yee Whye Teh · Hal Daumé III · Daniel Roy -
2006 Poster: Learning annotated hierarchies from relational data »
Daniel Roy · Charles Kemp · Vikash Mansinghka · Josh Tenenbaum -
2006 Talk: Learning annotated hierarchies from relational data »
Daniel Roy · Charles Kemp · Vikash Mansinghka · Josh Tenenbaum