Multilayer Neural Networks (MNNs) are commonly trained using gradient descent-based methods, such as BackPropagation (BP). Inference in probabilistic graphical models is often done using variational Bayes methods, such as Expectation Propagation (EP). We show how an EP-based approach can also be used to train deterministic MNNs. Specifically, we approximate the posterior of the weights given the data using a “mean-field” factorized distribution, in an online setting. Using online EP and the central limit theorem, we find an analytical approximation to the Bayes update of this posterior, as well as the resulting Bayes estimates of the weights and outputs. Despite its different origin, the resulting algorithm, Expectation BackPropagation (EBP), is very similar to BP in form and efficiency. However, it has several additional advantages: (1) Training is parameter-free, given initial conditions (prior) and the MNN architecture. This is useful for large-scale problems, where parameter tuning is a major challenge. (2) The weights can be restricted to have discrete values. This is especially useful for implementing trained MNNs in precision-limited hardware chips, thus improving their speed and energy efficiency by several orders of magnitude. We test the EBP algorithm numerically on eight binary text classification tasks. In all tasks, EBP outperforms (1) standard BP with the optimal constant learning rate, and (2) the previously reported state of the art. Interestingly, EBP-trained MNNs with binary weights usually perform better than MNNs with continuous (real) weights, provided we average the MNN output using the inferred posterior.
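To make the role of the central limit theorem more concrete, here is a minimal sketch for a single neuron with binary weights. It is an illustration only and not the EBP algorithm itself: the function names, the ±1 weight alphabet, the sign activation, and the randomly drawn posterior `p_plus` are all assumptions introduced for this example. Under a factorized (mean-field) posterior, the pre-activation is a sum of independent terms, so its distribution can be approximated by a Gaussian whose mean and variance have closed forms. The sketch compares that approximation with a Monte Carlo estimate.

```python
import math
import numpy as np


def clt_prob_positive(p_plus, x):
    """Gaussian (CLT) approximation of P(sum_i w_i * x_i > 0).

    Assumes independent weights w_i in {-1, +1} with P(w_i = +1) = p_plus[i].
    """
    mean_w = 2.0 * p_plus - 1.0          # E[w_i]
    var_w = 1.0 - mean_w ** 2            # Var[w_i] for a +/-1 variable
    mu = float(np.dot(mean_w, x))        # mean of the pre-activation
    sigma2 = float(np.dot(var_w, x ** 2))  # variance of the pre-activation
    # Standard normal CDF evaluated at mu / sigma, written with erf.
    return 0.5 * (1.0 + math.erf(mu / math.sqrt(2.0 * sigma2)))


def monte_carlo_prob_positive(p_plus, x, n_samples, rng):
    """Estimate the same probability by sampling weights from the posterior."""
    draws = rng.random((n_samples, p_plus.size)) < p_plus
    w = np.where(draws, 1.0, -1.0)
    return float(np.mean(w @ x > 0))


rng = np.random.default_rng(0)
n_weights = 400
p_plus = rng.uniform(0.3, 0.7, size=n_weights)  # hypothetical posterior
x = rng.normal(size=n_weights)                  # hypothetical input

approx = clt_prob_positive(p_plus, x)
sampled = monte_carlo_prob_positive(p_plus, x, 10_000, rng)
print(f"CLT approximation: {approx:.3f}, Monte Carlo estimate: {sampled:.3f}")
```

With a few hundred weights the two numbers should agree closely, which is the kind of analytical shortcut the abstract refers to. Averaging the output over the weight posterior, rather than using a single point estimate, corresponds to reading off this probability directly.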
Author Information
Daniel Soudry (Technion)
I am an assistant professor in the Department of Electrical Engineering at the Technion, working in the areas of machine learning and theoretical neuroscience. I am especially interested in all aspects of neural networks and deep learning. I did my post-doc (as a Gruss Lipper fellow) working with Prof. Liam Paninski in the Department of Statistics, the Center for Theoretical Neuroscience, the Grossman Center for Statistics of the Mind, the Kavli Institute for Brain Science, and the NeuroTechnology Center at Columbia University. I did my Ph.D. (2008-2013, direct track) in the Network Biology Research Laboratory in the Department of Electrical Engineering at the Technion, Israel Institute of Technology, under the guidance of Prof. Ron Meir. In 2008 I graduated summa cum laude with a B.Sc. in Electrical Engineering and a B.Sc. in Physics, after studying at the Technion since 2004.
Itay Hubara (Habana Labs)
Ron Meir (Technion)
More from the Same Authors
- 2022 Poster: Integral Probability Metrics PAC-Bayes Bounds
  Ron Amit · Baruch Epstein · Shay Moran · Ron Meir
- 2021 Poster: Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
  Itay Hubara · Brian Chmiel · Moshe Island · Ron Banner · Joseph Naor · Daniel Soudry
- 2021 Poster: The Implicit Bias of Minima Stability: A View from Function Space
  Rotem Mulayoff · Tomer Michaeli · Daniel Soudry
- 2021 Poster: Physics-Aware Downsampling with Deep Learning for Scalable Flood Modeling
  Niv Giladi · Zvika Ben-Haim · Sella Nevo · Yossi Matias · Daniel Soudry
- 2021 Poster: A Theory of the Distortion-Perception Tradeoff in Wasserstein Space
  Dror Freirich · Tomer Michaeli · Ron Meir
- 2020 Poster: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
  Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry
- 2020 Spotlight: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
  Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry
- 2019: Lunch Break and Posters
  Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Keun Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu
- 2019 Poster: A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off
  Yaniv Blumenfeld · Dar Gilboa · Daniel Soudry
- 2019 Poster: Post training 4-bit quantization of convolutional networks for rapid-deployment
  Ron Banner · Yury Nahshan · Daniel Soudry
- 2018 Poster: Norm matters: efficient and accurate normalization schemes in deep networks
  Elad Hoffer · Ron Banner · Itay Golan · Daniel Soudry
- 2018 Spotlight: Norm matters: efficient and accurate normalization schemes in deep networks
  Elad Hoffer · Ron Banner · Itay Golan · Daniel Soudry
- 2018 Poster: Implicit Bias of Gradient Descent on Linear Convolutional Networks
  Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro
- 2018 Poster: Scalable methods for 8-bit training of neural networks
  Ron Banner · Itay Hubara · Elad Hoffer · Daniel Soudry
- 2017: Closing the Generalization Gap
  Itay Hubara
- 2017 Poster: Train longer, generalize better: closing the generalization gap in large batch training of neural networks
  Elad Hoffer · Itay Hubara · Daniel Soudry
- 2017 Oral: Train longer, generalize better: closing the generalization gap in large batch training of neural networks
  Elad Hoffer · Itay Hubara · Daniel Soudry
- 2016 Poster: Binarized Neural Networks
  Itay Hubara · Matthieu Courbariaux · Daniel Soudry · Ran El-Yaniv · Yoshua Bengio
- 2015: Spotlight Part II
  Alex Gibberd · Kenji Doya · Bhaswar B Bhattacharya · Sakyasingha Dasgupta · Daniel Soudry
- 2015 Poster: A Tractable Approximation to Optimal Point Process Filtering: Application to Neural Encoding
  Yuval Harel · Ron Meir · Manfred Opper
- 2015 Spotlight: A Tractable Approximation to Optimal Point Process Filtering: Application to Neural Encoding
  Yuval Harel · Ron Meir · Manfred Opper
- 2014 Poster: Optimal Neural Codes for Control and Estimation
  Alex K Susemihl · Ron Meir · Manfred Opper
- 2011 Poster: Analytical Results for the Error in Filtering of Gaussian Processes
  Alex K Susemihl · Ron Meir · Manfred Opper
- 2008 Poster: Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation
  Dotan Di Castro · Dima Volkinshtein · Ron Meir
- 2007 Oral: A neural network implementing optimal state estimation based on dynamic spike train decoding
  Omer Bobrowski · Ron Meir · Shy Shoham · Yonina Eldar
- 2007 Poster: A neural network implementing optimal state estimation based on dynamic spike train decoding
  Omer Bobrowski · Ron Meir · Shy Shoham · Yonina Eldar