This is the public, feature-limited version of the conference webpage. After Registration and login please visit the full version.

Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient

Ankit Pensia, Shashank Rajput, Alliot Nagle, Harit Vishwakarma, Dimitrios Papailiopoulos

Spotlight presentation: Orals & Spotlights Track 08: Deep Learning
on 2020-12-08T08:20:00-08:00 - 2020-12-08T08:30:00-08:00
Poster Session 2 (more posters)
on 2020-12-08T09:00:00-08:00 - 2020-12-08T11:00:00-08:00
Abstract: The strong lottery ticket hypothesis (LTH) postulates that one can approximate any target neural network by only pruning the weights of a sufficiently over-parameterized random network. A recent work by Malach et al. [MYSS20] establishes the first theoretical analysis for the strong LTH: one can provably approximate a neural network of width $d$ and depth $l$, by pruning a random one that is a factor $O(d^4 l^2)$ wider and twice as deep. This polynomial over-parameterization requirement is at odds with recent experimental research that achieves good approximation with networks that are a small factor wider than the target. In this work, we close the gap and offer an exponential improvement to the over-parameterization requirement for the existence of lottery tickets. We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(log(dl))$ wider and twice as deep. Our analysis heavily relies on connecting pruning random ReLU networks to random instances of the Subset Sum problem. We then show that this logarithmic over-parameterization is essentially optimal for constant depth networks. Finally, we verify several of our theoretical insights with experiments.

Preview Video and Chat

To see video, interact with the author and ask questions please use registration and login.