Deep learning has driven dramatic performance advances on numerous difficult machine learning tasks in a wide range of applications. Yet, its theoretical foundations remain poorly understood, with many more questions than answers. For example: What are the modeling assumptions underlying deep networks? How well can we expect deep networks to perform? When a certain network succeeds or fails, can we determine why and how? How can we adapt deep learning to new domains in a principled way?
While some progress has been made recently towards a foundational understanding of deep learning, most theory work has been disjointed, and a coherent picture has yet to emerge. Indeed, the current state of deep learning theory is like the fable “The Blind Men and the Elephant”.
The goal of this workshop is to provide a forum where theoretical researchers of all stripes can come together not only to share reports on their individual progress but also to find new ways to join forces towards the goal of a coherent theory of deep learning. Topics to be discussed include:
- Statistical guarantees for deep learning models
- Expressive power and capacity of neural networks
- New probabilistic models from which various deep architectures can be derived
- Optimization landscapes of deep networks
- Deep representations and invariance to latent factors
- Tensor analysis of deep learning
- Deep learning from an approximation theory perspective
- Sparse coding and deep learning
- Mixture models, the EM algorithm, and deep learning
In addition to invited and contributed talks by leading researchers from diverse backgrounds, the workshop will feature an extended poster/discussion session and a panel discussion on which combinations of ideas are most likely to move the theory of deep learning forward and which might lead to blind alleys.
Accepted Papers and Authors
1. A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks. Sanjeev Arora, Nadav Cohen, Noah Golowich and Wei Hu.
2. On the convergence of SGD on neural nets and other over-parameterized problems. Karthik Abinav Sankararaman, Soham De, Zheng Xu, W. Ronny Huang and Tom Goldstein.
3. Optimal SGD Hyperparameters for Fully Connected Networks. Daniel Park, Samuel Smith, Jascha Sohl-Dickstein and Quoc Le.
4. Invariant representation learning for robust deep networks. Julian Salazar, Davis Liang, Zhiheng Huang and Zachary Lipton.
5. Characterizing & Exploring Deep CNN Representations Using Factorization. Uday Singh Saini and Evangelos Papalexakis.
6. On the Weak Neural Dependence Phenomenon in Deep Learning. Jiayao Zhang, Ruoxi Jia, Bo Li and Dawn Song.
7. DNN or k-NN: That is the Generalize vs. Memorize Question. Gilad Cohen, Guillermo Sapiro and Raja Giryes.
8. On the Margin Theory of Feedforward Neural Networks. Colin Wei, Jason Lee, Qiang Liu and Tengyu Ma.
9. A Differential Topological View of Challenges in Learning with Deep Neural Networks. Hao Shen.
10. Theoretical Analysis of Auto Rate-tuning by Batch Normalization. Sanjeev Arora, Zhiyuan Li and Kaifeng Lyu.
11. Topological Constraints on Homeomorphic Auto-Encoding. Pim de Haan and Luca Falorsi.
12. Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience. Vaishnavh Nagarajan and J. Zico Kolter.
13. Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep Learning. Cheolhyoung Lee, Kyunghyun Cho and Wanmo Kang.
14. Multi-dimensional Count Sketch: Dimension Reduction That Retains Efficient Tensor Operations. Yang Shi and Anima Anandkumar.
15. Gradient Descent Provably Optimizes Over-parameterized Neural Networks. Simon Du, Xiyu Zhai, Aarti Singh and Barnabas Poczos.
16. The Dynamic Distance Between Learning Tasks. Alessandro Achille, Glen Bigan Mbeng and Stefano Soatto.
17. Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization. Navid Azizan and Babak Hassibi.
18. Shared Representation Across Neural Networks. Qihong Lu, Po-Hsuan Chen, Jonathan Pillow, Peter Ramadge, Kenneth Norman and Uri Hasson.
19. Learning in gated neural networks. Ashok Makkuva, Sewoong Oh, Sreeram Kannan and Pramod Viswanath.
20. Gradient descent aligns the layers of deep linear networks. Ziwei Ji and Matus Telgarsky.
21. Fluctuation-dissipation relation for stochastic gradient descent. Sho Yaida.
22. Identifying Generalization Properties in Neural Networks. Huan Wang, Nitish Shirish Keskar, Caiming Xiong and Richard Socher.
23. A Theoretical Framework for Deep and Locally Connected ReLU Network. Yuandong Tian.
24. Minimum norm solutions do not always generalize well for over-parameterized problems. Vatsal Shah, Anastasios Kyrillidis and Sujay Sanghavi.
25. An Empirical Exploration of Gradient Correlations in Deep Learning. Daniel Rothchild, Roy Fox, Noah Golmant, Joseph Gonzalez, Michael Mahoney, Kai Rothauge, Ion Stoica and Zhewei Yao.
26. Geometric Scattering on Manifolds. Michael Perlmutter, Guy Wolf and Matthew Hirn.
27. Theoretical Insights into Memorization in GANs. Vaishnavh Nagarajan, Colin Raffel and Ian Goodfellow.
28. A jamming transition from under- to over-parametrization affects loss landscape and generalization. Stefano Spigler, Mario Geiger, Stéphane d'Ascoli, Levent Sagun, Giulio Biroli and Matthieu Wyart.
29. A Mean Field Theory of Multi-Layer RNNs. David Anderson, Jeffrey Pennington and Satyen Kale.
30. Generalization and regularization in deep learning for nonlinear inverse problems. Christopher Wong, Maarten de Hoop and Matti Lassas.
31. On the Spectral Bias of Neural Networks. Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio and Aaron Courville.
32. On Generalization Bounds for a Family of Recurrent Neural Networks. Minshuo Chen, Xingguo Li and Tuo Zhao.
33. SGD Implicitly Regularizes Generalization Error. Dan Roberts.
34. Iteratively Learning from the Best. Yanyao Shen and Sujay Sanghavi.
35. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks. Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun and Nathan Srebro.
36. An Escape-Time Analysis of SGD. Philippe Casgrain, Mufan Li, Gintare Karolina Dziugaite and Daniel Roy.
37. Information Regularized Neural Networks. Tianchen Zhao, Dejiao Zhang, Zeyu Sun and Honglak Lee.
38. Generalization Bounds for Unsupervised Cross-Domain Mapping with WGANs. Tomer Galanti, Sagie Benaim and Lior Wolf.
39. Degeneracy, Trainability, and Generalization in Deep Neural Networks. Emin Orhan and Xaq Pitkow.
40. A Max-Affine Spline View of Deep Network Nonlinearities. Randall Balestriero and Richard Baraniuk.
Schedule
- Opening Remarks
- Contributed Talk 1
- Contributed Talk 2
- Plenary Talk 1
- Invited Talk 1
- Coffee Break
- Plenary Talk 2
- Invited Talk 2
- Lunch Break
- Plenary Talk 3
- Invited Talk 3
- Contributed Talk 3
- Plenary Talk 4
- Invited Talk 4
- Closing Remarks