This paper identifies a structural property of data distributions that enables deep neural networks to learn hierarchically. We define the "staircase" property for functions over the Boolean hypercube, which posits that high-order Fourier coefficients are reachable from lower-order Fourier coefficients along increasing chains. We prove that functions satisfying this property can be learned in polynomial time using layerwise stochastic coordinate descent on regular neural networks, a class of network architectures and initializations with homogeneity properties. Our analysis shows that for such staircase functions and neural networks, the gradient-based algorithm learns high-level features by greedily combining lower-level features along the depth of the network. We further back our theoretical results with experiments showing that staircase functions are learnable by more standard ResNet architectures with stochastic gradient descent. Both the theoretical and experimental results support the view that the staircase property plays a role in understanding the capabilities of gradient-based learning on regular networks, in contrast to general polynomial-size networks that can emulate any Statistical Query or PAC algorithm, as recently shown.
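For intuition, a minimal example of the kind of structure the staircase property captures (notation chosen here for illustration, not quoted from the paper) is a function on the hypercube whose monomials form an increasing chain, each one extending the previous by a single coordinate:

f(x) = x_1 + x_1 x_2 + x_1 x_2 x_3 + \dots + x_1 x_2 \cdots x_d, \qquad x \in \{-1,+1\}^n.

Here each degree-k Fourier coefficient is reachable from a degree-(k-1) coefficient by adding one coordinate, which is the mechanism the abstract describes: gradient-based training can pick up the low-order terms first and then extend them, whereas an isolated high-degree monomial such as x_1 x_2 \cdots x_d on its own offers no such lower-order guidance.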
Author Information
Emmanuel Abbe (Swiss Federal Institute of Technology Lausanne)
Enric Boix-Adsera (MIT)
Matthew S Brennan (Massachusetts Institute of Technology)
Guy Bresler (MIT)
Dheeraj Nagaraj (Massachusetts Institute of Technology)
More from the Same Authors
- 2021 Spotlight: Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems
  Suhas Kowshik · Dheeraj Nagaraj · Prateek Jain · Praneeth Netrapalli
- 2021 Spotlight: On the Power of Differentiable Learning versus PAC and SQ Learning
  Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro
- 2022 Poster: GULP: a prediction-based metric between representations
  Enric Boix-Adsera · Hannah Lawrence · George Stepaniants · Philippe Rigollet
- 2022 Poster: On the non-universality of deep learning: quantifying the cost of symmetry
  Emmanuel Abbe · Enric Boix-Adsera
- 2022 Poster: Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures
  Emmanuel Abbe · Samy Bengio · Elisabetta Cornacchia · Jon Kleinberg · Aryo Lotfi · Maithra Raghu · Chiyuan Zhang
- 2021 Poster: On the Power of Differentiable Learning versus PAC and SQ Learning
  Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro
- 2021 Poster: Streaming Linear System Identification with Reverse Experience Replay
  Suhas Kowshik · Dheeraj Nagaraj · Prateek Jain · Praneeth Netrapalli
- 2021 Poster: Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems
  Suhas Kowshik · Dheeraj Nagaraj · Prateek Jain · Praneeth Netrapalli
- 2020 Poster: Sharp Representation Theorems for ReLU Networks with Precise Dependence on Depth
  Guy Bresler · Dheeraj Nagaraj
- 2020 Poster: Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms
  Dheeraj Nagaraj · Xian Wu · Guy Bresler · Prateek Jain · Praneeth Netrapalli
- 2020 Spotlight: Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms
  Dheeraj Nagaraj · Xian Wu · Guy Bresler · Prateek Jain · Praneeth Netrapalli
- 2020 Poster: Learning Restricted Boltzmann Machines with Sparse Latent Variables
  Guy Bresler · Rares-Darius Buhai
- 2019 Poster: Sample Efficient Active Learning of Causal Trees
  Kristjan Greenewald · Dmitriy Katz · Karthikeyan Shanmugam · Sara Magliacane · Murat Kocaoglu · Enric Boix-Adsera · Guy Bresler
- 2018 Poster: Sparse PCA from Sparse Linear Regression
  Guy Bresler · Sung Min Park · Madalina Persu
- 2017: Community Detection and Invariance to Distribution
  Guy Bresler