We consider the training of structured neural networks where the regularizer can be non-smooth and possibly non-convex. While popular machine learning libraries have resorted to stochastic (adaptive) subgradient approaches, the use of proximal gradient methods in the stochastic setting has been little explored and warrants further study, in particular regarding the incorporation of adaptivity. Towards this goal, we present a general framework of stochastic proximal gradient descent methods that allows for arbitrary positive preconditioners and lower semi-continuous regularizers. We derive two important instances of our framework: (i) the first proximal version of Adam, one of the most popular adaptive SGD algorithms, and (ii) a revised version of ProxQuant for quantization-specific regularizers, which improves upon the original approach by incorporating the effect of preconditioners in the proximal mapping computations. We provide convergence guarantees for our framework and show that, for sparse data, adaptive gradient methods can converge faster than vanilla SGD in terms of the constant factor. Lastly, we demonstrate the superiority of stochastic proximal methods over subgradient-based approaches via extensive experiments. Interestingly, our results indicate that the benefit of proximal approaches over their subgradient counterparts is more pronounced for non-convex regularizers than for convex ones.
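To make concrete how a preconditioner can enter the proximal mapping, the following is a minimal sketch and not the authors' implementation. It assumes an l1 regularizer with weight `lam` and an Adam-style diagonal preconditioner `d = sqrt(v_hat) + eps`; the function names and default hyperparameters are hypothetical. Under a diagonal preconditioner, the proximal mapping of the l1 norm reduces to soft-thresholding with a per-coordinate threshold `lr * lam / d`.

```python
import numpy as np


def soft_threshold(z, tau):
    """Elementwise soft-thresholding; tau may be a scalar or an array."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def prox_adam_l1_step(x, grad, m, v, t, lr=1e-3, lam=1e-2,
                      beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative proximal step with an Adam-style diagonal preconditioner.

    Assumption: the regularizer is lam * ||x||_1. The names and defaults
    here are hypothetical and chosen only for illustration.
    """
    # Standard Adam moment estimates with bias correction (t starts at 1).
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # Diagonal preconditioner.
    d = np.sqrt(v_hat) + eps

    # Preconditioned gradient step on the smooth loss only.
    z = x - lr * m_hat / d

    # Proximal step in the metric induced by d: each coordinate is
    # soft-thresholded with its own threshold lr * lam / d.
    x_new = soft_threshold(z, lr * lam / d)
    return x_new, m, v


if __name__ == "__main__":
    x = np.zeros(3)
    m = np.zeros(3)
    v = np.zeros(3)
    g = np.array([1.0, 0.0, -1.0])
    x, m, v = prox_adam_l1_step(x, g, m, v, t=1)
    print(x)
```

Unlike a subgradient step on the regularizer, the thresholding sets coordinates exactly to zero whenever the preconditioned update falls below the per-coordinate threshold.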
Author Information
Jihun Yun (Korea Advanced Institute of Science and Technology)
Aurelie Lozano (IBM Research)
Eunho Yang (Korea Advanced Institute of Science and Technology; AItrics)
More from the Same Authors
- 2021 Poster: Unbiased Classification through Bias-Contrastive and Bias-Balanced Learning
  Youngkyu Hong · Eunho Yang
- 2020 Poster: Bootstrapping neural processes
  Juho Lee · Yoonho Lee · Jungtaek Kim · Eunho Yang · Sung Ju Hwang · Yee Whye Teh
- 2020 Poster: Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning
  Jaehyung Kim · Youngbum Hur · Sejun Park · Eunho Yang · Sung Ju Hwang · Jinwoo Shin
- 2020 Poster: Time-Reversal Symmetric ODE Network
  In Huh · Eunho Yang · Sung Ju Hwang · Jinwoo Shin
- 2020 Poster: Neural Complexity Measures
  Yoonho Lee · Juho Lee · Sung Ju Hwang · Eunho Yang · Seungjin Choi
- 2020 Poster: Few-shot Visual Reasoning with Meta-Analogical Contrastive Learning
  Youngsung Kim · Jinwoo Shin · Eunho Yang · Sung Ju Hwang
- 2020 Poster: Attribution Preservation in Network Compression for Reliable Network Interpretation
  Geondo Park · June Yong Yang · Sung Ju Hwang · Eunho Yang
- 2018 Poster: Uncertainty-Aware Attention for Reliable Interpretation and Prediction
  Jay Heo · Hae Beom Lee · Saehoon Kim · Juho Lee · Kwang Joon Kim · Eunho Yang · Sung Ju Hwang
- 2018 Poster: Joint Active Feature Acquisition and Classification with Variable-Size Set Encoding
  Hajin Shim · Sung Ju Hwang · Eunho Yang
- 2018 Poster: DropMax: Adaptive Variational Softmax
  Hae Beom Lee · Juho Lee · Saehoon Kim · Eunho Yang · Sung Ju Hwang
- 2015 Poster: Closed-form Estimators for High-dimensional Generalized Linear Models
  Eunho Yang · Aurelie Lozano · Pradeep Ravikumar
- 2015 Spotlight: Closed-form Estimators for High-dimensional Generalized Linear Models
  Eunho Yang · Aurelie Lozano · Pradeep Ravikumar
- 2015 Poster: Robust Gaussian Graphical Modeling with the Trimmed Graphical Lasso
  Eunho Yang · Aurelie Lozano
- 2014 Workshop: Out of the Box: Robustness in High Dimension
  Aurelie Lozano · Aleksandr Y Aravkin · Stephen Becker
- 2014 Session: Oral Session 10
  Aurelie Lozano
- 2014 Poster: Elementary Estimators for Graphical Models
  Eunho Yang · Aurelie Lozano · Pradeep Ravikumar
- 2013 Poster: Conditional Random Fields via Univariate Exponential Families
  Eunho Yang · Pradeep Ravikumar · Genevera I Allen · Zhandong Liu
- 2013 Poster: On Poisson Graphical Models
  Eunho Yang · Pradeep Ravikumar · Genevera I Allen · Zhandong Liu
- 2013 Poster: Dirty Statistical Models
  Eunho Yang · Pradeep Ravikumar
- 2012 Poster: Graphical Models via Generalized Linear Models
  Eunho Yang · Pradeep Ravikumar · Genevera I Allen · Zhandong Liu
- 2012 Oral: Graphical Models via Generalized Linear Models
  Eunho Yang · Pradeep Ravikumar · Genevera I Allen · Zhandong Liu
- 2011 Poster: Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
  Vikas Sindhwani · Aurelie Lozano
- 2010 Workshop: Practical Application of Sparse Modeling: Open Issues and New Directions
  Irina Rish · Alexandru Niculescu-Mizil · Guillermo Cecchi · Aurelie Lozano
- 2010 Poster: Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference
  Aurelie Lozano · Vikas Sindhwani
- 2009 Poster: Grouped Orthogonal Matching Pursuit for Variable Selection and Prediction
  Aurelie Lozano · Grzegorz M Swirszcz · Naoki Abe