Adaptive stochastic gradient methods such as AdaGrad have gained popularity, in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second-order information by accumulating past gradients, which are used to tune the step size adaptively. In certain situations the full-matrix variant of AdaGrad is expected to attain better performance; however, in high dimensions it is computationally impractical. We present Ada-LR and RadaGrad, two computationally efficient approximations to full-matrix AdaGrad based on randomized dimensionality reduction. They are able to capture dependencies between features and achieve performance similar to full-matrix AdaGrad, but at a much smaller computational cost. We show that the regret of Ada-LR is close to the regret of full-matrix AdaGrad, which can have an up to exponentially smaller dependence on the dimension than the diagonal variant. Empirically, we show that Ada-LR and RadaGrad perform similarly to full-matrix AdaGrad. On the task of training convolutional as well as recurrent neural networks, RadaGrad achieves faster convergence than diagonal AdaGrad.
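For intuition, the sketch below contrasts the standard diagonal AdaGrad update with a generic randomized low-rank preconditioner in the spirit the abstract describes. It is a minimal NumPy illustration only: the projection matrix `Pi`, the sketch dimension `tau`, and the way the preconditioned gradient is lifted back to the full space are illustrative assumptions, not the exact Ada-LR or RadaGrad updates from the paper.

```python
# Minimal sketch (NumPy): diagonal AdaGrad vs. a hypothetical randomized
# low-rank approximation to full-matrix AdaGrad. Illustrative only; the
# actual Ada-LR / RadaGrad algorithms differ in their details.
import numpy as np

def adagrad_diag_step(x, grad, state, lr=0.1, eps=1e-8):
    """Diagonal AdaGrad: accumulate squared gradients coordinate-wise."""
    state["G"] += grad ** 2                      # running sum of g_t^2
    return x - lr * grad / (np.sqrt(state["G"]) + eps)

def adagrad_lowrank_step(x, grad, state, lr=0.1, eps=1e-8):
    """Hypothetical randomized variant: accumulate the outer product of
    *projected* gradients (tau x tau) instead of the full d x d matrix,
    precondition in the sketched subspace, then lift back to d dimensions."""
    Pi = state["Pi"]                             # random tau x d projection
    g_proj = Pi @ grad                           # sketch the gradient
    state["M"] += np.outer(g_proj, g_proj)       # small tau x tau accumulation
    # inverse square root of the sketched second-moment matrix
    w, V = np.linalg.eigh(state["M"] + eps * np.eye(len(g_proj)))
    M_inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T
    precond_grad = Pi.T @ (M_inv_sqrt @ g_proj)  # map back to the full space
    return x - lr * precond_grad

if __name__ == "__main__":
    d, tau = 100, 10
    rng = np.random.default_rng(0)
    x_diag = rng.normal(size=d)
    x_rand = x_diag.copy()
    diag_state = {"G": np.zeros(d)}
    rand_state = {"M": np.zeros((tau, tau)),
                  "Pi": rng.normal(size=(tau, d)) / np.sqrt(tau)}
    for _ in range(50):
        # gradient of the toy objective 0.5 * ||x||^2 is x itself
        x_diag = adagrad_diag_step(x_diag, x_diag, diag_state)
        x_rand = adagrad_lowrank_step(x_rand, x_rand, rand_state)
    print("diag:", np.linalg.norm(x_diag), "low-rank:", np.linalg.norm(x_rand))
```

The point of the contrast is cost: the diagonal update stores and inverts only d numbers, while the randomized variant works with a tau x tau matrix (tau much smaller than d) rather than the full d x d accumulator that exact full-matrix AdaGrad would require.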
Author Information
Gabriel Krummenacher (ETH Zurich)
Brian McWilliams (Disney Research)
Yannic Kilcher (ETH Zurich)
Joachim M Buhmann (ETH Zurich)
Nicolai Meinshausen (ETH Zurich)
More from the Same Authors
- 2022 Poster: Scalable Sensitivity and Uncertainty Analyses for Causal-Effect Estimates of Continuous-Valued Interventions
  Andrew Jesson · Alyson Douglas · Peter Manshausen · Maëlys Solal · Nicolai Meinshausen · Philip Stier · Yarin Gal · Uri Shalit
- 2022 Poster: Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs
  Djordje Miladinovic · Kumar Shridhar · Kushal Jain · Max Paulus · Joachim M Buhmann · Carl Allen
- 2022 Poster: Learning Long-Term Crop Management Strategies with CyclesGym
  Matteo Turchetta · Luca Corinzia · Scott Sussex · Amanda Burton · Juan Herrera · Ioannis Athanasiadis · Joachim M Buhmann · Andreas Krause
- 2020 Poster: Adversarial Training is a Form of Data-dependent Operator Norm Regularization
  Kevin Roth · Yannic Kilcher · Thomas Hofmann
- 2020 Spotlight: Adversarial Training is a Form of Data-dependent Operator Norm Regularization
  Kevin Roth · Yannic Kilcher · Thomas Hofmann
- 2019: Break / Poster Session 1
  Antonia Marcu · Yao-Yuan Yang · Pascale Gourdeau · Chen Zhu · Thodoris Lykouris · Jianfeng Chi · Mark Kozdoba · Arjun Nitin Bhagoji · Xiaoxia Wu · Jay Nandy · Michael T Smith · Bingyang Wen · Yuege Xie · Konstantinos Pitas · Suprosanna Shit · Maksym Andriushchenko · Dingli Yu · Gaël Letarte · Misha Khodak · Hussein Mozannar · Chara Podimata · James Foulds · Yizhen Wang · Huishuai Zhang · Ondrej Kuzelka · Alexander Levine · Nan Lu · Zakaria Mhammedi · Paul Viallard · Diana Cai · Lovedeep Gondara · James Lucas · Yasaman Mahdaviyeh · Aristide Baratin · Rishi Bommasani · Alessandro Barp · Andrew Ilyas · Kaiwen Wu · Jens Behrmann · Omar Rivasplata · Amir Nazemi · Aditi Raghunathan · Will Stephenson · Sahil Singla · Akhil Gupta · YooJung Choi · Yannic Kilcher · Clare Lyle · Edoardo Manino · Andrew Bennett · Zhi Xu · Niladri Chatterji · Emre Barut · Flavien Prost · Rodrigo Toro Icarte · Arno Blaas · Chulhee Yun · Sahin Lale · YiDing Jiang · Tharun Kumar Reddy Medini · Ashkan Rezaei · Alexander Meinke · Stephen Mell · Gary Kazantsev · Shivam Garg · Aradhana Sinha · Vishnu Lokhande · Geovani Rizk · Han Zhao · Aditya Kumar Akash · Jikai Hou · Ali Ghodsi · Matthias Hein · Tyler Sypherd · Yichen Yang · Anastasia Pentina · Pierre Gillot · Antoine Ledent · Guy Gur-Ari · Noah MacAulay · Tianzong Zhang
- 2018: Datasets and Benchmarks for Causal Learning
  Csaba Szepesvari · Isabelle Guyon · Nicolai Meinshausen · David Blei · Elias Bareinboim · Bernhard Schölkopf · Pietro Perona
- 2018: Causality and Distributional Robustness
  Nicolai Meinshausen
- 2017 Poster: Efficient and Flexible Inference for Stochastic Systems
  Stefan Bauer · Nico S Gorbach · Djordje Miladinovic · Joachim M Buhmann
- 2017 Poster: Non-monotone Continuous DR-submodular Maximization: Structure and Algorithms
  Yatao Bian · Kfir Levy · Andreas Krause · Joachim M Buhmann
- 2017 Poster: Scalable Variational Inference for Dynamical Systems
  Nico S Gorbach · Stefan Bauer · Joachim M Buhmann
- 2015 Poster: Variance Reduced Stochastic Gradient Descent with Neighbors
  Thomas Hofmann · Aurelien Lucchi · Simon Lacoste-Julien · Brian McWilliams
- 2015 Poster: BACKSHIFT: Learning causal cyclic graphs from unknown shift interventions
  Dominik Rothenhäusler · Christina Heinze-Deml · Jonas Peters · Nicolai Meinshausen
- 2014 Poster: Fast and Robust Least Squares Estimation in Corrupted Linear Models
  Brian McWilliams · Gabriel Krummenacher · Mario Lucic · Joachim M Buhmann
- 2014 Spotlight: Fast and Robust Least Squares Estimation in Corrupted Linear Models
  Brian McWilliams · Gabriel Krummenacher · Mario Lucic · Joachim M Buhmann
- 2013 Poster: Correlated random features for fast semi-supervised learning
  Brian McWilliams · David Balduzzi · Joachim M Buhmann
- 2011 Workshop: Philosophy and Machine Learning
  Marcello Pelillo · Joachim M Buhmann · Tiberio Caetano · Bernhard Schölkopf · Larry Wasserman
- 2006 Poster: Denoising and Dimension Reduction in Feature Space
  Mikio L Braun · Joachim M Buhmann · Klaus-Robert Müller