
The Lingering of Gradients: How to Reuse Gradients Over Time
Zeyuan Allen-Zhu · David Simchi-Levi · Xinshang Wang

Tue Dec 04 07:45 AM -- 09:45 AM (PST) @ Room 210 #2
Classically, the time complexity of a first-order method is estimated by its number of gradient computations. In this paper, we study a more refined complexity by taking into account the ``lingering'' of gradients: once a gradient is computed at $x_k$, the additional time to compute gradients at $x_{k+1},x_{k+2},\dots$ may be reduced. We show how this improves the running time of gradient descent and SVRG. For instance, if the ``additional time'' scales linearly with respect to the traveled distance, then the ``convergence rate'' of gradient descent can be improved from $1/T$ to $\exp(-T^{1/3})$. On the empirical side, we solve a hypothetical revenue management problem on the Yahoo! Front Page Today Module application with 4.6m users to $10^{-6}$ error (or $10^{-12}$ dual error) using 6 passes over the dataset.
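To illustrate the "lingering gradients" idea at a high level, here is a minimal sketch (not the paper's algorithm) for a finite-sum least-squares objective. It assumes a hypothetical fixed lingering radius `delta`: a component gradient computed at a point stays reusable until the iterate has traveled farther than `delta` from that point, so only the stale components are recomputed at each step. All names, the radius, and the step size are illustrative assumptions.

```python
import numpy as np

# Synthetic finite-sum problem: f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2
rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(i, x):
    # Gradient of the i-th component f_i(x) = 0.5 * (a_i . x - b_i)^2
    return (A[i] @ x - b[i]) * A[i]

delta = 0.05                        # assumed lingering radius (same for all i)
eta = 0.01                          # step size (illustrative choice)
x = np.zeros(d)
last_x = np.full((n, d), np.inf)    # where each component gradient was last computed
cache = np.zeros((n, d))            # cached (possibly stale) component gradients
fresh = 0                           # counter of fresh gradient computations
T = 100

for t in range(T):
    g = np.zeros(d)
    for i in range(n):
        # Recompute grad_i only if we moved farther than delta since last time;
        # otherwise its cached value still "lingers" and is reused.
        if np.linalg.norm(x - last_x[i]) > delta:
            cache[i] = grad_i(i, x)
            last_x[i] = x.copy()
            fresh += 1
        g += cache[i]
    x -= eta * g / n

print("fresh gradient computations:", fresh, "of", T * n)
```

Under this model, the cost of a pass depends on how far the iterate travels, not just on the number of steps, which is the quantity the paper's refined complexity analysis tracks.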

Author Information

Zeyuan Allen-Zhu (Microsoft Research)
David Simchi-Levi (MIT)
Xinshang Wang (MIT)