Oral Poster
The Road Less Scheduled
Aaron Defazio · Xingyu Yang · Ahmed Khaled · Konstantin Mishchenko · Harsh Mehta · Ashok Cutkosky
West Ballroom A-D #5908
Oral presentation: Oral Session 1C: Optimization and Learning Theory
Wed 11 Dec 10 a.m. PST to 11 a.m. PST
Wed 11 Dec 11 a.m. PST to 2 p.m. PST
Abstract:
Existing learning rate schedules that do not require specification of the optimization stopping step $T$ are greatly outperformed by learning rate schedules that depend on $T$. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems, ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyperparameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open-source implementation of our method is available.
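The abstract does not state the update rule itself. As a rough illustration of how iterate averaging can stand in for a schedule, the following is a minimal sketch assuming the schedule-free SGD form given in the accompanying paper: a base SGD sequence `z`, a uniform running average `x` of that sequence, and a gradient evaluated at an interpolation `y` between the two. The function name `schedule_free_sgd`, the scalar setting, and the default values are hypothetical choices made for this example; this is not the authors' open-source implementation.

```python
from typing import Callable


def schedule_free_sgd(
    grad: Callable[[float], float],
    x0: float,
    lr: float = 0.5,
    beta: float = 0.9,
    steps: int = 1000,
) -> float:
    """Sketch of a schedule-free SGD loop on a single scalar parameter.

    Assumed update (not taken from the abstract):
        y_t     = (1 - beta) * z_t + beta * x_t
        z_{t+1} = z_t - lr * grad(y_t)
        x_{t+1} = (1 - c) * x_t + c * z_{t+1},  with c = 1 / (t + 1)
    """
    z = x0  # base sequence: plain constant-step SGD iterate
    x = x0  # averaged sequence: the point that would be evaluated
    for t in range(1, steps + 1):
        # The gradient is taken at an interpolation of z and x;
        # beta plays the role of a momentum parameter.
        y = (1.0 - beta) * z + beta * x
        z = z - lr * grad(y)
        # Uniform running average of the z iterates. No stopping
        # step T appears anywhere in the loop.
        c = 1.0 / (t + 1)
        x = (1.0 - c) * x + c * z
    return x


if __name__ == "__main__":
    # Toy convex problem f(w) = 0.5 * (w - 3)^2, whose gradient is w - 3.
    w = schedule_free_sgd(lambda w: w - 3.0, x0=0.0)
    print(w)
```

The step size `lr` stays constant throughout, so the loop can be stopped at any iteration and `x` returned, which is the property the abstract emphasizes. The only tunables in this sketch are `lr` and `beta`, mirroring the claim of no additional hyperparameters over an optimizer with momentum.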