Abstract:
Oral (10 min)
- On the Relation between Distributionally Robust Optimization and Data Curation, Agnieszka Slowik
Spotlights (5 min)
- Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers, Jacques Chen
- Optimization with Adaptive Step Size Selection from a Dynamical Systems Perspective, Neha Wadia
- Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization, Difan Zou
The last 5 minutes are reserved for a Q&A with all speakers. Abstracts for the talks are below the schedule.