

Poster Session 1
in Workshop: OPT 2023: Optimization for Machine Learning

Egor Shulgin · Mingzhen He · Hanmin Li · Thibault Lahire · Eric Zelikman · Damien Scieur · Rajat Vadiraj Dwaraknath · Gene Li · Zhanhong Jiang · Rahul Jain · Zihan Zhou · Tianyue Zhang · Ilyas Fatkhullin · Frederik Kunstner · Utkarsh Singhal · Bruno Loureiro · Krishna C Kalagarla · Kai Liu · Michal Derezinski · Ross Clarke · Dimitri Papadimitriou · Mo Zhou · Jörg Franke · Chandler Smith · Darshan Chakrabarti · Trang H. Tran · Mokhwa Lee · Wei Kuang · Vincent Roulet · John Lazarsfeld · Donghyun Oh · Yihe Deng · Fu Wang · Junchi YANG · Dániel Rácz · Jeffrey Flanigan · Aaron Mishkin · Luca Scharr · Robert Gower · Chaoyue Liu · Yushen Huang · Nicholas Recker



Posters in this session

  • Towards a Better Theoretical Understanding of Independent Subnetwork Training

  • Revisiting Random Weight Perturbation for Efficiently Improving Generalization

  • Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization

  • Non-Uniform Sampling and Adaptive Optimizers in Deep Learning

  • Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation

  • Adaptive Quasi-Newton and Anderson Acceleration Framework with Explicit Global (Accelerated) Convergence Rates

  • On Optimization Formulations of Finite Horizon MDPs

  • Dueling Optimization with a Monotone Adversary

  • A Predicting Clipping Asynchronous Stochastic Gradient Descent Method in Distributed Learning

  • Average-Constrained Policy Optimization

  • Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs

  • Vanilla Thompson Sampling Revisited

  • Stochastic Optimization under Hidden Convexity

  • Why Adam Outperforms Gradient Descent on Language Models: A Heavy-Tailed Class Imbalance Problem

  • DynaLay: An Introspective Approach to Dynamic Layer Selection for Deep Networks

  • How to Guess a Gradient

  • Escaping mediocrity: how two-layer networks learn hard generalized linear models

  • Safe Posterior Sampling for Constrained MDPs with Bounded Constraint Violation

  • Nesterov Meets Robust Multitask Learning Twice

  • Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches

  • Adam through a Second-Order Lens

  • On the convergence of warped proximal iterations for solving nonmonotone inclusions and applications

  • Multi-head CLIP: Improving CLIP with Diverse Representations and Flat Minima

  • New Horizons in Parameter Regularization: A Constraint Approach

  • Riemannian Optimization for Euclidean Distance Geometry

  • Efficient Learning in Polyhedral Games via Best Response Oracles

  • Stochastic FISTA Step Search Algorithm for Convex Optimization

  • Almost multisecant BFGS quasi-Newton method

  • Statistical Inference of Adaptive Inexact Stochastic Newton Method

  • On the Interplay Between Stepsize Tuning and Progressive Sharpening

  • Decentralized Learning Dynamics in the Gossip Model

  • Pruning Neural Networks with Velocity-Constrained Optimization

  • Risk Bounds of Accelerated SGD for Overparameterized Linear Regression

  • DIRECT Optimisation with Bayesian Insights: Assessing Reliability Under Fixed Computational Budgets

  • Parameter-Agnostic Optimization under Relaxed Smoothness

  • Optimization dependent generalization bound for ReLU networks based on sensitivity in the tangent bundle

  • Understanding the Role of Optimization in Double Descent

  • Level Set Teleportation: the Good, the Bad, and the Ugly

  • Cup Curriculum: Curriculum Learning on Model Capacity

  • Variance Reduced Model Based Methods: New rates and adaptive step sizes

  • SGD batch saturation for training wide neural networks

  • Follow the flow: Proximal flow inspired multi-step methods

  • The Sharp Power Law of Local Search on Expanders
