

Microsoft Corporation

Expo Talk Panel

Distributed Orthonormal Updates for Large-Scale Training

Upper Level Ballroom 6CDEF
Tue 2 Dec 4 p.m. PST — 5 p.m. PST

Abstract:

We propose a 50-minute technical talk on recent advances in orthonormal update methods for large-scale AI model training. This topic is rapidly gaining attention in the community, emerging as a strong successor to AdamW following the success of orthonormal optimizers in training production-scale models such as Kimi-K2 and GLM-4.5.
The talk will center on the design and practice of orthonormal updates, focusing on optimizers such as Muon and Dion. While we will briefly discuss their theoretical foundations, the emphasis will be on practical usage: how to integrate these optimizers into modern training pipelines, interpret their algorithmic components, and leverage the implementation guidelines provided in our open-source codebase at github.com/microsoft/dion.
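As a taste of the practical material, the sketch below illustrates the orthogonalization primitive at the heart of Muon-style updates: the raw momentum matrix is replaced by an approximately semi-orthogonal matrix before the weight update is applied. This is a minimal illustration, not the API of the microsoft/dion codebase; the function name, hyperparameters, and usage loop are assumptions made for exposition, while the quintic Newton-Schulz coefficients follow the publicly available Muon reference implementation.

```python
import torch

@torch.no_grad()
def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately map a 2-D momentum matrix to the nearest semi-orthogonal
    matrix via a quintic Newton-Schulz iteration (the core primitive of
    Muon-style orthonormal updates). Illustrative sketch, not the dion API."""
    a, b, c = 3.4445, -4.7750, 2.0315   # quintic coefficients from the public Muon reference code
    X = G / (G.norm() + eps)            # normalize so the iteration converges
    transposed = X.size(0) > X.size(1)
    if transposed:                      # iterate on the wide orientation for efficiency
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

# Hypothetical usage inside a training step: orthonormalize the momentum
# before applying it as the weight update. All values are stand-ins.
W = torch.randn(1024, 4096)             # a weight matrix
momentum = torch.zeros_like(W)
grad = torch.randn_like(W)              # stand-in for a real gradient
lr, beta = 0.02, 0.95                   # illustrative hyperparameters
momentum.mul_(beta).add_(grad)
W.add_(newton_schulz_orthogonalize(momentum), alpha=-lr)
```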
The talk is designed to engage both researchers and practitioners in the NeurIPS community:
Academic perspective: presents a new class of theoretically grounded optimizers and explains how they interact with distributed training.
Industrial perspective: highlights how orthonormal updates are implemented in practice and the best practices that have emerged.
This topic lies at the intersection of optimization theory, scalable systems, and large-model training—an area of growing importance for both the research and applied machine learning communities.
