Tutorials
Explain AI Models: Methods and Opportunities in Explainable AI, Data-Centric AI, and Mechanistic Interpretability
Understanding AI system behavior has become critical for safety, trust, and effective deployment across diverse applications. Three major research communities have emerged to address this challenge through interpretability methods: Explainable AI focuses on feature attribution to understand which input features drive model decisions; Data-Centric AI emphasizes data attribution to analyze how training examples shape model behavior; and Mechanistic Interpretability examines component attribution to understand how internal model components contribute to outputs. These three branches share the goal of better understanding AI systems from different angles and differ primarily in perspective rather than in technique. This tutorial begins with foundational concepts and historical context, providing essential background on why explainability matters and how the field has evolved since its early days. The first technical deep dive covers post hoc explanation methods, data-centric explanation techniques, and mechanistic interpretability approaches, and presents a unified framework demonstrating that these methods share fundamental techniques such as perturbations, gradients, and local linear approximations. The second technical deep dive explores inherently interpretable models, clarifying concepts such as reasoning (chain-of-thought) LLMs and self-explanatory LLMs in the context of explainability, and covering techniques for building inherently interpretable LLMs. We also showcase open-source tools that make these methods accessible to practitioners. Furthermore, we highlight promising future directions in interpretability research and the directions they open up for AI more broadly, with applications in model editing, steering, and regulation. Through comprehensive coverage of algorithms, real-world case studies, and practical guidance, attendees will gain both a deep technical understanding of state-of-the-art methods and practical skills to apply interpretability techniques effectively in AI applications.
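To make the shared machinery concrete, the following minimal sketch (an illustration built on an assumed toy logistic-regression model with placeholder weights, not material from the tutorial) contrasts a perturbation-based occlusion score with a gradient-times-input score, i.e., a local linear approximation, for the same input.

```python
import numpy as np

# Toy differentiable model: logistic regression f(x) = sigmoid(w . x + b).
# The weights and input below are illustrative placeholders.
rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.1

def f(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

x = rng.normal(size=5)  # input to be explained

# 1) Perturbation-based attribution (occlusion): score each feature by the
#    change in output when that feature is replaced by a baseline value (0).
occlusion = np.array([f(x) - f(np.where(np.arange(5) == i, 0.0, x))
                      for i in range(5)])

# 2) Gradient-based attribution (gradient x input): a local linear
#    approximation of f around x. For logistic regression the gradient is
#    analytic: df/dx = f(x) * (1 - f(x)) * w.
grad = f(x) * (1.0 - f(x)) * w
grad_times_input = grad * x

print("occlusion scores:       ", np.round(occlusion, 3))
print("gradient x input scores:", np.round(grad_times_input, 3))
```

Both scores rank features of the same prediction; the perturbation view asks "what changes if this feature is removed," while the gradient view asks "how sensitive is the output to this feature locally."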
Model Merging: Theory, Practice and Applications
Energy and Power as First-Class ML Design Metrics
Planning in the Era of Language Models
For over six decades, the field of automated planning has been at the heart of AI, empowering intelligent systems to reason, act, and achieve goals in complex, dynamic environments. From robotics and logistics to space exploration, planning research has fueled autonomous decision-making in real-world applications.
Today, as large language models redefine what’s possible in AI, the principles and methodologies of planning are more vital than ever. The planning community brings decades of experience in designing, benchmarking, and interpreting intelligent behavior, expertise that can accelerate the development of powerful, trustworthy, and general-purpose LLM-based agents.
Participants will gain a clear understanding of what planning truly entails, what has been learned (and sometimes forgotten) in the shift toward LLM-based approaches, and how foundational insights from the planning community can inform the creation of stronger, more reliable, and more scalable LLM-powered planners.
Human-AI Alignment: Foundations, Methods, Practice, and Challenges
Foundations of Tensor/Low-Rank Computations for AI
New Frontiers of Hyperparameter Optimization: Recent advances and open challenges in theory and practice
Machine learning algorithms operate on data, and for any task the most effective method depends on the data at hand. Hyperparameter optimization and algorithm selection are therefore crucial to ensure the best performance in terms of accuracy, efficiency, reliability, interpretability, etc. We first survey common techniques used in practice for hyperparameter optimization in machine learning, including Bayesian optimization and bandit-based approaches. We next discuss new approaches developed in the context of Large Language Models, including neural scaling laws and parameterization-aware methods. We will discuss the chief advantages and shortcomings of these approaches, in particular their limited theoretical guarantees. We will then discuss exciting new developments on hyperparameter tuning with strong theoretical guarantees. A growing line of work over the past decade from the learning theory community has successfully analyzed how algorithmic performance varies with hyperparameters for several fundamental algorithms in machine learning, including decision trees, linear regression, unsupervised and semi-supervised learning, and, very recently, even deep learning. This has allowed the development of techniques that take this structure into account, apply naturally to both hyperparameter tuning and algorithm selection, work well in dynamic or online learning environments, and are equipped with provable PAC (probably approximately correct) guarantees for the generalization error of the learned hyperparameter. Future research areas include integration of these structure-aware, principled approaches with currently used techniques, better optimization in high-dimensional and discrete spaces, and improved scalability in distributed settings.
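As an illustration of the bandit-based approaches surveyed above, the sketch below implements successive halving on a toy objective; the candidate sampler and the noisy `validation_score` function are assumptions standing in for a real training run.

```python
import numpy as np

rng = np.random.default_rng(0)

def validation_score(lr, budget):
    """Placeholder objective: a noisy score that improves with budget and
    peaks near lr = 0.1. A real run would train a model for `budget` steps."""
    return -(np.log10(lr) + 1.0) ** 2 + 0.1 * np.log(budget) + rng.normal(scale=0.05)

def successive_halving(n_configs=27, min_budget=1, eta=3):
    """Bandit-based hyperparameter search: evaluate many configurations on a
    small budget, keep the top 1/eta, and re-evaluate the survivors with eta
    times the budget, repeating until one configuration remains."""
    configs = 10 ** rng.uniform(-4, 0, size=n_configs)  # log-uniform learning rates
    budget = min_budget
    while len(configs) > 1:
        scores = np.array([validation_score(lr, budget) for lr in configs])
        keep = max(1, len(configs) // eta)
        configs = configs[np.argsort(scores)[-keep:]]  # keep the best `keep`
        budget *= eta
    return configs[0]

print("selected learning rate:", successive_halving())
```

The same resource-allocation idea underlies Hyperband-style methods, where several successive-halving brackets with different starting budgets are run in parallel.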
Theoretical Insights on Training Instability in Deep Learning
Advances in deep learning build on the dark arts of gradient-based optimization. In deep learning, the optimization process is oscillatory, spiky, and unstable. This makes little sense in classical optimization theory, which primarily operates in a well-behaved, stable regime. Yet the best-performing training configurations in practice consistently operate in an unstable regime. This tutorial introduces recent theoretical progress in understanding the benign nature of training instabilities, providing new insights from both optimization and statistical learning perspectives.
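A minimal illustration of the classical stability threshold at play (with assumed toy values, not an excerpt from the tutorial): gradient descent on a one-dimensional quadratic with curvature λ converges when the step size is below 2/λ and oscillates and diverges above it.

```python
import numpy as np

def gd_on_quadratic(step_size, curvature=10.0, x0=1.0, steps=20):
    """Run gradient descent on f(x) = (curvature / 2) * x^2.
    The classical stability threshold is step_size < 2 / curvature."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x - step_size * curvature * x  # gradient of f is curvature * x
        trajectory.append(x)
    return np.array(trajectory)

stable = gd_on_quadratic(step_size=0.05)    # 0.05 < 2/10: converges
unstable = gd_on_quadratic(step_size=0.21)  # 0.21 > 2/10: oscillates and diverges
print("stable   |x_t|:", np.round(np.abs(stable[:6]), 3))
print("unstable |x_t|:", np.round(np.abs(unstable[:6]), 3))
```

The puzzle the tutorial addresses is that deep networks are routinely trained with step sizes that violate this classical threshold with respect to the loss curvature, yet training still succeeds.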
Autoregressive Models Beyond Language
Autoregressive modeling is no longer confined to language. Recent work shows that the same next-element prediction principle can achieve state-of-the-art performance in generative modeling, representation learning, and multi-modal tasks across images, video, audio, robotics, and scientific data. Yet extending autoregressive methods to these modalities is far from straightforward. Many inductive biases used in autoregressive language models no longer hold for other data modalities, and many new techniques have therefore been proposed in recent years to adapt autoregressive models to data beyond language.
This tutorial will review the core theory of autoregressive models, present practical design choices for generative modeling, representation learning, and multi-modal learning, and spotlight open challenges in this area. We hope this tutorial provides attendees with a clear conceptual roadmap and hands-on resources to apply and extend autoregressive techniques across diverse data domains.
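As a minimal sketch of the shared next-element prediction principle (toy shapes, a random placeholder tokenizer, and a stub model are all assumptions here), the snippet below shows that once a non-language modality is serialized into discrete tokens, the training objective is the same autoregressive cross-entropy used for language.

```python
import numpy as np

rng = np.random.default_rng(0)

# Serialize a non-language modality into a token sequence: e.g. an 8x8 image
# split into sixteen 2x2 patches, each patch quantized to one of `vocab`
# codebook ids. The quantizer here is a placeholder (random ids); a real
# system would use a learned tokenizer such as a VQ codebook.
vocab, seq_len = 16, 16
tokens = rng.integers(0, vocab, size=seq_len)

def dummy_model_logits(prefix):
    """Placeholder for an autoregressive model mapping a prefix of tokens to
    next-token logits (the prefix is ignored in this stub); a real model
    would be a Transformer."""
    return rng.normal(size=vocab)

# Next-element prediction: the same objective as language modeling.
# Loss = mean over positions t of -log p(tokens[t] | tokens[:t]).
total = 0.0
for t in range(seq_len):
    logits = dummy_model_logits(tokens[:t])
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    total += -log_probs[tokens[t]]
print("autoregressive NLL per token:", total / seq_len)
```

What differs across modalities is everything around this loop: how the data are tokenized, in what order elements are predicted, and which inductive biases the model architecture encodes.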
Data Privacy, Memorization, & Legal Implications in Generative AI: A Practical Guide
Scale Test-Time Compute on Modern Hardware
Large language models have achieved significant breakthroughs in reasoning tasks, relying on the effective use of test-time compute. Techniques such as chain-of-thought and sampling-based strategies have shown that increasing test-time computation can dramatically enhance model performance. Our recent scaling law analyses highlight the critical role of test-time compute in enabling advanced reasoning, beyond what pretraining can offer. We also provide a practical analysis of hardware efficiency, revealing where bottlenecks arise and how they differ fundamentally from those in pretraining. Scaling test-time compute on modern hardware presents unique challenges. Compared to training workloads, test-time compute often exhibits low parallelism, irregular workloads, frequent memory I/O, and dynamic execution paths, all of which make efficient deployment difficult. Therefore, practical scalability is often bottlenecked by system constraints, such as attention-related memory overheads and limited compute utilization. To address these challenges, the community has explored solutions across both systems and algorithms. On the system side, advancements include memory-efficient key-value cache management, optimized attention kernels, and scheduling mechanisms for adaptive resource allocation. On the algorithm side, emerging work has proposed model architectures and parallel generation paradigms that better align with hardware. This tutorial aims to provide a comprehensive overview of the landscape of scalable test-time compute. We will cover foundational challenges, review recent progress from both system and algorithm perspectives, and discuss principles for building solutions that are truly compatible with modern hardware. By bridging theory with deployment realities, we hope this tutorial will inspire and accelerate the development of practical, scalable LLM agent systems.
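To ground the sampling-based strategies mentioned above, here is a minimal sketch of self-consistency style test-time scaling; the `generate_answer` stub is an assumption standing in for a temperature-sampled LLM call with chain-of-thought prompting.

```python
import random
from collections import Counter

def generate_answer(question, temperature=0.8):
    """Placeholder for one sampled LLM generation (the question and
    temperature are ignored in this stub). A real implementation would call
    a model with chain-of-thought prompting and nonzero temperature."""
    return random.choices(["42", "41", "43"], weights=[0.5, 0.3, 0.2])[0]

def self_consistency(question, n_samples=16):
    """Scale test-time compute by sampling several reasoning paths and
    returning the most frequent final answer (majority vote)."""
    answers = [generate_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))
```

Each additional sample adds a full decoding pass with its own key-value cache, which is exactly why the memory- and scheduling-oriented system techniques discussed in the tutorial matter for making such strategies affordable.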