Workshops
Structured Probabilistic Inference and Generative Modeling
Reliable ML from Unreliable Data
Distributions shift, chatbots get jail‑broken, users game algorithms — how do we build reliable machine learning when data are missing, corrupted, or strategically manipulated?
This workshop bridges theory and practice to tackle these challenges, bringing together researchers working on distribution shift, adversarial robustness, and strategic behaviour to chart principled yet deployable solutions for Reliable ML from Unreliable Data.
GenProCC: 1st Workshop on Generative and Protective AI for Content Creation
Recent advancements in generative AI (GenAI) have empowered machines to create high-quality content across diverse modalities (text, image, audio, and video) with impressive fluency and creativity. From GPT-4o and Stable Diffusion to Sora and MMAudio, the explosion of X-to-X generation (e.g., text-to-image, video-to-audio) is unlocking new frontiers in science, education, entertainment, and art.
While GenAI has shown significant potential in creative applications (e.g., music, films, arts), these breakthroughs also raise pressing concerns related to safety, copyright, and ethical use. Generative models can be exploited to spread misinformation, violate intellectual property rights, or diminish human agency in creative processes. As such, there is an increasing need to balance innovation with protection, ensuring that AI-powered creative tools are used responsibly and ethically.
This workshop, GenProCC: Generative and Protective AI for Content Creation, brings together researchers, creators, and practitioners at the intersection of content generation and IP protection. By uniting the generative AI and creator communities, the GenProCC workshop aims to explore the latest advances, challenges, and opportunities in this rapidly evolving field.
What Makes a Good Video: Next Practices in Video Generation and Evaluation
This workshop aims to explore how real-world advances in video generation increasingly rely on forward-looking evaluation frameworks, and to collaboratively shape the next generation of high-quality video synthesis. Through a combination of invited talks, academic presentations, and expert discussions featuring leading voices from both academia and industry, the workshop bridges academic foundations and industrial insights across the modeling, evaluation, and deployment of video generation. We welcome contributions from computer vision, generative modeling, video-language learning, evaluation methodology, and human-centered AI.
ML x OR: Mathematical Foundations and Operational Integration of Machine Learning for Uncertainty-Aware Decision-Making
Much of traditional decision-making science is grounded in the mathematical formulation and analysis of structured systems to recommend decisions that are optimized, robust, and uncertainty-aware. This scientific approach, rooted in the field of Operations Research (OR), has evolved through decades of advances in stochastic modeling, computational simulation, and optimization, and exhibits key strengths in methodological rigor and uncertainty encoding. Recent advances in the AI/ML space, on the other hand, have eschewed this model-based paradigm and increasingly embraced, to great success, model-free algorithmic design frameworks. This workshop, the first NeurIPS workshop explicitly themed and structured around ML-OR synergy, aspires to present recent developments, challenges, and emerging research to accelerate ML-OR synthesis. By integrating ML into established OR methodologies, we have the opportunity to produce more data-centric and adaptive solutions for complex decision-making tasks that could propel the frontier of "optimality" across many relevant applications at a much faster pace. Concomitantly, the goal is also to explore how principled, model-based OR approaches can help alleviate issues surrounding "black box" systems and provide paths to enhance interpretability, trust, and performance analysis.
MATH-AI: The 5th Workshop on Mathematical Reasoning and AI
Deep Learning for Code in the Agentic Era
Deep learning for code has progressed from focused tasks, such as completion, repair, synthesis, and explanation, to tackling complex, end-to-end software engineering problems. A key recent breakthrough is the rise of coding agents. Unlike single-shot models, these systems plan, reason, explore, and invoke external tools to assist throughout the software-development lifecycle: adding features, refactoring, debugging, finding vulnerabilities, optimizing performance, summarizing code, and answering repository-level questions. Their growing versatility demands rigorous evaluation and a deeper understanding of their capabilities, limits, risks, and broader social impact.
Building on momentum from both academia and industry (e.g. Google, OpenAI, Anthropic, SWE-Agent, OpenHands), we propose the 4th Deep Learning for Code (DL4C) workshop with a dedicated focus on coding agents. This workshop will provide a timely forum where researchers and practitioners can design and stress-test robust coding agents, discover novel applications and emergent behaviors, establish principled benchmarks and evaluation methods, study human–agent collaboration at scale, and advance the responsible, safe deployment of autonomous coding tools.
Foundation Models for the Brain and Body Workshop
Differentiable Learning of Combinatorial Algorithms: From Theory To Practice
AI Virtual Cells and Instruments: A New Era in Drug Discovery and Development
As the US FDA phases out animal testing requirements for drug discovery and development, AI tools will become widely adopted to simulate the effects of candidate drugs. We posit that building virtual cells and instruments with AI is poised to transform drug discovery and development by enabling large-scale simulation and interrogation of molecules, cells, and tissues. In our workshop, we will collaboratively define and promote this emerging scientific paradigm of AI to accelerate the drug discovery and development process in this new era.
AI and ML for Next-Generation Wireless Communications and Networking (AI4NextG @ NeurIPS’25)
The field of wireless communications and networking is undergoing a paradigm shift, driven by the growing potential of Artificial Intelligence (AI) and Machine Learning (ML) to redefine traditional system design principles. This workshop aims to catalyze interest and foster collaboration between the AI/ML and wireless communications communities. The timing of this workshop is especially significant, as the next-generation (NextG) wireless standardization efforts (such as 6G and WiFi 9) are just getting started, with AI-native technologies expected to play a central role across all aspects of the wireless ecosystem – from radio access to network management and edge intelligence. NextG represents a foundational shift in global infrastructure, enabling ultra-fast, low-latency, and intelligent connectivity that will power future applications in AI, robotics, immersive environments, and autonomous systems. These technologies offer unprecedented opportunities to both drive and benefit many applications, from healthcare and transportation to industrial automation and environmental monitoring. The economic and societal implications are vast: NextG networks will underlie trillions in global GDP impact, bridge digital divides, and shape how billions of people interact with technology and each other in the decades to come.
Despite the clear promise, a significant disconnect exists between the AI/ML and wireless research communities. AI/ML experts often lack an understanding of the unique physical, algorithmic, and architectural constraints inherent in wireless systems, while wireless researchers tend to adopt generic, off-the-shelf AI/ML models that are not optimized for the intricacies of wireless environments. Wireless environments are inherently dynamic, high-dimensional, and partially observable. These unique characteristics make them ideal testbeds for developing robust learning algorithms, particularly in areas like online learning, reinforcement learning, and in-context learning. At the same time, AI/ML techniques are becoming essential for managing the growing complexity of modern wireless networks, including resource allocation, interference mitigation, and cross-layer optimization. Bridging the gap between the two communities is not only necessary for meaningful technological advances but also critical for realizing the full societal impact of intelligent wireless systems.
This workshop aims to bring together researchers and practitioners at the intersection of artificial intelligence (AI), machine learning (ML), and wireless to address the unique challenges and opportunities posed by Next-Generation (NextG) wireless systems. As the 6G era begins to take shape, AI-native designs have emerged as a cornerstone of wireless innovation, with the potential to transform the performance, efficiency, and adaptability of communication systems. The integration of AI/ML is poised to influence every layer of the network stack, from physical-layer signal processing to network control and resource management.
The Second Workshop on GenAI for Health: Potential, Trust, and Policy Compliance
Generative AI (GenAI) has emerged as a transformative force in healthcare, yet public trust remains limited due to safety concerns and policy misalignment. Building on last year’s successful GenAI4Health workshop, the field has rapidly evolved from exploratory research to real-world clinical deployments, accompanied by FDA regulatory involvement and expanding global governance frameworks. This second workshop convenes machine learning researchers, healthcare professionals, and policy experts to address the critical intersection of GenAI innovation and regulatory compliance in health applications. We will examine trustworthiness challenges in large language models and multimodal foundation models, explore mitigation strategies, and foster dialogue between technical and policy communities. Our goal is to advance safe, effective, and ethically compliant GenAI integration in healthcare systems, improving patient outcomes and clinical research capabilities.
CauScien: Uncovering Causality in Science
Lock-LLM Workshop: Prevent Unauthorized Knowledge Use from Large Language Models - Deep Dive into Un-Distillable, Un-Finetunable, Un-Compressible, Un-Editable, and Un-Usable
Large Language Models (LLMs) have emerged as transformative tools across research and industry, revolutionizing how we interact with information. However, their immense capabilities bring critical security challenges—the same features that drive innovation can be exploited for malicious purposes through unauthorized distillation, fine-tuning, compression, or editing. These vulnerabilities pose severe threats, including intellectual property theft, the generation of sophisticated disinformation, the bypass of safety alignments, and the erosion of user trust in AI systems.
This workshop aims to bring together researchers and practitioners from academia and industry who are advancing the frontiers of LLM security and protection. We seek to confront the unauthorized use of LLMs head-on by exploring novel and robust mechanisms designed to make these models inherently resistant to exploitation while maintaining their beneficial capabilities. The workshop also hosts the 2025 TrustAI Rising Star Award.
Topics of interest include, but are not limited to:
1. Un-Distillable LLMs: Preventing unauthorized model replication and intellectual property theft
2. Un-Finetunable LLMs: Resisting malicious parameter updates and behavior alterations
3. Un-Compressible LLMs: Maintaining model integrity against unauthorized compression
4. Un-Editable LLMs: Safeguarding against knowledge tampering and misinformation injection
5. Un-Usable LLMs: Ensuring traceability and preventing misuse through watermarking and verification (see the sketch below)
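As a concrete illustration of the last topic, the following toy sketch implements a "green-list" statistical watermark in the spirit of Kirchenbauer et al. (2023). The vocabulary size, green fraction, and hash-based seeding are illustrative assumptions for this sketch, not a method endorsed by the workshop.

```python
import hashlib
import numpy as np

VOCAB_SIZE, GREEN_FRAC = 50_000, 0.5

def green_mask(prev_token: int) -> np.ndarray:
    """Pseudo-randomly mark half the vocabulary 'green', seeded by the previous token."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.random(VOCAB_SIZE) < GREEN_FRAC

def watermark_logits(logits: np.ndarray, prev_token: int, delta: float = 2.0) -> np.ndarray:
    """At generation time, bias sampling toward green tokens."""
    return logits + delta * green_mask(prev_token)

def detection_z_score(tokens: list[int]) -> float:
    """A large z-score on the green-token count suggests watermarked text."""
    hits = sum(green_mask(prev)[tok] for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - GREEN_FRAC * n) / np.sqrt(GREEN_FRAC * (1 - GREEN_FRAC) * n)

# Unwatermarked (random) token sequences score near 0, so they are not flagged.
rng = np.random.default_rng(0)
fake_text = [int(t) for t in rng.integers(0, VOCAB_SIZE, size=200)]
print(detection_z_score(fake_text))
```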
AI4Mat-NeurIPS-2025: NeurIPS 2025 Workshop on AI for Accelerated Materials Design
AI4Mat-NeurIPS-2025 explores applications of artificial intelligence (AI) to materials via: 1. AI-Guided Materials Design; 2. Automated Chemical Synthesis; 3. Automated Material Characterization. AI4Mat-NeurIPS-2025 emphasizes structured, expert-driven dialogue on making advanced machine learning more impactful for real-world materials discovery. To that end, AI4Mat-RLSF (Research Learning from Speaker Feedback) creates a new structured discussion format where spotlight presenters receive curated, in-depth feedback from invited discussants. Further, the AI4Mat Frontiers & Benchmarking session brings together a diverse and distinguished set of speakers to critically examine current benchmarks, present state-of-the-art methods, and identify emerging opportunities and current limitations in AI-driven materials design.
The First Workshop on Efficient Reasoning
Recent progress in large reasoning models (LRMs), like OpenAI o1 and DeepSeek R1, has been pivotal for tackling complex applications, from mathematical and code reasoning to advanced symbolic and agentic planning. Their success often relies on test-time scaling, which involves increasing the generation length or depth. However, these approaches incur significant efficiency bottlenecks during training and inference. To overcome these limitations, further advancements are needed in data, algorithms, and systems applicable across various domains, as exemplified by work such as s1, Z1, and verl. The proposed workshop will bring together researchers and practitioners to rethink efficient reasoning under tight compute, memory, latency, throughput, and cost budgets, with the goal of translating theoretical breakthroughs into practical, deployable solutions.
UniReps: Unifying Representations in Neural Models
When, how and why do different neural models learn the same representations?
New findings in neuroscience and artificial intelligence reveal a shared pattern: whether in biological brains or artificial models, different learning systems tend to create similar representations when subject to similar stimuli.
The emergence of these similar representations is igniting a growing interest in the fields of neuroscience and artificial intelligence, with both fields offering promising directions for their theoretical understanding. These include analyzing the learning dynamics in neuroscience and studying the problem of identifiability in the functional and parameter space in artificial intelligence.
While the theoretical aspects already demand investigation, the practical applications are equally compelling: aligning representations allows for model merging, stitching, and reuse, while also playing a crucial role in multi-modal scenarios. Furthermore, studying the features that are universally highlighted by different learning processes brings us closer to pinpointing the invariances that naturally emerge from learning models, possibly suggesting ways to enforce them.
The objective of the workshop is to discuss theoretical findings, empirical evidence and practical applications of this phenomenon, benefiting from the cross-pollination of different fields (ML, Neuroscience, Cognitive Science) to foster the exchange of ideas and encourage collaborations.
In conclusion, our primary focus is to delve into the underlying reasons, mechanisms, and extent of similarity in internal representations across distinct neural models, with the ultimate goal of unifying them into a single cohesive whole.
AI for non-human animal communication
The past few years have seen an unprecedented surge in both the availability of bioacoustic data and the sophistication of AI/machine learning models. This convergence presents a unique window of opportunity to revolutionize our understanding of animal communication and biodiversity. However, achieving this requires a conscious effort to integrate the disciplines of AI/Machine Learning and Ethology.
This workshop will explore the intersection of artificial intelligence (AI) and bioacoustics, aiming to address challenges in processing complex bioacoustic data and interpreting animal signals in order to advance our understanding of non-human animal communication. Join us for a poster session, keynote talks and a panel discussion as we explore key opportunities to use AI to decipher animal languages and thus deepen our understanding of the natural world.
Second Workshop on Aligning Reinforcement Learning Experimentalists and Theorists (ARLET)
Recent progress in reinforcement learning (RL) has powered breakthroughs in various real-world problems, gathering considerable attention and investment. However, it has also exposed a significant gap between theoretical and experimental developments.
RL theory has grown significantly in the past two decades. Research has characterized the inherent difficulty of various settings and has designed a wide variety of algorithms to reach optimal performances. Furthermore, a huge leap has been made in understanding how to handle large state spaces using function approximation techniques, identifying key structural properties that enable efficient learning.
Despite theoretical guarantees, applying RL algorithms to complex problems faces challenges. Theoretical algorithms often focus on simplified settings, making them hard to apply to real-world complexities. Furthermore, optimizing for worst-case scenarios, which include unlikely situations, can lead to algorithms that perform poorly on practical tasks. Yet, while specialized algorithms offer empirical success, they might not translate to other problems due to their specific design, and the reliance on heuristics and engineering fixes further widens the gap between theory and practice.
A prominent area that has seen a surge of interest in RL is generative language modeling. Pre-training these models can be viewed as a form of imitation learning, while post-training typically implements RL algorithms for purposes like instruction tuning with RL from human feedback or enhancing reasoning capabilities. While these successes make the practical utility of RL undeniable, the RL community finds itself at a crossroads. The algorithms employed are frequently variants of classical methods, and exploring beyond these presents a key challenge. Conversely, the success of these models prompts new questions for RL theory, suggesting that frameworks leveraging pre-trained models might offer a more effective paradigm than learning from scratch under traditional assumptions.
Following the success of the ICML 2024 edition, the Second Workshop on Aligning Reinforcement Learning Experimentalists and Theorists (ARLET) aims to bridge this discrepancy and promote collaboration. By bringing together experts from both sides, we want to facilitate meaningful discussions and draw a path for future RL research. Motivated by the take-home messages from the previous edition, we seek to encourage: (i) theorists to ask experimentalists for concrete problems to solve, and (ii) experimentalists to seek theoretical guidance on how to approach these problems.
ML for Systems
The 9th Machine Learning for Systems (ML for Systems) workshop brings together researchers and practitioners applying machine learning to core computer systems challenges. This year, we focus on three themes: (1) using LLMs and agentic workflows for systems tasks such as program synthesis and adaptive optimization; (2) applying ML to manage the complexity of large-scale training and serving of multimodal and reasoning models; and (3) leveraging ML for sustainable computing, including energy-, power-, and carbon-aware optimization. The workshop will feature invited talks, contributed presentations, and discussions aimed at advancing the frontier of ML for Systems research.
Dynamics at the Frontiers of Optimization, Sampling, and Games
Algorithmic Collective Action
The study of collective action has a long history in economics and sociology as a way for groups of people to impact markets and the political arena. Algorithmic collective action focuses on the study of such coordinated efforts in algorithmically-mediated sociotechnical systems. How can participants of AI systems coordinate towards socially beneficial outcomes? We offer a platform to discuss new ideas and help define the foundational research directions for this emerging topic through interdisciplinary discussions between core ML researchers, scholars from the social sciences, community stakeholders and advocates.
Biosecurity Safeguards for Generative AI
Machine Learning and the Physical Sciences
The Machine Learning and the Physical Sciences (ML4PS) workshop at NeurIPS is a unique gathering space for the growing community spearheading cross-cutting research topics at the intersection of machine learning (ML) and the physical sciences (PS). This includes the applications of ML to problems in the physical sciences (ML for PS) as well as developments in ML motivated by physical insights (PS for ML). The physical sciences are defined inclusively, including but not limited to physics, astronomy, cosmology, chemistry, biophysics, materials science, and Earth science. Join us to discuss the latest research at the convergence of these fields!
OPT 2025: Optimization for Machine Learning
Optimization lies at the heart of many machine learning algorithms and enjoys great interest in our community. Indeed, this intimate relation of optimization with ML is the key motivation for the OPT series of workshops. We aim to foster discussion, discovery, and dissemination of state-of-the-art research in optimization relevant to ML.
The focus of OPT 2025 is on "Statistics Meets Optimization". Since its inception, stochastic optimization has been grounded in statistical principles. Today, many of the most pressing challenges in machine learning—such as generalization bounds, the training dynamics of overparameterized models, and the development of generative models—are directly inspired by statistical thinking. At the same time, the scale and complexity of modern datasets, along with the increasingly rich model classes used to represent them, pose new questions about how optimization algorithms interact with these structures—both computationally and statistically. For example, what role do data symmetries play in shaping optimization trajectories? How do statistical properties of the data affect the adaptivity and efficiency of learning algorithms? And how can optimization approaches be designed to scale with data while still preserving desirable statistical behavior? OPT 2025 will explore these questions with the goal of building bridges between the statistics and optimization communities, and highlighting their shared impact on the theory and practice of machine learning.
We are looking forward to seeing you all at OPT 2025, which will take place at the San Diego Convention Center!
Imageomics: Discovering Biological Knowledge from Images Using AI
Imageomics is an emerging interdisciplinary field at the crossroads of machine learning (ML), computer vision (CV), and biological sciences. It leverages visual data—from microscopic images of single-cell species to videos of megafauna—to extract and analyze biological information, specifically traits. By grounding ML models in existing scientific knowledge, Imageomics aims to make traits computable from images, facilitating insights into the evolution and function of living organisms. Imageomics poses research problems that resonate with the broad machine-learning community: multimodal representation learning, object detection and tracking, few-shot learning, imbalanced-class learning, video understanding, 3D modeling, hierarchical learning, etc. When people leverage ML tools to solve biological questions, the foundational bridges between ML and biological sciences also provide opportunities to address key challenges in ML, creating a virtuous cycle between the two fields.
Embodied World Models for Decision Making
World models infer and predict real-world dynamics by modeling the external environment, and have become a cornerstone of embodied artificial intelligence. They have powered recent progress in decision-making and planning for interacting agents. This workshop aims to bring together researchers working at the intersection of generative modeling, reinforcement learning, computer vision, and robotics to explore the next generation of embodied world models: models that enable agents to understand, predict, and interact with the world through learned models. By emphasizing embodiment and decision-making, this workshop seeks to advance world models beyond passive sequence prediction, toward active, goal-directed interaction with both physical and simulated worlds.
Workshop on Multi-Turn Interactions in Large Language Models
The field of AI is entering a new era of interaction, profoundly shaped by the capabilities of Large Language Models (LLMs). While multi-turn interaction has been a long-standing pursuit in AI—from dialogue systems to multi-agent coordination—the advent of LLMs has radically transformed this landscape. These models now engage in complex, long-horizon interactions, process diverse data, and make crucial decisions in dynamic, human-centric scenarios.
This leap forward, however, brings forth critical new research questions and challenges that demand immediate attention:
- Multi-Turn RL for Agentic Tasks: learning from complex, interactive environments such as GUI agents and tool-use scenarios, given the challenge of sparse rewards.
- Maintaining Alignment: understanding human values over extended, multi-turn interactions and preventing the "loss of alignment" seen in current models.
- Human-AI Interaction Over Time: ensuring models adapt to user goals without compromising safety or fairness.
- Long-Horizon Evaluation: assessing LLMs' long-term capabilities, consistency, and strategic abilities in complex, multi-turn tasks.
The Workshop on Multi-Turn Interactions in LLMs is designed to be the central forum for addressing these pivotal questions. We invite researchers to contribute to defining the next generation of interactive AI, tackling these core challenges, and charting the course for future advancements in AI reasoning and planning. This workshop will concentrate on key areas where the extended use of LLMs presents both new challenges and opportunities, serving as a platform to discuss and refine methods for future improvements and evaluation for practical LLM use cases.
Generative AI in Finance
This workshop aims to foster cross-disciplinary collaboration at the intersection of generative AI and finance, a high-stakes domain where the integration of domain expertise is essential to the safe and effective deployment of machine learning technologies. Recent advances in generative models—ranging from large language models to diffusion and score-based generative architectures—have opened new frontiers for applications in finance, such as financial modeling, stress testing, scenario generation, automated financial services, and decision-making under uncertainty.
The workshop will highlight theoretical advances, practical implementations, new opportunities, and open challenges that arise when adapting generative AI to financial systems under unique constraints, such as data sparsity, regulatory requirements, and highly non-stationary and adversarial environments. By bringing together the computer science community, financial researchers, industry practitioners, and regulators, we aim to catalyze interdisciplinary dialogue and accelerate the responsible development of generative AI tailored to the needs of finance and risk management.
New Perspectives in Graph Machine Learning
GPU-Accelerated and Scalable Optimization (ScaleOpt)
Recent advances in GPU-based large-scale optimization have been remarkable. Recognizing the revolution in optimizing neural network weights via large-scale GPU-accelerated algorithms, the optimization community has been developing general-purpose GPU-accelerated optimizers for various families of classic optimization problems, including linear programming, general conic optimization, combinatorial optimization, and more specific problem families such as flow optimization and optimal transport. Beyond deploying GPUs directly on classical problems, current frontier AI tools, including large language models (LLMs), are being used to solve optimization problems. Various works have used neural networks to solve mixed-integer programs, linear or quadratic programs, general combinatorial optimization problems, and more specific optimization problems such as LASSO and robust PCA. In this workshop, we aim to provide a platform for interested researchers to engage with each other on recent breakthroughs and current bottlenecks in designing large-scale GPU-based optimizers and in synergizing AI systems with optimization solving.
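To make the connection concrete, here is a minimal sketch, under illustrative assumptions, of the kind of first-order method these GPU solvers build on: a primal-dual hybrid gradient (PDHG) iteration for linear programming, the template behind solvers such as PDLP. Production solvers add restarts, preconditioning, and adaptive step sizes that this sketch omits.

```python
import numpy as np

def pdhg_lp(c, A, b, steps=5000):
    """Sketch of PDHG for  min c@x  s.t.  A@x = b, x >= 0. Every step is a
    matrix-vector product or an elementwise projection, which is exactly the
    kind of workload that maps well onto GPUs."""
    m, n = A.shape
    tau = sigma = 0.9 / np.linalg.norm(A, 2)   # step sizes with tau*sigma*||A||^2 < 1
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(steps):
        x_new = np.maximum(x - tau * (c + A.T @ y), 0.0)  # primal descent + projection
        y = y + sigma * (A @ (2 * x_new - x) - b)         # dual ascent at extrapolated point
        x = x_new
    return x

# Tiny example: min x0 + 2*x1  s.t.  x0 + x1 = 1, x >= 0  (optimum [1, 0]).
print(pdhg_lp(np.array([1.0, 2.0]), np.array([[1.0, 1.0]]), np.array([1.0])))
```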
UrbanAI: Harnessing Artificial Intelligence for Smart Cities
Recent Advances in Time Series Foundation Models: Have We Reached the ‘BERT Moment’?
Foundation models (FMs) have achieved great success in NLP and vision, inspiring over 20 new time series FMs (TSFMs) in the past year. Despite promising results, studies show that carefully designed lightweight supervised baselines often match TSFM performance. Unlike NLP’s “BERT Moment,” TSFMs still require full fine-tuning to be competitive in real-world scenarios. Additionally, some tabular FMs rival TSFMs without being time series-specific. Recent benchmarks also provide mixed evidence: GIFT-Eval favors TSFMs, OpenTS shows statistical models outperforming deep learning on univariate data, and FoundTS finds supervised baselines on par with TSFMs. This workshop aims to bring together researchers to examine the gap between TSFM potential and real-world utility, and to identify benchmarks and applications where TSFMs can truly excel.
The key topics of this workshop include, but are not limited to:
- Benchmarking Foundation Models in Time Series
- Scaling Laws and Efficiency in Time Series Models
- Evaluating Transferability and Adaptability of Foundation Models
- Leveraging Foundation Models of Other Modalities for Time Series
- Unsupervised Performance Estimation of TSFMs
- Industrial Benchmarking of Time Series Foundation Models
More details are provided in our Call for Papers.
Constrained Optimization for Machine Learning
As AI systems are increasingly deployed in safety-critical domains—including credit scoring, medical diagnosis, and autonomous systems—there is a growing demand to ensure their fairness, safety, robustness, and interpretability, alongside stronger calls for regulation. Constrained optimization offers an accountable framework for enforcing these requirements by embedding them directly into the training process, steering models to satisfy explicit constraints. This framework facilitates compliance with regulatory, industry, or ethical standards, which can be easily verified by checking constraint satisfaction.
This workshop explores constrained optimization as a principled method for enforcing desirable properties in machine learning models. It brings together experts in optimization, machine learning, and trustworthy AI to address the algorithmic and practical challenges of scaling constrained methods to modern deep learning settings, which are often large-scale, non-convex, and stochastic.
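As a concrete, deliberately simplified illustration of embedding a requirement into training, the sketch below solves a toy constrained problem with a Lagrangian: gradient descent on the parameters alternates with dual ascent on a multiplier, so the "price" of violating the constraint rises until it is satisfied. The data, the L1-norm constraint, and the step sizes are illustrative assumptions, not a specific method from the workshop.

```python
import torch

# Toy constrained training: fit a linear model while enforcing ||w||_1 <= eps,
# where the norm constraint stands in for a fairness, safety, or robustness requirement.
torch.manual_seed(0)
X = torch.randn(256, 8)
y = X @ torch.ones(8) + 0.1 * torch.randn(256)   # synthetic regression data
w = torch.zeros(8, requires_grad=True)
lam = torch.tensor(0.0)                          # Lagrange multiplier, kept >= 0
opt = torch.optim.SGD([w], lr=0.05)
eps = 1.0

for _ in range(2000):
    loss = ((X @ w - y) ** 2).mean()             # task objective
    violation = w.abs().sum() - eps              # constraint g(w) <= 0
    (loss + lam * violation).backward()          # primal step on the Lagrangian
    opt.step()
    opt.zero_grad()
    with torch.no_grad():                        # dual ascent: raise the price of violation
        lam = (lam + 0.01 * violation).clamp(min=0.0)

print(f"||w||_1 = {w.abs().sum().item():.3f} (constraint: <= {eps}), lambda = {lam.item():.3f}")
```

A convenient property of this formulation is that compliance is auditable after training: one simply checks whether the constraint is satisfied, as the paragraph above notes.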
Learning from Time-Series for Health
Time-series data underpin modern healthcare, spanning electronic health records, physiological waveforms, wearables, and population trends, yet their unique characteristics—including uncertain ground truth, quasi-periodic physiological motifs, and non-semantic timepoints—demand specialized machine learning approaches. While recent advances in foundation models, multimodal learning, and generative methods show promise, significant challenges remain in causality, interpretability, and deployment. This workshop unites researchers across health time-series domains (from wearables to clinical systems) to address shared challenges through: (1) cross-domain discussion, (2) diverse industry/academic perspectives (featuring Google, Oura, Apple and 5 institutions), and (3) community engagement via posters, talks, and panels. By fostering cross-domain collaboration on physiological-aware methods, we aim to bridge the gap between cutting-edge ML and real-world healthcare impact.
Foundations of Reasoning in Language Models
Our workshop’s goal is to advance foundational understanding, principled innovations, and rigorous scientific evaluations for reasoning in language models. These advancements are built upon theoretical analyses and controlled empirical studies that illuminate how reasoning emerges, where it fails, and how it can be systematically improved.
We want to foster dialogue between communities with complementary strengths (those building theoretical models of reasoning phenomena, those designing experiments that reveal its emergence or failure in practice, and those proposing algorithmic developments that advance reasoning) around three primary questions:
1. How are language models able to solve complex tasks, and what do they still struggle with?
2. What fundamental challenges stand in the way of advancing reasoning capabilities?
3. What algorithmic innovations can overcome these obstacles?
Workshop on Scaling Environments for Agents
The development of intelligent agents – particularly those powered by large language models (LLMs) – has emphasized the critical role of environments in shaping agent behavior and capabilities, especially for achieving end-to-end autonomy. Environments are not merely testing grounds; they are dynamic, interactive contexts that serve as the essential "data" for agents to learn adaptive behavior, complex reasoning, and long-term decision-making skills. Just as scaling the model size, dataset size, and training computation has led to emergent capabilities in LLMs, scaling the structure, fidelity, and diversity of environments is one of the crucial dimensions in advancing agent intelligence. Moreover, recent advances in end-to-end reinforcement learning (RL), particularly when paired with LLM-based agents, have made it increasingly viable to train agents through sustained interaction. These agents can now acquire skills, strategies, and planning abilities through environmental feedback, rather than relying solely on imitation learning or static prompt engineering. As we move toward more autonomous, general-purpose agents, the need for scalable, richly interactive, and diverse environments has become both urgent and foundational.
What Can('t) Transformers Do?
With most advances in large foundation models (LFMs) being empirical, our theoretical understanding of what transformers can compute, express, and learn still lags behind. This workshop will convene theorists and empiricists to chart a rigorous agenda for the next generation of LFMs, asking “What can and can’t transformers do?” We welcome both formal analyses and empirically grounded studies that shed light on theoretical questions, aiming to close the gap between proofs and practice while fostering new, interdisciplinary collaborations.
Regulatable ML: Towards Bridging the Gaps between Machine Learning Research and Regulations
Learning to Sense (L2S)
The workshop explores the joint optimization of sensors and machine learning models, pushing beyond traditional paradigms of data acquisition and processing. We aim to rethink the foundations of how machines sense the world by replacing hand-crafted image signal processors (ISPs), leveraging learnable sensor layouts, and adopting task-driven sensing strategies.
We welcome original contributions and position papers on the following topics (non-exhaustive):
Sensor optimization for, e.g., computer vision (bit-depth, pixel layouts, color filter design)
RAW-to-task or RAW-to-label approaches for visual tasks
Co-design of neural networks and sensor hardware
Low-bit and energy-efficient sensing for embedded or mobile devices
Benchmarks, datasets, and metrics for evaluating sensor-model pipelines
Generalization and robustness of sensor-model systems in real-world conditions
Failure case studies and negative results in joint optimization pipelines
Join us to engage with cutting-edge research and cross-disciplinary discussions that are shaping the future of sensor systems for real-world deployment across mobile, embedded, and autonomous platforms.
CogInterp: Interpreting Cognition in Deep Learning Models
Recent innovations in deep learning have produced models with impressive capabilities, achieving or even exceeding human performance in a wide range of domains. A timely and critical challenge in AI is understanding what behaviors these models are actually capable of, and the internal processes that support these behaviors. As interest in models’ internal processes continues to grow, cognitive science, which seeks to describe the cognitive processes of human and animal minds, is becoming increasingly useful: it offers a rich body of theories, experiments, and frameworks that may be adopted to understand how deep learning models achieve complex behaviors in domains such as language, vision, and reasoning.
The workshop will focus on Cognitive Interpretability (“CogInterp”), which involves the systematic interpretation of high-level cognition in deep learning models. Similar to how cognitive science describes the intermediate representations and algorithms (or cognition) between behavior and neurons in biological systems, the goal of Cognitive Interpretability is to describe the cognitive processes that lie between the levels of behavioral evaluations and mechanistic interpretability in deep learning models. Practically speaking, this means that Cognitive Interpretability does not just ask whether a model can perform task X or has a certain ability Y, but additionally (or instead) how a model performs X or learns and implements Y. These kinds of inferences, from observable behavior to latent “mental” processes, are the bread and butter of cognitive science, but many of the theoretical and empirical tools developed to tackle these problems have not yet been widely adopted in AI research, in part because of the separation between the fields and communities.
To address the gap above, our goal is to bring together researchers in cognitive science and AI interpretability to discuss new empirical results and theories about the inner workings of deep learning models. We hope to gather perspectives from various disciplines, including machine learning, psychology, linguistics, vision science, neuroscience, philosophy of mind, and law.
Multimodal Algorithmic Reasoning Workshop
Large AI frameworks have been advancing in their data modeling abilities with ever greater vigor in recent times, with compelling applications emerging frequently, many of which may even appear to challenge human intelligence. Yet despite such impressive performance, open questions remain about whether these models include the foundations of general intelligence, or whether they perform these tasks without human-like understanding. This necessitates the development of better tools for assessing these models in tandem with developing the models themselves. This workshop focuses on the topic of multimodal algorithmic reasoning, where an agent needs to assimilate information from multiple modalities towards deriving reasoning algorithms for complex problem solving. In the last year, we have seen rapid advances in AI capabilities that better bridge across modalities, bringing both optimism about superhuman capabilities and skepticism about the limits of current approaches. Through talks from outstanding researchers and faculty, we hope to dive deep into this exciting topic at the intersection of theory, multimodal learning, and cognitive science: to understand what we have achieved thus far in machine intelligence and what we are lacking in relation to the human way of thinking, towards finding the missing rungs on the ladder to truly intelligent reasoning.
Workshop on Mechanistic Interpretability
NeurIPS 2025 Workshop on Space in Vision, Language, and Embodied AI
Data on the Brain and Mind
AI That Keeps Up: Workshop on Continual and Compatible Foundation Model Updates (CCFM)
Foundation models, despite their impressive capabilities, face a critical challenge: they naturally become outdated. Because they are trained on vast datasets, updating these models frequently is expensive. Crucially, these challenges extend beyond the scope of studies in traditional continual learning, as foundation models require rapid and scalable adaptation to dynamic global changes and the emergence of both generalized and specialized tasks. This workshop addresses the urgent need for up-to-date foundation models. We invite researchers to explore cost-effective methods for frequent updates and adaptation, minimizing forgetting and deterioration, ensuring a consistent user experience, and designing dynamic evaluations that remain relevant as models evolve.
Artificial Intelligence for Music: Where Creativity Meets Computation
This workshop explores the dynamic intersection of AI and music, a rapidly evolving field where creativity meets computation. The goal of this workshop is twofold: First, we aim to explore the latest advancements of AI’s applications for music, from analysis, creation, performance, production, and retrieval to music education and therapy. Second, we aim to discuss the impacts and implications of AI in music, including AI’s impacts on the music industry, musician community, and music education, as well as the ethical, legal, and societal implications of AI music and AI’s implications for future musicians.
AI for Science: The Reach and Limits of AI for Scientific Discovery
Through our proposed AI for Science workshop, we will bring together experimentalists, domain scientists, and ML researchers to discuss the reach and limits of AI for scientific discovery. We will center our discussion on three challenges that are essential to progress across scientific domains:
- LLM reasoning across scientific domains: can present-day LLMs generate rigorously testable hypotheses and reason over experimental results that span scientific domains such as physics, chemistry, and biology?
- Fidelity of generative and surrogate simulators: in biology, we see a shift towards all-atom models with increasingly powerful capabilities; in chemistry, machine learning force fields are increasing in accuracy and generalizability; and in climate modeling, we can now accurately predict weather 15 days out. How far can we push this limit? What spatial or temporal scales remain intractable?
- Experimental data scarcity and bias: we see modern examples of large-scale dataset generation such as the Protein Data Bank, the Human Cell Atlas, and the Materials Project. Are there other fields where AI can benefit most from consortium efforts to generate large-scale datasets? How far can models trained on limited experimental datasets take us, and where are lab-in-the-loop strategies essential? To address this, we additionally introduce a dataset proposal competition.
Our workshop will highlight common bottlenecks in developing AI methods across scientific application domains, and delve into solutions that can unlock progress across all of these domains.
Symmetry and Geometry in Neural Representations
The fields of biological and artificial intelligence are increasingly converging on a shared principle: the geometry and topology of real-world structure play a central role in building efficient, robust, and interpretable representations. In neuroscience, mounting evidence suggests that neural circuits encode task and environmental structure through low-dimensional manifolds, conserved symmetries, and structured transformations. In deep learning, principles such as sparsity, equivariance, and compositionality are guiding the development of more generalizable and interpretable models, including new approaches to foundation model distillation. The NeurReps workshop brings these threads together, fostering dialogue among machine learning researchers, neuroscientists, and mathematicians to uncover unifying geometric principles of neural representation. Just as geometry and symmetry once unified the models of 20th-century physics, we believe they may now illuminate the computational foundations of intelligence.
Frontiers in Probabilistic Inference: Learning meets Sampling
Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling
Non-Euclidean Foundation Models and Geometric Learning: Advancing AI Beyond Euclidean Frameworks
In the era of foundation models and Large Language Models (LLMs), Euclidean space is the de facto geometric setting of our machine learning architectures. However, recent literature has demonstrated that this choice comes with fundamental limitations. Non-Euclidean learning is quickly gaining traction. Non-Euclidean spaces, such as hyperbolic, spherical, and mixed-curvature spaces, have been shown to provide more efficient and effective representations for data with intrinsic geometric properties, like hierarchy, symmetry, and heterogeneity.
Integrating foundation models with non-Euclidean spaces has great potential to enhance their ability to capture and model the underlying structures and relationships in complex real-world data, leading to better performance, generalization, and interpretability. This workshop focuses on the intersection of Non-Euclidean representation learning and Foundation Models, exploring its potential benefits, challenges, and future directions.
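For intuition about why such spaces help, the following minimal sketch (our own illustration with hypothetical points, not a specific model's API) computes geodesic distances in the Poincaré ball model of hyperbolic space. Distances blow up near the boundary, which is precisely the property that lets tree-like hierarchies embed with low distortion.

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Geodesic distance in the Poincare ball (points with norm < 1):
    d(u, v) = arcosh(1 + 2*||u-v||^2 / ((1-||u||^2)(1-||v||^2)))."""
    sq_dist = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u**2)) * (1.0 - np.sum(v**2))
    return float(np.arccosh(1.0 + 2.0 * sq_dist / denom))

root = np.array([0.0, 0.0])          # place a hierarchy's root near the origin...
leaf_a = np.array([0.9, 0.0])        # ...and its leaves near the boundary
leaf_b = np.array([0.0, 0.9])
print(poincare_distance(root, leaf_a))    # ~2.94
print(poincare_distance(leaf_a, leaf_b))  # ~5.20, far exceeding the Euclidean ~1.27
```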
LAW 2025: Bridging Language, Agent, and World Models for Reasoning and Planning
Tackling Climate Change with Machine Learning
Many in the ML community wish to take action on climate change, but are unsure how to have the most impact. This workshop will highlight work that demonstrates that, while ML is no silver bullet, it can be an invaluable tool in reducing greenhouse gas emissions and in helping society adapt to the effects of climate change.
Climate change is a complex problem for which action takes many forms, from advancing theory to deploying new technology. Many of these actions represent high-impact opportunities for real-world change, and simultaneously pose interesting academic research problems.
The theme of this workshop, “Roots to Routes: A Dialogue on Different Machine Learning Methods for Climate Impact,” invites submissions that explore the strengths of diverse machine learning approaches in climate-related contexts. We particularly encourage work that demonstrates the effectiveness of classical ML methods under real-world constraints, such as limited data availability, privacy concerns, or restricted computational resources. At the same time, we welcome contributions that showcase how scaling up data and computing resources combined with modern tools and techniques can unlock new possibilities for tackling global-scale climate prediction challenges.
This workshop is part of a series that aims to bring together those applying ML to climate change challenges and facilitate cross-pollination between ML researchers and experts in climate-relevant fields.
The main workshop will take place on December 6 or 7, 2025 (exact date TBD).