As AI has become a huge industry, to an extent it has lost its way. What is needed to get us back on track to true intelligence? We need agents that learn continually. We need world models and planning. We need knowledge that is high-level and learnable. We need to meta-learn how to generalize. The Oak architecture is one answer to all these needs. It is a model-based RL architecture with three special features: 1) all of its components learn continually, 2) each learned weight has a dedicated step-size parameter that is meta-learned using online cross-validation, and 3) abstractions in state and time are continually created in a five-step progression: Feature Construction, posing a SubTask based on the feature, learning an Option to solve the subtask, learning a Model of the option, and Planning using the option’s model (the FC-STOMP progression). The Oak architecture is rather meaty; in this talk we give an outline and point to the many works, prior and contemporaneous, that are contributing to its overall vision of how superintelligence can arise from an agent’s experience.
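The second feature above, a dedicated step-size per weight meta-learned online, is in the spirit of Sutton's IDBD algorithm. The sketch below is illustrative only, not Oak's actual implementation; the names and the meta-step-size constant `theta` are assumptions:

```python
import math

def idbd_update(w, beta, h, x, target, theta=0.01):
    """One IDBD-style step for a linear predictor y = w . x.

    Each weight w[i] has its own log step-size beta[i]; beta[i] is
    meta-learned from the correlation between the current gradient
    (delta * x[i]) and a decaying trace h[i] of recent updates.
    """
    y = sum(wi * xi for wi, xi in zip(w, x))  # linear prediction
    delta = target - y                         # prediction error
    for i, xi in enumerate(x):
        # meta-gradient ascent on the log step-size
        beta[i] += theta * delta * xi * h[i]
        alpha = math.exp(beta[i])
        # ordinary LMS update, but with a per-weight learning rate
        w[i] += alpha * delta * xi
        # trace of recent updates, decayed when the step was large
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta
```

On a stream where one feature is relevant and another is noise, the step-size for the relevant feature is driven up and the irrelevant one is left small, which is the behavior the per-weight design is after.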
Creative AI Session 1
Design for Intelligence: Humans, Machines and Nature in Emergent Co-Creation
What counts as intelligence when it is not human-shaped, not aesthetic, not narrative and not computational? This panel explores Design for Intelligence as a continuation of earlier work on Intelligent Creative, where creative systems first began to perceive, adapt and evolve, revealing beauty as an emergent property of context and behaviour. It asks how systems create new knowledge through behaviour, relation and experience before that knowledge becomes symbolic or computational. The session examines works that move beyond top-down design into living, relational intelligence across human, artificial and non-human environments.
Multi-Agent Systems in Industry: From Research to Real-World Impact
This workshop will bridge the gap between the theoretical advancements in multi-agent systems and their practical applications in industry. The session will feature a series of poster presentations showcasing state-of-the-art, real-world multi-agent systems that are driving innovation across various sectors. We will delve into the challenges and opportunities of deploying these systems at scale, covering topics such as:
Human-in-the-loop collaboration: Designing systems where AI agents and human experts work in synergy.
Scalability and efficiency: Architecting multi-agent systems for large-scale industrial applications.
Safety and reliability: Ensuring the robustness and predictability of autonomous agents in critical systems.
Domain-specific applications: Highlighting successful implementations in areas such as software engineering, scientific research, and creative content generation.
The goal of this workshop is to foster a discussion on the practical challenges and future directions of multi-agent systems, providing attendees with actionable insights and a deeper understanding of how these technologies are shaping the future of industry.
On Device/Edge AI
From smartphones and wearables to autonomous vehicles, robots, and AR/VR systems, the demand for models that are efficient, private, and adaptive in real-time has never been higher. Yet deploying state-of-the-art AI at the edge remains challenging: researchers and practitioners must navigate heterogeneous hardware, memory and power constraints, compression and distillation trade-offs, as well as privacy, safety, and reliability requirements.
This workshop will bring together researchers, practitioners, and industry leaders to explore the frontiers of Edge AI. Topics will include lightweight model architectures, compiler/toolchain optimizations (e.g., quantization, pruning, sparsity), advances in frameworks such as ExecuTorch and TensorRT, distributed learning across devices, privacy-preserving training, and emerging applications where latency and trust are critical. Beyond technical advances, we will examine the broader implications for democratizing AI—enabling billions of devices to act as intelligent, personalized agents while reducing dependence on the cloud.
Motivation and Scope
Generative AI is evolving from offline, single-modality models into interactive agentic systems that perceive, decide, and act in the real world. This shift marks a transition from static generation to dynamic, context-aware interaction. As these systems move toward deployment on edge devices such as mobile phones, augmented reality glasses, and robots, they face constraints in compute, memory, and latency. Beyond efficiency and responsiveness, a new frontier is emerging: agents equipped with persistent memory that enables long-term adaptation and personalization.
This workshop explores a timely and focused question. How do we build generative agents that are not only efficient and responsive but also able to accumulate, recall, and adapt based on personal memory over time? We aim to bring together perspectives from generative modeling, agentic learning, efficient model design, and memory systems to close the gap between lab-scale prototypes and real-world deployment.

Key Themes
Personal Memory Systems for AI Assistants: Architectures for persistent memory, retrieval-augmented generation, and long-term personalization.
Real-World Adaptation: Few-shot generalization, continual learning, and task inference for evolving agent behavior.
Grounded and Trustworthy Generation: Techniques for hallucination mitigation, constraint-aware generation, and safety under uncertainty.
Deployment on Edge Platforms: Challenges and solutions for deploying generative agents on mobile, AR, and robotics platforms.

This focused workshop aligns with emerging themes at NeurIPS including agentic learning, trustworthy AI, efficient multimodal generation, and embodied intelligence. It will spotlight the systems, algorithms, and design decisions needed to make generative AI truly adaptive and persistent, outside the data center and into the wild.
Using the Virtual Cell Platform to Accelerate Machine Learning in Biology
Biology presents some of the most complex and high-impact challenges for machine learning, and single-cell transcriptomics is at the frontier of this work. In this workshop, we introduce the Virtual Cell Platform (VCP), a unified environment designed to accelerate model development and evaluation in biology. Using single-cell transcriptomics as a case study, we will demonstrate how the VCP enables researchers to train, benchmark, and interpret models in a reproducible and biologically meaningful way.
Participants will gain a primer on single-cell transcriptomics and learn how to evaluate models with cz-benchmarks, an open-source Python package providing standardized, community-driven tasks and metrics. Through the VCP CLI, attendees will pull datasets, run packaged models, and compare results programmatically. Hands-on exercises will guide participants through interactive visualizations, side-by-side model comparisons, and deep dives into model behavior using VCP’s no-code interface and BYOD (Bring Your Own Data) module.
By the end of the session, attendees will understand how to use the VCP to actively test and refine models during development, ensure biological relevance, and contribute models and benchmarks back to the community. This workshop highlights how the Virtual Cell Platform transforms ML infrastructure into a one-stop, researcher-friendly ecosystem, empowering the NeurIPS community to push the boundaries of AI in biology.
Indigenous in AI/ML
Test of Time Award
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Test of Time Award) Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
Paper Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. Code is available at https://github.com/ShaoqingRen/faster_rcnn.
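The RPN's per-position bounds and objectness scores are predicted relative to a fixed set of reference boxes ("anchors") enumerated at every feature-map position. A minimal sketch of that enumeration, using the paper's default configuration (3 scales x 3 aspect ratios, feature stride 16) but with the exact parameterization details simplified:

```python
def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Enumerate Faster R-CNN-style anchors: k = len(scales) * len(ratios)
    reference boxes per feature-map position, each returned as
    (x1, y1, x2, y2) in image coordinates."""
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            # anchor centre: the image pixel this feature-map cell maps to
            cx = fx * stride + stride / 2.0
            cy = fy * stride + stride / 2.0
            for scale in scales:
                for ratio in ratios:
                    # vary aspect ratio while keeping area = scale**2
                    w = scale * (1.0 / ratio) ** 0.5
                    h = scale * ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors
```

With 9 anchors per position, a modest feature map already yields thousands of candidates; the RPN's objectness scores are what cut these down to the ~300 proposals per image cited in the abstract.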
Are We Having the Wrong Nightmares About AI?
Though seemingly opposite, doom and optimism regarding generative AI's spectacular rise both center on AGI or even superintelligence as a pivotal moment. But generative AI operates in a distinct manner from human intelligence, and it's not a less intelligent human on a chip slowly getting smarter any more than cars were mere horseless carriages. It must be understood on its own terms. And even if Terminator isn't coming to kill us or superintelligence isn't racing to save us, generative AI does bring profound challenges, well beyond usual worries such as employment effects. Technology facilitates progress by transforming the difficult into easy, the rare into ubiquitous, the scarce into abundant, the manual into automated, and the artisan into mass-produced. While potentially positive long-term, these inversions are extremely destabilizing during the transition, shattering the correlations and assumptions of our social order that relied on superseded difficulties as mechanisms of proof, filtering, sorting and signaling. For example, while few would dispute the value of the printing press or books, their introduction led to such destructive upheaval that the resulting religious wars caused proportionally more deaths than all other major wars and pandemics since combined. Historically, a new technology's revolutionary impact comes from making what's already possible and desired cheap, easy, fast, and large-scale, not from outdated or ill-fitting benchmarks that technologists tend to focus on. As such, Artificial Good-Enough Intelligence can unleash chaos and destruction long before AGI is reached, if it ever is. Existing AI is good enough to blur or pulverize our existing mechanisms of proof of accuracy, effort, veracity, authenticity, sincerity, and even humanity. The tumult from such a transition will require extensive technological, regulatory, and societal effort to counter. But the first step to getting started is having the right nightmares.
Openness is becoming as important as scale in advancing AI. This panel brings together leaders from research, industry, and infrastructure to examine the rise of open-weight frontier models and what they enable.
Discussion will focus on the challenges of training and aligning these systems, their impact on reproducibility and safety, and the new forms of collaboration they support across academia, government, and enterprise.
Attendees will gain a clear view of how open access is reshaping AI development and deployment worldwide.
The Co-X Framework: Versatile AI Agents for Automating and Augmenting Professional Workflows
Beyond monolithic models, the future of AI in industry lies in specialized agents that collaborate with human experts. This talk introduces the "Co-X" framework, a novel approach for creating a diverse ecosystem of collaborative agents tailored to specific professional domains. We will present four key agents built on this framework: the Co-AI Researcher, the Co-ML Engineer for automating software development cycles, the Co-Data Scientist for automating data analysis and insight generation, and the Co-Director for augmenting creative content generation. We will discuss the foundational technologies that enable this versatility—including long-term memory, tool use, and human-in-the-loop feedback—and demonstrate how the Co-X framework is poised to redefine productivity and innovation across industries.
Art Content Creation: When demands are met by pipelines (or not)
As creative projects grow in scale and complexity, artists increasingly rely on pipelines that span concept, production, and delivery. This need has become even more pronounced with the rise of powerful creative AI models. This panel explores how artists and technologists design holistic, end-to-end workflows that can hold both the promise and the volatility of generative systems. From enabling new ideas to managing intricate processes, each project reveals what is gained and what becomes fragile when automation, iteration, and human judgment converge. The discussion looks at the practical and philosophical lessons learned from building these pipelines and considers how they reshape creative agency and the emerging aesthetics of AI-assisted work.
Creative AI Session 2
Cosmos World Foundation Model Platform for Physical AI
Abstract: In this talk, I will introduce NVIDIA Cosmos, our World Foundation Model platform designed to advance Physical AI. Cosmos is built around three core pillars: Predict, Transfer, and Reason. I will provide updates on the latest releases—Predict 2.5 and Transfer 2.5—highlighting key improvements in generalization, efficiency, and scalability. In addition, I will share a preview of ongoing research directions that extend Cosmos toward richer world modeling and reasoning capabilities. Together, these developments aim to push the boundaries of how AI perceives, simulates, and interacts with complex real-world environments.
As Large Language Models (LLMs) become central to high-stakes applications, the reliability of their evaluation systems is under intense scrutiny, especially in the financial industry. Traditional approaches - human annotation, single LLM judges, and static model juries - struggle to balance scalability, cost, and trustworthiness. We will discuss a promising framework: LLM Jury-on-Demand, a dynamic, learning-based framework that assembles an optimal panel of LLM evaluators for each task instance, leveraging predictive modeling to select and weight judges based on context-specific reliability. Our system adapts in real time, outperforming static ensembles and single judges in alignment with human expert judgment across summarization and retrieval-augmented generation benchmarks. This talk will showcase how adaptive LLM juries can transform evaluation of AI systems, offering robust, scalable, and context-aware solutions for industry and research. Attendees will gain practical insights into building trustworthy LLM evaluation pipelines, see live demos, and discuss future directions for reliable AI assessment in critical domains.
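The core mechanic described here, selecting and weighting judges by predicted, context-specific reliability, can be sketched minimally. The talk's actual selection and weighting model is learned; the function below, its names, and its fixed panel size `k` are illustrative assumptions only:

```python
def jury_verdict(judge_scores, predicted_reliability, k=3):
    """Reliability-weighted jury-on-demand sketch.

    judge_scores: {judge_name: score for this task instance}
    predicted_reliability: {judge_name: predicted reliability for
        this instance, e.g. from a learned context model}
    Picks the k judges predicted to be most reliable here, then
    returns their reliability-weighted average score.
    """
    ranked = sorted(judge_scores,
                    key=lambda j: predicted_reliability[j], reverse=True)
    panel = ranked[:k]  # the panel assembled for this instance
    total = sum(predicted_reliability[j] for j in panel)
    return sum(judge_scores[j] * predicted_reliability[j]
               for j in panel) / total
```

Because both the panel and the weights are recomputed per instance, a judge that is unreliable for, say, financial summarization can still dominate on retrieval-grounded tasks, which is the adaptivity that distinguishes this from a static jury.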
The Role of AI in Scientific Peer Review
This social event will explore the role of Artificial Intelligence (AI) in addressing the current challenges and shaping the future of scientific peer review. We will examine how AI can be applied across the entire scholarly publishing process, from authoring to reviewing, editing, and even readership. The event will foster critical discussion on the ethical implications, potential benefits, and practical implementation of AI in this critical scientific process. Our goal is to bring together researchers, practitioners, and stakeholders from diverse fields in an interactive format to build community and explore actionable solutions for a more efficient, fair, and transparent peer review system.
Learning Theory Alliance
This social will consist of a fireside chat with an established researcher in learning theory, followed by mentorship tables.
Agents Safety Panel
As AI systems become increasingly capable and widely deployed, ensuring their safety and reliability is more important than ever. Join us for a 30-minute panel discussion on the safety of agents from development to deployment, followed by a brief Q&A session. The rest of the event will consist of discussion and mingling among attendees. We will provide drinks and snacks. This event is co-organized by the Center for AI Safety (CAIS) and UK AI Security Institute (AISI).
MusIML @ NeurIPS 2025 Social Event
Guest of Honor: Professor Mubarak Shah (University of Central Florida)
Special Guest: Abubakar Abid (Head of Applications @ Hugging Face)
Industry Partner: Amazon
Join the social with Professor Mubarak Shah as the Guest of Honor, whose pioneering work in computer vision continues to inspire generations of ML researchers across the world, and Abubakar Abid from Hugging Face, who will join the social to connect with the next wave of AI innovators and discuss opportunities through the Fatima Fellowship.
The event will feature an exclusive 10-minute talk and an interactive booth hosted by Amazon, offering insights into industry-academia partnerships and research opportunities.
The MusIML Social Event at NeurIPS 2025 will be an evening of connection and sharing among AI/ML researchers across academia and industry. The gathering will bring together leading scholars, practitioners, and emerging researchers from the global Muslim AI/ML community. Snacks will be served.
Nonprofits Working on Openness and Trust in AI
Join us for an in-person social event at NeurIPS 2025 to explore the intersections between generative AI data and open, trusted datasets. This session will feature representatives from the Wikimedia Foundation, MLCommons and the AI Alliance, offering an opportunity to connect with nonprofits committed to using technology for academic and social missions. The event will begin with presentations from these organizations, highlighting their goals, projects and research (e.g., Wikipedia, the AI Alliance's Open Trusted Data Initiative, MLCommons' Croissant data standard) and challenges with trust and responsible data usage in AI. Following the presentations, the session will transition into roundtable discussions focused on current initiatives and an open Q&A.