Dec. 11, 2023, 3:25 p.m.

Björn Ommer

Björn Ommer is a full professor at University of Munich where he is heading the Computer Vision & Learning Group. Before he was a full professor in the department of mathematics and computer science at Heidelberg University and a co-director of its Interdisciplinary Center for Scientific Computing. He received his diploma in computer science from University of Bonn, his PhD from ETH Zurich, and he was a postdoc at UC Berkeley. Björn serves as an associate editor for IEEE T-PAMI. His research interests include semantic scene understanding and retrieval, generative AI and visual synthesis, self-supervised metric and representation learning, and explainable AI. Moreover, he is applying this basic research in interdisciplinary projects within neuroscience and the digital humanities. His group has published a series of generative approaches, including "VQGAN" and "Stable Diffusion", which are now democratizing the creation of visual content and have already opened up an abundance of new directions in research, industry, the media, and beyond.

Dec. 12, 2023, 6:30 a.m.

Conventional machine learning paradigms often rely on binary distinctions between positive and negative examples, disregarding the nuanced subjectivity that permeates real-world tasks and content. This simplistic dichotomy has served us well so far, but because it obscures the inherent diversity in human perspectives and opinions, as well as the inherent ambiguity of content and tasks, it poses limitations on model performance aligned with real-world expectations. This becomes even more critical when we study the impact and potential multifaceted risks associated with the adoption of emerging generative AI capabilities across different cultures and geographies. To address this, we argue that to achieve robust and responsible AI systems we need to shift our focus away from a single point of truth and weave in a diversity of perspectives in the data used by AI systems to ensure the trust, safety and reliability of model outputs.

In this talk, I present a number of data-centric use cases that illustrate the inherent ambiguity of content and natural diversity of human perspectives that cause unavoidable disagreement that needs to be treated as signal and not noise. This leads to a call for action to establish culturally-aware and society-centered research on impacts of data quality and data diversity for the purposes of training and evaluating ML models and fostering responsible AI deployment in diverse sociocultural contexts.

Lora Aroyo

I am a research scientist at Google Research NYC where I work on Data Excellence for AI. My team DEER (Data Excellence for Evaluating Responsibly) is part of the Responsible AI (RAI) organization. Our work is focused on developing metrics and methodologies to measure the quality of human-labeled or machine-generated data. The specific scope of this work is for gathering and evaluation of adversarial data for Safety evaluation of Generative AI systems. I received MSc in Computer Science from Sofia University, Bulgaria, and PhD from Twente University, The Netherlands. I am currently serving as a co-chair of the steering committee for the AAAI HCOMP conference series and I am a member of the DataPerf working group at MLCommons for benchmarking data-centric AI. Check out our data-centric challenge Adversarial Nibbler supported by Kaggle, Hugging Face and MLCommons. Prior to joining Google, I was a computer science professor heading the User-Centric Data Science research group at the VU University Amsterdam. Our team invented the CrowdTruth crowdsourcing method jointly with the Watson team at IBM. This method has been applied in various domains such as digital humanities, medical and online multimedia. I also guided the human-in-the-loop strategies as a Chief Scientist at a NY-based startup Tagasauris. Some of my prior community contributions include president of the User Modeling Society, program co-chair of The Web Conference 2023, member of the ACM SIGCHI conferences board. For a list of my publications, please see my profile on Google Scholar.

Dec. 12, 2023, 12:15 p.m.

The world presents massive amounts of data for learning but the data relevant to any one thing or event is sparse. I will present evidence from the egocentric experiences of infants and young children in daily lives at home that demonstrate this sparsity, focusing on the case of early visual object recognition and object name learning. I will show how the statistics of infant self-generated experiences present solutions to the problem: learner control and optimization of the input, a developmentally constrained curriculum of spatial and temporal properties of the input, and the coherence statistics of individual episodes of experience. I will present evidence with respect to both low-level visual statistics and higher-level semantic categories. I conclude with a discussion of the alliance of the neural mechanisms that generate the statistics at any point in development and the neural mechanisms do the learning. I will the implications of the findings for artificial intelligence including studies using infant egocentric experiences as training data.

Linda Smith

Linda B. Smith, Distinguished Professor at Indiana University Bloomington, is an internationally recognized leader in cognitive science and cognitive development. Taking a complex systems perspective, she seeks to understand the interdependencies among perceptual, motor and cognitive developments during the first three years of post-natal life. Using wearable sensors, including head-mounted cameras, she studies how the young learner’s own behavior creates the statistical structure of the learning environments with a current focus on developmentally changing visual statistics at the scale of everyday life and their role in motor, perceptual, and language development. The work has led to novel insights about the statistics of self-generated experiences and their role in rapid learning and innovative generalization from sparse and limited experience and challenges current massive-data approaches in AI. The work also motivates her current efforts on defining and promoting a precision (or individualized) developmental science, one that determines the multiple causes and interacting factors that create children’s individual developmental pathways. Smith received her PhD from the University of Pennsylvania in 1977 and immediately joined the faculty at Indiana University. Her work has been continuously funded by the National Science Foundation and/or the National Institutes of Health since 1978. She won the David E. Rumelhart Prize for Theoretical Contributions to Cognitive Science, the American Psychological Association Award for Distinguished Scientific Contributions, the William James Fellow Award from the American Psychological Society, the Norman Anderson Lifetime Achievement Award, and the Koffka Medal. She is an elected member of both the National Academy of Sciences and the American Academy of Arts and Science.

Dec. 13, 2023, 6:30 a.m.

'Sketches' of data are memory-compressed summarizations that still allow answering useful queries, and as a tool have found use in algorithm design, optimization, machine learning, and more. This talk will give an overview of some core sketching tools and how they work, including recent advances. We also discuss a couple newly active areas of research, such as augmenting sketching algorithms with learned oracles in a way that provides provably enhanced performance guarantees, and designing robust sketches that maintain correctness even in the face of adaptive adversaries.

Jelani Nelson

Jelani Nelson is a Professor of Electrical Engineering and Computer Sciences at UC Berkeley, and also a Research Scientist at Google (part-time). He is interested in randomized algorithms, sketching and streaming algorithms, dimensionality reduction, and differential privacy. He is a recipient of the ACM Eugene L. Lawler Award for Humanitarian Contributions within Computer Science, a Presidential Early Career Award for Scientist and Engineers (PECASE), and a Sloan Research Fellowship. He is also Founder and President of AddisCoder, Inc., a nonprofit that provides algorithms training to high school students in Ethiopia and Jamaica.

Invited Talk: Beyond Scaling Panel

Dec. 13, 2023, 12:15 p.m.

Aakanksha Chowdhery

Aakanksha led the effort on training large language models at Google Brain which led to the 540B PaLM model. Aakanksha has also been a core member of the Pathways project at Google. Prior to joining Google, Aakanksha led interdisciplinary teams at Microsoft Research and Princeton University across machine learning, distributed systems and networking. Aakanksha completed her PhD from Stanford University and was awarded the Paul Baran Marconi Young Scholar Award for outstanding scientific contributions in her doctoral thesis.

Alexander Rush

Alexander "Sasha" Rush is an Associate Professor at Cornell Tech and a researcher at Hugging Face. His research interest is in the study of language models with applications in controllable text generation, efficient inference, and applications in summarization and information extraction. In addition to research, he has written several popular open-source software projects supporting NLP research, programming for deep learning, and virtual academic conferences. His projects have received paper and demo awards at major NLP, visualization, and hardware conferences, an NSF Career Award and Sloan Fellowship. He tweets at @srush_nlp.

Angela Fan

Angela Fan is currently a research scientist at Meta AI focusing on large language models. Previously, Angela worked on machine translation for text and speech, including projects such as No Language Left Behind and Beyond English-Centric Multilingual Translation. Before that, Angela was a research engineer and did her PhD at INRIA Nancy, where she focused on text generation.

Jie Tang

Jie Tang is a WeBank Chair Professor of Computer Science at Tsinghua University. He is a Fellow of the ACM, a Fellow of AAAI, and a Fellow of IEEE. His interest is artificial general intelligence (AGI). His research received the SIGKDD Test-of-Time Award (10-year Best Paper). He also received the SIGKDD Service Award. Recently, he puts all efforts into Large Language Models (LLMs): GLM, ChatGLM, etc.

Percy Liang

Percy Liang is an Assistant Professor of Computer Science at Stanford University (B.S. from MIT, 2004; Ph.D. from UC Berkeley, 2011). His research spans machine learning and natural language processing, with the goal of developing trustworthy agents that can communicate effectively with people and improve over time through interaction. Specific topics include question answering, dialogue, program induction, interactive learning, and reliable machine learning. His awards include the IJCAI Computers and Thought Award (2016), an NSF CAREER Award (2016), a Sloan Research Fellowship (2015), and a Microsoft Research Faculty Fellowship (2014).

Dec. 14, 2023, 6:30 a.m.

I'm a simple creature. I fell in love with foundation models (FMs) because they radically improved data systems that I had been trying to build for a decade–and they are just awesome! This talk starts with my perspective about how FMs change the systems we build, focusing on what I call "death by a thousand cuts" problems. Roughly, these are problems in which each individual task looks easy, but the sheer variety and breadth of tasks make them hard.

The bulk of the talk is about understanding how to efficiently build foundation models. We describe trends in hardware accelerators from a perhaps unexpected viewpoint: database systems research. Databases have worried about optimizing IO – reads and writes within the memory hierarchy – since the 80s. In fact, optimizing IO led to Flash Attention for Transformers.

But are there more efficient architectures for foundation models than the Transformer? Maybe! I'll describe a new class of architectures based on classical signal processing, exemplified by S4. These new architectures: are asymptotically more efficient than Transformers for long sequences, have achieved state-of-the-art quality on benchmarks like long range arena, and have been applied to images, text, DNA, audio, video. S4 will allow us to make mathematically precise connections to RNNs and CNNs. I’ll also describe new twists, such as, long filters, data-dependent convolutions, and gating, that power many of these amazing recent architectures including RWKV, S5, Mega, Hyena, and RetNet, and recent work to understand their fundamental limitations to hopefully make even more awesome foundation models!

A github containing material from is under construction at Please feel free to add to it!

Christopher Ré

Christopher (Chris) Re is an associate professor in the Department of Computer Science at Stanford University. He is in the Stanford AI Lab and is affiliated with the Machine Learning Group and the Center for Research on Foundation Models. His recent work is to understand how software and hardware systems will change because of machine learning along with a continuing, petulant drive to work on math problems. Research from his group has been incorporated into scientific and humanitarian efforts, such as the fight against human trafficking, along with products from technology and companies including Apple, Google, YouTube, and more. He has also cofounded companies, including Snorkel, SambaNova, and Together, and a venture firm, called Factory.

His family still brags that he received the MacArthur Foundation Fellowship, but his closest friends are confident that it was a mistake. His research contributions have spanned database theory, database systems, and machine learning, and his work has won best paper at a premier venue in each area, respectively, at PODS 2012, SIGMOD 2014, and ICML 2016. Due to great collaborators, he received the NeurIPS 2020 test-of-time award and the PODS 2022 test-of-time award. Due to great students, he received best paper at MIDL 2022, best paper runner up at ICLR22 and ICML22, and best student-paper runner up at UAI22.

Dec. 14, 2023, 12:15 p.m.

In this talk I will discuss first solutions to some of the challenges we face in developing online RL algorithms for use in digital health interventions targeting patients struggling with health problems such as substance misuse, hypertension and bone marrow transplantation. Digital health raises a number of challenges to the RL community including different sets of actions, each set intended to impact patients over a different time scale; the need to learn both within an implementation and between implementations of the RL algorithm; noisy environments and a lack of mechanistic models. In all of these settings the online line algorithm must be stable and autonomous. Despite these challenges, RL, with careful initialization, with careful management of bias/variance tradeoff and by close collaboration with health scientists can be successful. We can make an impact!

Susan Murphy

Susan A. Murphy is Professor of Statistics and Computer Science at Harvard University. Her research focuses on improving sequential decision making in health, in particular the development of online, real-time reinforcement learning algorithms for use in personalized digital health. She is a member of the US National Academy of Sciences and of the US National Academy of Medicine. In 2013 she was awarded a MacArthur Fellowship for her work on experimental designs to inform sequential decision making. She is a Fellow of the College on Problems in Drug Dependence, Past-President of Institute of Mathematical Statistics, and a former editor of the Annals of Statistics.

Dec. 15, 2023, 6:25 a.m.

John J. Hopfield

BA Swarthmore 1954; PhD Cornell (theoretical physics) 1958. Member of technical staff Bell Laboratories 1958-1960 & 1973-1996; Faculty positions at UCBerkeley (physics) 1961-1964, Princeton Univ. (physics) 1964-1980, Caltech (chemistry and biology) 1980-1996, Princeton Univ. (molecular biology) 1997-2008, Institute for Advanced Study (2010-2013), now emeritus at Princeton Neuroscience Institute. Served as Chairman of the Faculty, Caltech; President of the American Physical Society; Executive Officer for Computation and Neural Systems, Caltech. Honors include Buckley Prize in Solid State Physics; APS prize in biophysics; Dirac Medal; Einstein Award; MacArthur Fellow; IEEE Rosenblatt Award; Swartz Prize in Computational Neuroscience. Member, National Academy of Science; American Philosophical Society. Research on the interaction of light with solids 1956-1970; biomolecular physics and kinetic proofreading 1970-1980; neural network dynamics and neurobiology 1980-.

Dec. 15, 2023, 6:30 a.m.

Shubhendu Trivedi

Dec. 15, 2023, 6:30 a.m.

Rama Vasudevan - Oak Ridge National Lab

Rama K Vasudevan

Dec. 15, 2023, 6:35 a.m.

In my talk, I will showcase how synthetic data, generated by deep generative models based on real-world data, enables solutions in healthcare that are unattainable with real data alone. I will discuss the transformation of biased datasets into unbiased ones using synthetic data. My talk will also explore how generative models facilitate transfer learning across various domains, enhancing the versatility of machine learning models. I will also cover the importance of data augmentation, where synthetic data enriches training sets for more comprehensive machine learning outcomes. Additionally, I will highlight the crucial role of synthetic data in the thorough testing and debugging of these models, ensuring their dependability in healthcare settings.

Invited talk: The Physics of Science

Dec. 15, 2023, 6:40 a.m.

What are the principles that underwrite sentient behaviour? This presentation uses the free energy principle to furnish an account in terms of active inference. First, we will try to understand sentience from the point of view of physics; in particular, the properties that self-organising systems—that distinguish themselves from their lived world—must possess. We then rehearse the same story from the point of view of a neurobiologist, trying to understand functional brain architectures. The narrative starts with a heuristic proof suggesting that life—or biological self-organization—is an inevitable and emergent property of any dynamical system that possesses a Markov blanket. This conclusion is based on the following arguments: if a system can be differentiated from its external milieu, then its internal and external states must be conditionally independent. These independencies induce a Markov blanket that separates internal and external states. Crucially, this equips internal states with an information geometry, pertaining to probabilistic beliefs about something; namely external states. This free energy is the same quantity that is optimized in Bayesian inference and machine learning (where it is known as an evidence lower bound). In short, internal states will appear to infer—and act on—their world to preserve their integrity. This leads to a Bayesian mechanics, which can be neatly summarised as self-evidencing. In the second half of the talk, we will unpack these ideas using simulations of Bayesian belief updating in the brain and relate them to predictive processing and sentient behaviour.

Key words: active inference ∙ autopoiesis ∙ cognitive ∙ dynamics ∙ free energy ∙ epistemic value ∙ self-organization.

Karl Friston

Dec. 15, 2023, 6:40 a.m.

Lewis Hammond

Acting Executive Director at Cooperative AI Foundation DPhil Candidate at University of Oxford

Interested broadly in safety in multi-agent systems, especially cooperation problems.

Dec. 15, 2023, 6:45 a.m.

Maria Chan - Argonne National Lab

Maria Chan

Maria Chan is a scientist with the Center for Nanoscale Materials at Argonne National Laboratory. She studies nanomaterials and renewable energy materials, including solar cells and batteries and other energy storage, as well as photo- and electro-catalysts, thermal transport, and thermoelectrics. Particular focus is on using machine learning for efficient computational approaches and for interfacing computational models with materials characterization (x-ray, electron, and scanning probe). She is a senior fellow at the Northwestern Argonne Institute for Science and Engineering, and a fellow of the University of Chicago Consortium for Advanced Science and Engineering. She is also an associate editor at the ACS Journal Chemistry of Materials, a member of the Condensed Matter and Materials Research Committee of the National Academies of Sciences, Engineering, and Medicine, and serves on the advisory boards for the journal APL-Machine Learning, Duke’s aiM-NRT AI training project, and CEDARS EFRC.

Dec. 15, 2023, 6:55 a.m.

Maria Rodriguez Martinez

Maria Rodriguez Martinez

Dec. 15, 2023, 6:55 a.m.

Invited talk: Is Simulation Dead?

Dec. 15, 2023, 7 a.m.

Tim Rocktäschel

Tim is a Researcher at Facebook AI Research (FAIR) London, an Associate Professor at the Centre for Artificial Intelligence in the Department of Computer Science at University College London (UCL), and a Scholar of the European Laboratory for Learning and Intelligent Systems (ELLIS). Prior to that, he was a Postdoctoral Researcher in Reinforcement Learning at the University of Oxford, a Junior Research Fellow in Computer Science at Jesus College, and a Stipendiary Lecturer in Computer Science at Hertford College. Tim obtained his Ph.D. from UCL under the supervision of Sebastian Riedel, and he was awarded a Microsoft Research Ph.D. Scholarship in 2013 and a Google Ph.D. Fellowship in 2017. His work focuses on reinforcement learning in open-ended environments that require intrinsically motivated agents capable of transferring commonsense, world and domain knowledge in order to systematically generalize to novel situations.

Dec. 15, 2023, 7 a.m.

Vijay Narasimhan - Merck KGaA, Damstadt, Germany

Vijay Narasimhan

Vijay Narasimhan is the Director of R&D Collaborations for EMD Electronics, a business of Merck KGaA, Darmstadt, Germany, one of the world's largest and most respected chemical and materials companies. In this role, Vijay initiates and drives high impact initiatives in cross-cutting areas, including deploying digital tools for sustainable materials development. Vijay has a BASc in Computer Engineering from the University of Ottawa, an MPhil in Nanotechnology Enterprise from the University of Cambridge, and a PhD in Materials Science and Engineering from Stanford University. He joined Intermolecular in 2015 as a Process and Research Engineer. Vijay has broad technical expertise, with contributions in antenna design, renewable energy, nanoscale optical materials, rural electricity access, wet chemical etchants, thin film ALD ferroelectrics and chalcogenides, and applied quantum computing.

Dec. 15, 2023, 7 a.m.

Yiming Li

Dr. Yiming Li is currently a Research Professor in the School of Cyber Science and Technology at Zhejiang University. Before that, he received his Ph.D. degree with honors in Computer Science and Technology from Tsinghua University (2023) and his B.S. degree with honors in Mathematics and Applied Mathematics from Ningbo University (2018). His research interests are in the domain of Trustworthy ML and AI Security, especially backdoor learning and copyright protection in deep learning. His research has been published in multiple top-tier conferences and journals, such as ICLR, NeurIPS, and IEEE TIFS. He served as the senior program committee member of AAAI, the program committee member of ICLR, NeurIPS, ICML, etc., and the reviewer of IEEE TPAMI, IEEE TIFS, IEEE TDSC, etc. His research has been featured by major media outlets, such as IEEE Spectrum. He was the recipient of the Best Paper Award in PAKDD 2023 and the Raising Star Award at WAIC 2023.

Dec. 15, 2023, 7 a.m.

Machine learning is increasingly being used to help tackle climate change, from optimizing electrical grids to emulating climate models and monitoring biodiversity. As such applications grow, however, it is becoming clear that high-powered ML tools often fall short. Methods designed using standard benchmarks may fail to capture the constraints or metrics of real-world problems, while a “one size fits all” approach ignores useful auxiliary information in specific applications. In this talk, we show how problem-centered design can lead to ML algorithms that are both methodologically innovative and highly impactful in the fight against climate change.

David Rolnick

Invited Talk: Invited Talk 1

Dec. 15, 2023, 7:10 a.m.

Peng Cui

Dec. 15, 2023, 7:15 a.m.

There have been many air hockey robots (Search for "air hockey robot" on I will survey ideas on how to design air hockey players, and relate them to current work on controlling robots in a variety of dynamic tasks. Two decades ago we explored manually defining primitives or skills (forehand, backhand, ...) and learning a primitive selector, first from observation, and then refining it with practice. Our view was that it is useful in learning to segment behavior into a sequence of "subroutine calls", each call having "arguments" or parameters. We chose tasks that humans do such as air hockey so we could explore learning from observation (aka learning from demonstration, imitation learning) as well as optimization-based learning approaches to learning from practice such as reinforcement learning. A key observation was that to learn from observation, the learner had to perceive in terms of primitives: segmenting behavior into individual primitives and estimating what parameters were used for each time a primitive is used. Our motivation for decomposing learning into two parts (learning skills and learning which skill to use when) is that we believed that learning a behavior selector could be very data efficient. Our approach to training the selector from observation used supervised learning, and learning from practice used model free reinforcement learning in a form that was sufficiently data efficient that all learning could be done on a physical robot rather than in simulation. One innovation we see today that was not practical 20 years ago is large scale training in simulation, transferring the learned controller to a real robot, and further learning in reality. Some current approaches to dynamic robot control pursue a conceptually similar approach of explicitly separating learning "skills" and learning to select "skills" by implicitly defining primitives by using a manually designed curriculum, and then learning a selector (in one case by distilling separate skill networks into a single network). Other approaches train on a large number of manually or automatically generated situations, and do not explicitly define a set of primitives or individual skills.

Christopher G. Atkeson

Dec. 15, 2023, 7:20 a.m.

Abstract: A defining trait of federated learning is the presence of heterogeneity, i.e., that data may differ between clients in the network. In this talk I discuss how heterogeneity affects issues of privacy and personalization in federated settings. First, I present our work on private personalized learning in cross-device settings, where we show that personalized FL provides unique benefits when enforcing client-level differential privacy in heterogeneous networks. Second, I explore cross-silo settings, where differences in privacy granularity introduce new dynamics in terms of the privacy/utility trade-offs of personalized FL. I end by discussing our application of these works to privacy-preserving pandemic forecasting in the recent UK-US privacy-enhancing technologies prize challenge, and highlight promising directions of future work on privacy and personalization in FL.

Bio: Virginia Smith is the Leonardo Assistant Professor of Machine Learning at Carnegie Mellon University. Her research spans machine learning, optimization, and distributed systems. Virginia’s current work addresses challenges related to optimization, privacy, and robustness in distributed settings to enable trustworthy federated learning at scale. Virginia’s work has been recognized by several awards, including an NSF CAREER Award, MIT TR35 Innovator Award, Intel Rising Star Award, and faculty awards from Google, Apple, and Meta. Prior to CMU, Virginia was a postdoc at Stanford University and received a Ph.D. in Computer Science from UC Berkeley.

Virginia Smith

Dec. 15, 2023, 7:20 a.m.

This talk will discuss some of the uses of generative models in healthcare, dive into continuous time generative models, and as this is a workshop, step back to high level speculations about generative modeling and the needs of generative modeling in healthcare. Along the way, I will cover two developments in continuous time generative models 1) learning the noising process in a diffusion model to maximize likelihood and 2) choosing the base distribution in flows/interpolants to facilitate learning. References to what I will cover:

Where to Diffuse, How to Diffuse, and How to Get Back: Automated Learning for Multivariate Diffusions:

Stochastic interpolants with data-dependent couplings:

On the Feasibility of Machine Learning Augmented Magnetic Resonance for Point-of-Care Identification of Disease:

Dec. 15, 2023, 7:20 a.m.

Danielle S Bassett

Prof. Bassett is the J. Peter Skirkanich Professor at the University of Pennsylvania, with appointments in the Departments of Bioengineering, Electrical & Systems Engineering, Physics & Astronomy, Neurology, and Psychiatry. Bassett is also an external professor of the Santa Fe Institute. Bassett is most well-known for blending neural and systems engineering to identify fundamental mechanisms of cognition and disease in human brain networks. Bassett is currently writing a book for MIT Press entitled Curious Minds, with co-author Perry Zurn Professor of Philosophy at American University. Bassett received a B.S. in physics from Penn State University and a Ph.D. in physics from the University of Cambridge, UK as a Churchill Scholar, and as an NIH Health Sciences Scholar. Following a postdoctoral position at UC Santa Barbara, Bassett was a Junior Research Fellow at the Sage Center for the Study of the Mind. Bassett has received multiple prestigious awards, including American Psychological Association's ‘Rising Star’ (2012), Alfred P Sloan Research Fellow (2014), MacArthur Fellow Genius Grant (2014), Early Academic Achievement Award from the IEEE Engineering in Medicine and Biology Society (2015), Harvard Higher Education Leader (2015), Office of Naval Research Young Investigator (2015), National Science Foundation CAREER (2016), Popular Science Brilliant 10 (2016), Lagrange Prize in Complex Systems Science (2017), Erdos-Renyi Prize in Network Science (2018), OHBM Young Investigator Award (2020), AIMBE College of Fellows (2020). Bassett is the author of more than 300 peer-reviewed publications, which have garnered over 24,000 citations, as well as numerous book chapters and teaching materials. Bassett is the founding director of the Penn Network Visualization Program, a combined undergraduate art internship and K-12 outreach program bridging network science and the visual arts. Bassett’s work has been supported by the National Science Foundation, the National Institutes of Health, the Army Research Office, the Army Research Laboratory, the Office of Naval Research, the Department of Defense, the Alfred P Sloan Foundation, the John D and Catherine T MacArthur Foundation, the Paul Allen Foundation, the ISI Foundation, and the Center for Curiosity.

Invited talk: Lisa Soros

Dec. 15, 2023, 7:30 a.m.

Lisa Soros

Dec. 15, 2023, 7:30 a.m.

In this talk, I will firstly introduce recent advances of backdoor defense, covering poisoned sample detection based defense at the pre-training stage, secure training based defense at the in-training stage, and backdoor mitigation based defense at the post-training stage. Then, I will introduce BackdoorBench, which is a comprehensive benchmark containing 30+ mainstream backdoor attack and defense methods, 10,000 pairs of attack-defense evaluations, as well as several interesting findings and analysis with 15+ analysis tools. The benchmark has been released at

Baoyuan Wu

Dec. 15, 2023, 7:30 a.m.

Nature has been deteriorating at rates unparalleled in human history and the implications are global. Unfortunately, we cannot value what we cannot measure. And we are failing to capture nature’s full contributions to society. In this talk, we argue that machine learning (ML) and specifically paying for forest data can play a significant role in responding to this critical call for action – but only when we develop collaborative algorithms and incentives in co-design with local and Indigenous communities that respect local ‘data’ realities. We will present our work at Gainforest, a global science-based non-profit and currently a Finalist of the $10M XPRIZE Rainforest, and how Gainforest is deploying real-world data payments on the ground in partnership with governments and conservation partners in the Global South to empower affordable top-down and bottom-up monitoring.

David Dao

David Dao is a PhD student at ETH Zurich and the founder of GainForest, a non-profit working on decentralized technology to prevent deforestation. His research focuses on the deployment of novel machine learning systems for sustainable development and ecosystem monitoring. David served as a workshop co-organizer at ICLR, ICML and NeurIPS, and is a core member at Climate Change AI, a Global Shaper at World Economic Forum and a Climate Leader at Climate Reality. He is a research intern with Microsoft and was a former researcher at UC Berkeley and Stanford University.

Invited Talk: Percy Liang

Dec. 15, 2023, 7:30 a.m.

Invited Talk: Invited Talk 2

Dec. 15, 2023, 7:35 a.m.

Kate Saenko

Kate is an AI Research Scientist at FAIR, Meta and a Full Professor of Computer Science at Boston University (currently on leave) where she leads the Computer Vision and Learning Group. Kate received a PhD in EECS from MIT and did postdoctoral training at UC Berkeley and Harvard. Her research interests are in Artificial Intelligence with a focus on out-of-distribution learning, dataset bias, domain adaptation, vision and language understanding, and other topics in deep learning.

Past academic positions

Consulting professor at the MIT-IBM Watson AI Lab 2019-2022. Assistant Professor, Computer Science Department at UMass Lowell Postdoctoral Researcher, International Computer Science Institute Visiting Scholar, UC Berkeley EECS Visiting Postdoctoral Fellow, SEAS, Harvard University

Dec. 15, 2023, 7:50 a.m.

In the current landscape where data privacy intersects with the ever-growing demand for comprehensive datasets, this talk introduces a novel approach employing large language models (LLMs) for image-based editing, targeting medical images and chart image data. This technique emphasizes preserving data integrity while ensuring the utmost privacy and confidentiality. We delve into utilizing LLMs to interpret and manipulate data visualizations, including diverse chart forms like bar graphs, pie charts, and line plots, alongside medical imagery such as X-rays, MRIs, and CT scans. The LLMs discern and subtly modify particular data elements or features within these images. In chart data, this pertains to altering specific data points without skewing the overarching trends or statistical relevance. Medical imagery involves modifying or removing identifiable markers while retaining diagnostic value.

A significant aspect of our methodology is its role in data augmentation. For chart data, we generate synthetic images mirroring real data trends and enhancing datasets while adhering to privacy norms. In the realm of medical data, we create realistic, anonymized images that expand the scope of datasets, crucial in areas plagued by data scarcity, such as rare diseases or specific medical conditions.

This talk will showcase the efficacy of our approach through various case studies and experimental analyses. We will also address the ethical implications and potential constraints of using AI in this context, providing a glimpse into the future of secure data handling and augmentation in the AI era. This presentation is an invitation to explore the intersection of AI and data privacy, specifically in medical and chart data. It is a journey through the innovative ways large language models are redefining data enhancement and privacy preservation.

Invited Talk: Ruslan Salakhutdinov

Dec. 15, 2023, 8 a.m.

Dec. 15, 2023, 8:30 a.m.

Abstract: Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more. In this work, we introduce privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates than is otherwise possible for standalone models.

Bio: Florian Tramèr is an assistant professor of computer science at ETH Zurich. His research interests lie in Computer Security, Cryptography and Machine Learning security. In his current work, he studies the worst-case behavior of Deep Learning systems from an adversarial perspective, to understand and mitigate long-term threats to the safety and privacy of users.

Dec. 15, 2023, 8:30 a.m.

M Charity

Invited Talk: Jürgen Schmidhuber

Dec. 15, 2023, 8:30 a.m.

Dec. 15, 2023, 8:30 a.m.

Sensory prediction is vital to organisms, and humans have engineered complex neural networks to predict, but it is difficult to benchmark how well real and artificial agents predict given that ground truth is often unknown. We utilize so-called epsilon-Machines, a special type of hidden Markov model, to calibrate how well real and artificial agents predict. First we show that large random epsilon-Machines produce output that artificial agents do not predict very well, though they come close to limits set by Fano's inequality. But then, we note that newly collected data shows that neurons in a dish and humans are resource-rational predictors, meaning that they predict as well as possible given their limited memory-- an outgrowth of rate-distortion theory. This allows us insight into artificial neural networks as well, and we find that LSTMs predict as well as possible given limited memory in challenging (undersampled) conditions. Altogether, we advance the idea that epsilon-Machines can be used to benchmark the performance of predictive agents and also the idea that these agents might be only boundedly optimal at prediction because they are subject to limitations on memory.

Sarah Marzen

Invited Talk: Invited Talk

Dec. 15, 2023, 8:30 a.m.

Jonas Geiping

Jonas is a postdoctoral researcher at UMD. His background is in Mathematics, more specifically in mathematical optimization and its applications to deep learning. His current focus is on designing more secure and private ML systems, especially for federated learning, and on understanding fundamental phenomena behind generalization.

Dec. 15, 2023, 8:45 a.m.

Studying biological systems is hard, since they are the domain of microscopic processes that are typically hard to measure and observe and mired in complexity. A typical approach towards studying systems of such complexity is to perform perturbations, study their outcomes, and try to understand the links to mechanisms we may want to control better. In this talk, we will talk about a class of deep generative models [1] that is tailored to this task, in that it studies readouts of cells and disentangles latent spaces suitably to isolate perturbation effects. We will introduce the model, how it can help us perform counterfactual reasoning over cells, discuss evaluation of such models, and sketch the work ahead to apply it fruitfully in service of discovery work.

[1] Modelling cellular perturbations with the sparse additive mechanism shift variational autoencoder, Michael Bereket, Theofanis Karaletsos, NeurIPS2023

Dec. 15, 2023, 8:45 a.m.

Reinforcement learning offers an appealing formalism for autonomously acquiring robotic skills. Part of its appeal is its generality. However, practical robotic learning is not a perfect fit for the standard RL problem statement: from the obvious challenges with sample complexity and exploration to the deeper issues with lack of clearly specified reward functions and the practicality of episodic learning in a world that cannot be reset arbitrarily at will, making RL practical in robotics requires taking care to not only design algorithms that are efficient, but also accounting for the various practical aspects of the RL setup. This problem of "scaffolding" reinforcement learning itself involves numerous algorithmic challenges. In this talk, I will discuss some ways we can approach these challenges, from practical, safe, and reliable reinforcement learning that is efficient enough to run on real-world platforms, to automating reward function evaluation and resets.

Sergey Levine

Sergey Levine received a BS and MS in Computer Science from Stanford University in 2009, and a Ph.D. in Computer Science from Stanford University in 2014. He joined the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley in fall 2016. His work focuses on machine learning for decision making and control, with an emphasis on deep learning and reinforcement learning algorithms. Applications of his work include autonomous robots and vehicles, as well as applications in other decision-making domains. His research includes developing algorithms for end-to-end training of deep neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, deep reinforcement learning algorithms, and more

Dec. 15, 2023, 8:50 a.m.

Currently, the most successful Deep Learning architecture is the transformer. The attention mechanism of the transformer is equivalent to modern Hopfield networks, therefore is an associative memory. However, this associative memory has disadvantages like its quadratic complexity with the sequence length when mutually associating sequences elements, its restriction to pairwise associations, its limitations in modifying the memory, its insufficient abstraction capabilities. In contrast, recurrent neural networks (RNNs) like LSTMs have linear complexity, associate sequence elements with a representation of all previous elements, can directly modify memory content, and have high abstraction capabilities. However, RNNs cannot store sequence elements that were rare in the training data, since RNNs have to learn to store. Transformer can store rare or even new sequence elements, which is one of the main reasons besides their high parallelization why they outperformed RNNs in language modelling. I think that future successful Deep Learning architectures should comprise both of these memories: attention for implementing episodic memories and RNNs for implementing short-term memories and abstraction.

Sepp Hochreiter

Dec. 15, 2023, 9:10 a.m.

I'll give an overview of how information theoretic principles have been used to motivate and advance representation learning. By combining variational bounds on information theoretic quantities like mutual information with the expressiveness and learnability of modern deep neural networks, information theory can guide the search for useful representations in a wide array of settings including unsupervised learning, supervised learning, bayesian inference and prediction. The emphasis will be on how the modern tools of deep learning can now turn the principled information theoretically motivated objectives across a broad range of interdisciplinary fields into a reality.


Dec. 15, 2023, 9:15 a.m.

Reinforcement Learning from Human Feedback (RLHF) is used to align large language models to produce helpful and harmless responses. We consider the problem of poisoning the RLHF data to embed a backdoor trigger into the model. The trigger should act like a universal "sudo" command, enabling arbitrary harmful responses at test time. Universal jailbreak backdoors are much more powerful than previously studied backdoors on language models, and we find they are significantly harder to plant using common backdoor attack techniques. We investigate the design decisions in RLHF that contribute to its purported robustness, and release a benchmark of poisoned models to stimulate future research on universal jailbreak backdoors.

Dec. 15, 2023, 9:25 a.m.

Carlo Lucibello

Dec. 15, 2023, 9:30 a.m.

Emerald Cloud Lab, a cutting-edge platform, holds immense potential in revolutionizing scientific research by seamlessly integrating artificial intelligence (AI) technologies into laboratory workflows. This talk discusses how Emerald Cloud Lab stands as an ideal platform to support AI-assisted scientific research. With its innovative features, including remote access to state-of-the-art laboratory equipment and robust data management capabilities, researchers can leverage AI algorithms to accelerate experimentation, automate data analysis, and enhance the overall research process. This transformative synergy between AI and Emerald Cloud Lab not only expedites scientific discoveries but also fosters collaboration across geographical boundaries, ultimately advancing the frontiers of knowledge in diverse fields of science.

Jason Wallace

Invited Talk: Phillip Isola

Dec. 15, 2023, 11 a.m.

Dec. 15, 2023, 11:30 a.m.

Alexander Alemi

I am a Senior Research Scientist at Google. My current focus is the intersection of Information Theory and Deep Learning. I got my Ph.D. in Theoretical Condensed Matter Physics at Cornell University, supervised by Jim Sethna. I got my B.S. at Caltech, where I majored in Physics.

Dec. 15, 2023, 11:30 a.m.

Krzysztof Choromanski

Dec. 15, 2023, 11:30 a.m.

Adam Dziedzic

Dec. 15, 2023, 11:30 a.m.

Noah Goodman

Invited Talk: Xinyun Chen

Dec. 15, 2023, 11:30 a.m.

Dec. 15, 2023, 11:40 a.m.

Title: Validation with Large Generative Models: A Need for Human-Centric Approaches

Abstract: Especially in applications such as health, we really want to know whether or not our models will behave as we want them to. And for smaller-surface models, including deep generative ones, we have a number of statistical and human-centered techniques to gain confidence that these models are doing largely reasonable things. However, these techniques, already partial for smaller-surface models, are able to provide even fewer assurances in the context of larger-surface models. In this talk, I will discuss how we must fundamentally re-think our approach to validation for larger-surface models. In particular, much of the validation effort must shift from statistical checks done in advance to human-centered checks for a particular output at task-time. I will discuss how this effort will require new methods and lay out some open questions and directions in this space.

Dec. 15, 2023, 11:45 a.m.

Melanie Mitchell

Melanie Mitchell is the Davis Professor at the Santa Fe Institute. Her current research focuses on conceptual abstraction, analogy-making, and visual recognition in artificial intelligence systems. Melanie is the author or editor of six books and numerous scholarly papers in the fields of artificial intelligence, cognitive science, and complex systems. Her latest book is Artificial Intelligence: A Guide for Thinking Humans (Farrar, Straus, and Giroux).

Invited Talk: Invited Talk

Dec. 15, 2023, noon

Invited Talk: Chelsea Finn

Dec. 15, 2023, noon

Invited Talk: Invited Talk 3

Dec. 15, 2023, 12:15 p.m.

Aditi Raghunathan

Invited Talk: Russ Tedrake

Dec. 15, 2023, 12:30 p.m.

Dec. 15, 2023, 12:30 p.m.

Remote sensing satellites capture peta-scale, multi-modal data capturing our dynamic planet across space, time, and spectrum. This rich data source holds immense potential for addressing local and planetary-scale challenges including food insecurity, poverty, climate change, and ecosystem preservation. Fully realizing this potential will require a new paradigm of machine learning approaches capable of tackling the unique character of remote sensing data. Machine learning approaches must be flexible enough to make use of the multi-modal multi-fidelity satellite data, process meter-scale observations over planetary scales, and generalize to the challenging diversity of remote sensing tasks. In this talk, I will present examples of how we are developing machine learning approaches for planetary data processing including self-supervised transformers for remote sensing data. I will also demonstrate how treating ML research and deployment as a unified approach instead of siloed steps leads to research advances that result in immediate societal impact, highlighting examples of how we are partnering directly with stakeholders to deploy our innovations in areas of critical need across the globe.

Invited Talk: Invited Talk 4

Dec. 15, 2023, 12:40 p.m.

Hoifung Poon

Hoifung Poon is Senior Director at Microsoft Health Futures. His research interests lie in advancing biomedical AI for precision health. His past work has been recognized with Best Paper Awards from premier NLP and machine learning venues such as the Conference of the North American Chapter of the Association for Computational Linguistics, the Conference of Empirical Methods in Natural Language Processing, and the Conference of Uncertainty in AI.

Dec. 15, 2023, 12:45 p.m.

Stefanos Nikolaidis

Invited talk: Feryal Behbahani

Dec. 15, 2023, 1:15 p.m.

Invited Talk: Kristen Grauman

Dec. 15, 2023, 1:30 p.m.

Invited Talk: Invited Talk 5

Dec. 15, 2023, 1:30 p.m.

Ludwig Schmidt

Dec. 15, 2023, 2:15 p.m.

In the rapidly evolving landscape of artificial intelligence, generative AI has emerged as a powerful and transformative technology with significant potential across various applications, such as medical, financial, and autonomous driving. However, with this immense potential comes the imperative to ensure the safety and trustworthiness of generative models before their large-scale deployment.

In particular, as large language models (LLMs) become increasingly prevalent in real-world applications, understanding and mitigating the risks associated with potential backdoors is paramount. This talk will delve into the critical examination of backdoors embedded in LLMs and explore their potential implications on the security and reliability of these models in different applications. Specifically, I will first talk about different strategies for injecting backdoors in LLMs and a series of CoT frameworks. I will then discuss potential defenses against known and unknown backdoors in LLM. I will provide an overview of how to assess, improve, and certify the resilience of LLMs against potential backdoors.

Bo Li

Dec. 15, 2023, 2:30 p.m.

Good benchmark performance is the first step to impact, but is only a small piece of the complex system necessary to enable computer vision models to be deployed and trusted in sustainability and conservation applications – a system that requires human and computational infrastructure, iterative development, software support and maintenance, and continual quality control. I will speak about lessons learned in deployed computer vision systems for applications in ecology, discussing differences in what is needed for end users with unequal access to resources and expertise, different priorities and risks of failure, and different operational needs from real-time decision support to post-hoc analysis.

Dec. 16, 2023, 6:20 a.m.

Learning dynamics models accurately is an important goal for Model-Based Reinforcement Learning (MBRL), but most MBRL methods learn a dense dynamics model which is vulnerable to spurious correlations and therefore generalizes poorly to unseen states. In this paper, we introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL), which first learns a theoretically proved causal dynamics model that removes unnecessary dependencies between state variables and the action, thus generalizing well to unseen states. A state abstraction can then be derived from the learned dynamics, which not only improves sample efficiency but also applies to a wider range of tasks than existing state abstraction methods. Evaluated on two simulated environments and downstream tasks, both the dynamics model and policies learned by the proposed method generalize well to unseen states and the derived state abstraction improves sample efficiency compared to learning without it.

Dec. 16, 2023, 6:25 a.m.

Dec. 16, 2023, 6:25 a.m.

I will summarize research in the area of intrinsic motivation in the context of learning and exploration and touch upon open-ended learning in the IMOL community. I will then present our recent work on combining different intrinsic motivation signals with reinforcement learning, such as learning progress, causal influence and information gain. A particular exciting direction is to employ model-based reinforcement learning to make robots learn by freely playing how to interact effectively driven by information gain and other generic drives. We find that this leads to high zero-shot generalization to new tasks.

Dec. 16, 2023, 6:30 a.m.

Invited Talk: Bertram Emil SHI (HKUST)

Dec. 16, 2023, 6:30 a.m.

Dec. 16, 2023, 6:30 a.m.

Diyi Yang

Dec. 16, 2023, 6:30 a.m.

I am currently an associate professor at HEC Montreal and Mila. Prior to that, I was a Postdoc at University of Michigan and Carnegie Mellon University. I also worked at Microsoft Research Asia as an associate researcher between 2014-2016.

Research Interests: Geometric Deep Learning, Graph Neural Networks, Knowledge Graphs, Deep Generative Models, AI for Drug Discovery (Molecule/Protein Design)

Jian Tang

Dec. 16, 2023, 6:40 a.m.

Computer Audition is changing. Since the advent of Large Audio, Language, and Multimodal Models, or generally Foundation Models, a new age has begun. Emergence of abilities in such large models by zero- or few-shot learning render it partially unnecessary to collect task-specific data and train an according model. After the last major disruption – learning representations and model architectures directly from data – this can be judged as the second major disruption in a field that once was coined by highly specialized features, approaches, and datasets shifting towards being absorbed by sheer size of models and data used for their training. In this talk, I will first argue that Computer Audition will be massively influenced by this “plate displacement” in Artificial Intelligence as a whole. I will then move towards “informed tea-leaf reading” how present and tomorrow’s Computer Audition will change in more detail. This includes prompt optimisation, fine-tuning, or synergistic combination of different foundation models and traditional approaches. Finally, I will turn towards dangers to this new glittery era – among many, the “nightshades” of audio may soon start to poison audio data. A new time has begun – it will empower Computer Audition at a whole new level while challenging us in whole new ways – let’s get ready

Bjoern Schuller

Dec. 16, 2023, 6:40 a.m.

Natural proteins evolved over billions of years, with numerous populations using chance and selection to create useful proteins. This talk will describe PRANCE, an high-throughput protein engineering method that mimics natural evolution. By combining liquid handling robotics and molecular engineering techniques, PRANCE allows us to overcome features of natural evolution that make it challenging to use as an engineering technique, like stochasticity and extinction. These advances make directed evolution more interpretable, accessible, and reproducible. We show that high-throughput directed evolution creates new engineering approaches for large-scale biological engineering challenges like genetic code expansion.

Dec. 16, 2023, 6:50 a.m.

Ellen Zhong is an Assistant Professor of Computer Science at Princeton University. She is interested in problems at the intersection of AI and biology. Her research develops machine learning methods for computational and structural biology problems with a focus on protein structure determination with cryo-electron microscopy (cryo-EM). She obtained her Ph.D. from MIT in 2022, advised by Bonnie Berger and Joey Davis, where she developed deep learning algorithms for 3D reconstruction of dynamic protein structures from cryo-EM images. She has interned at DeepMind with John Jumper and the AlphaFold team and previously worked on molecular dynamics algorithms and infrastructure for drug discovery at D. E. Shaw Research. She obtained her B.S. from the University of Virginia where she worked with Michael Shirts on computational methods for studying protein folding.

For more information about her research and group, please visit her group website:

Ellen Zhong

Dec. 16, 2023, 6:55 a.m.

Dec. 16, 2023, 7 a.m.

Morphology in evolutionary biology is used to quantify visible characteristics of specimens, a crucial aspect in addressing the biodiversity crisis. To investigate the impact of anthropogenic impacts, researchers have constructed extensive image databases. Obviously, these databases make the field optimal for the integration of machine learning. However, traditional methods used in morphometrics are grounded in diagnostic structures proposed by biologists. In contrast to that, machine learning approaches autonomously extract features without explicit biological motivation.

This talk focuses on the potential misunderstandings that can arise when applying machine learning in morphometrics. Specifically, the focus is on the biological interpretation of machine learning models, exploring instances where models demonstrate high accuracy yet struggle with coherent biological interpretation. The presentation showcases experiments that highlight the tension between excellent quantitative results but often lacks biological interpretation.

Wilfried Wöber

Wilfried Wöber graduated in 2013 with a degree in robotics. During his studies, he actively participated in research projects, where he developed perception models for autonomous trucks. Following his graduation, he continued his journey as a junior scientist at Vienna's University of Natural Resources and Life Sciences. During this phase, his research interests shifted towards the realm of data science, particularly in the context of biological processes.

In 2015, Wilfried becamelead data scientist at a startup that specialized in mobile robots and perception systems designed for agricultural applications. In 2018, he returned to the academic sphere and took on a research role at the UAS Technikum Wien focusing on robotics. In 2019, he started a doctoral program in biodiversity research, with a specific emphasis on morphology.

From 2022 onwards, Wilfried has been serving as the head of the Competence Centre for Digital Manufacturing, Automation & Robotics at the UAS Technikum Wien. Moreover, since 2022 he has taken on the role of Vice President of the IEEE ITSS Austria.

Dec. 16, 2023, 7 a.m.

The brain constructs and combines modular structures for flexible computation. I will describe recent progress in characterizing the rigid and low-dimensional nature of some of these representations, using theoretical approaches including fully unsupervised topological characterization of neural population codes. I will then discuss models of how these rigid and modular circuits can emerge, and how they can generate, with high capacity and high data-efficiency without rewiring recurrent circuitry, cognitive maps across different variables (e.g. spatial and non-spatial) as well as across varied input dimensions.

Ila Fiete

Dec. 16, 2023, 7 a.m.

Abstract: In recent years, scientific computing workloads at HPC facilities have been undergoing a significant shift. While traditionally dominated by numerical simulations, these facilities are increasingly handling AI/ML applications for training and inference, processing and producing ever-increasing amounts of scientific data. Despite the focus on optimizing the execution of new AI/HPC workflows, little attention has been paid to the I/O runtime challenges they present. This talk aims to address that gap by analyzing these emerging trends from an I/O perspective. We will explore the performane of the multilayer high-performance I/O systems under the strain of these new workflows that combine traditional HPC techniques with AI interacting in new challenging ways.

Speaker's Bio: Ana Gainaru is a computer scientist in the CSM division at Oak Ridge National Laboratory, working on data management and performance optimization for large scale scientific workflows with a focus on codes coupling traditional HPC with AI. She received her PhD from the University of Illinois at Urbana-Champaign working on fault tolerance and scheduling for large-scale systems. In her current position she is working with application developers in fusion, neutron scattering and materials sciences to deploy digital twins and large models and improve their performance at scale.

Ana Gainaru

Dec. 16, 2023, 7 a.m.

Dec. 16, 2023, 7 a.m.

Dec. 16, 2023, 7 a.m.

An increasingly common design and analysis paradigm for neural networks is thinking of them as parametrizing (implicitly or explicitly) some algorithm. In images, score-based generative models can be thought of as parametrizing a learned sampler (a stochastic differential equation or a Markov Chain). In scientific applications, PDE solvers are trained as neural analogues of numerical solvers. In language, we probe to understand whether transformers can solve simple algorithmic tasks like parsing. In this talk, I’ll share several vignettes illustrating the value of an algorithmic lens in these settings: namely, understanding the performance of “natural” algorithms will allow us to understand the performance of neural methods, as well as explore and elucidate the architectural design space.

Andrej Risteski

Assistant Professor in the ML department at CMU. Prior to that I was a Wiener Fellow at MIT, and prior to that finished my PhD at Princeton University.

Dec. 16, 2023, 7:05 a.m.

Intelligent agents must be able to learn by interacting with their environment and to adapt to changes. Continual reinforcement learning provides a natural way to model this process. In this talk, I will discuss for tackling this problem by constructing abstractions, such as intents, options, affordances and partial models that allow an agent to generalize its knowledge quickly to new circumstances.

Invited talk: Invited talk | Ron Dror

Dec. 16, 2023, 7:10 a.m.

Ron Dror is an Associate Professor of Computer Science in the Stanford Artificial Intelligence Lab. Dr. Dror leads a research group that uses molecular simulation and machine learning to elucidate biomolecular structure, dynamics, and function, and to guide the development of more effective medicines. He collaborates extensively with experimentalists in both academia and industry.

Dec. 16, 2023, 7:10 a.m.

In this talk I will discuss how by eliminating many of the layers of abstraction used in conventional computers, and working as close to the underlying physics as possible, we may be able to create special-purpose processors that are orders of magnitude faster or more energy-efficient than the present state-of-the-art.

Peter McMahon

Invited Talk: Yuki M Asano

Dec. 16, 2023, 7:15 a.m.

Dec. 16, 2023, 7:15 a.m.

Edward Choi

Edward Choi is an assistant professor in Kim Jaechul AI Graduate School of KAIST. He received his PhD in Georgia Tech in 2018 under the supervision of Dr. Jimeng Sun, focusing on interpretable deep learning methods for handling medical records. Prior to joining KAIST, Edward had engineering and research experience in ETRI, Sutter Health, DeepMind, and Google. His current research interest covers machine learning for healthcare, natural language processing and multi-modal learning.

Dec. 16, 2023, 7:15 a.m.

Attention and eye movements are thought to be a window to the human mind, and have been extensively studied across Neuroscience, Psychology and HCI. However, progress in this area has been severely limited as the underlying methodology relies on specialized hardware that is expensive (upto $30,000) and hard to scale. In this talk, I will present our recent work from Google, which shows that ML applied to smartphone selfie cameras can enable accurate gaze estimation, comparable to state-of-the-art hardware based devices, at 1/100th the cost and without any additional hardware. Via extensive experiments, we show that our smartphone gaze tech can successfully replicate key findings from prior hardware-based eye movement research in Neuroscience and Psychology, across a variety of tasks including traditional oculomotor tasks, saliency analyses on natural images and reading comprehension. We also show that smartphone gaze could enable applications in improved health/wellness, for example, as a potential digital biomarker for detecting mental fatigue. These results show that smartphone-based attention has the potential to unlock advances by scaling eye movement research, and enabling new applications for improved health, wellness and accessibility, such as gaze-based interaction for patients with ALS/stroke that cannot otherwise interact with devices.

Vidhya Navalpakkam

I am currently a Principal Scientist at Google Research. I lead an interdisciplinary team at the intersection of Machine learning, Neuroscience, Cognitive Psychology and Vision. My interests are in modeling user attention and behavior across multimodal interfaces, for improved usability and accessibility of Google products. I am also interested in applications of attention for healthcare (e.g., smartphone-based screening for health conditions).

Before joining Google in 2012, I was at Yahoo Research. Prior to joining the industry in 2010, I worked on modeling attention mechanisms in the brain during my postdoc at Caltech (working with Drs. Christof Koch, Pietro Perona and Antonio Rangel) and PhD at USC (working with Dr. Laurent Itti). I have a Bachelors in Computer Science from the Indian Institute of Technology, Kharagpur.

Dec. 16, 2023, 7:20 a.m.

Deborah Raji

Dec. 16, 2023, 7:20 a.m.

Bayesian optimization (BO) is a powerful tool for optimizing non-convex black-box (also known as derivative-free) functions that are expensive or time-consuming to evaluate and subject to noise in their evaluations. Many important problems can be formulated in this manner, such as optimizing outcomes of high-fidelity computer simulations, automated hyperparameter tuning in machine and deep learning algorithms, A/B testing for website design, policy-based reinforcement learning, and material and drug discovery. In this presentation, three key concepts are introduced, which we argue are critical for enabling and/or improving the practical performance of BO on real-world science and engineering systems. Specifically, one must: (1) leverage prior physics-based knowledge to perform highly efficient (targeted) exploration of the solution space; (2) explicitly incorporate safety constraints during interaction with physical systems to avoid unsafe, unethical, and/or undesirable outcomes; and (3) account for external sources of uncertainty during the search process to ensure the best-identified solution is robust/flexible in practice. We discuss a unified framework for adapting BO to handle these considerations and illustrate how this framework can be deployed in practice on a series of examples ranging from the design of safe cold plasma jet devices to the discovery of high-performance sustainable energy storage materials. We also offer perspectives on key challenges and future opportunities in the realm of applied BO.

Dec. 16, 2023, 7:25 a.m.

Dec. 16, 2023, 7:25 a.m.

As reinforcement learning continues to advance, the integration of efficient planning algorithms with powerful representation learning becomes crucial for solving long-horizon tasks. We address key challenges in planning, reward learning, and representation learning through the objective of learning value-based abstractions. We explore this idea via goal-conditioned reinforcement learning to learn generalizable value functions and action-free pre-training. By leveraging self-supervised reinforcement learning and efficient planning algorithms, these approaches collectively contribute to the advancement of decision-making systems capable of learning and adapting to diverse tasks in real-world environments.

Amy Zhang

Dec. 16, 2023, 7:30 a.m.

Dec. 16, 2023, 7:30 a.m.

Abstract: The training phase of Deep Neural Networks is often a very memory-intensive procedure, where large amounts of intermediate data have to be kept in memory during one iteration. One possible approach to reduce memory usage is rematerialization, aka gradient checkpointing, where some intermediate data are recomputed when needed rather than kept in memory. This provides a tradeoff between memory usage and recomputation time. In this talk I will present several approaches for the optimization problem, where one wants to minimize the recomputation time given a fixed memory budget. The corresponding algorithms have been implemented in easy-to-use libraries for the PyTorch framework, which can significantly reduce memory usage with reasonable overhead.

Speaker's Bio: Lionel Eyraud-Dubois received his PhD degree in computer science from the Université de Grenoble. He is currently a full-time researcher with Inria Bordeaux Sud-Ouest in the Topal team. His main research interests encompass combinatorial optimization and operation research techniques for scheduling and resource allocation problems in high performance computer systems, including for optimizing the training and inference processes of Deep Neural Networks.

Lionel Eyraud-Dubois

Dec. 16, 2023, 7:30 a.m.

Today’s large language models (LLMs) routinely generate coherent, grammatical and seemingly meaningful paragraphs of text. This achievement has led to speculation that LLMs have become “thinking machines”, capable of performing tasks that require reasoning and/or world knowledge. In this talk, I will introduce a distinction between formal competence—knowledge of linguistic rules and patterns—and functional competence—understanding and using language in the world. This distinction is grounded in human neuroscience, which shows that formal and functional competence recruit different cognitive mechanisms. I will show that the word-in-context prediction objective has allowed LLMs to essentially master formal linguistic competence; however, pretrained LLMs still lag behind at many aspects of functional linguistic competence, prompting engineers to adopt specialized fune-tuning techniques and/or couple an LLM with external modules. I will illustrate the formal-functional distinction using the domains of English grammar and arithmetic, respectively. I will then turn to generalized world knowledge, a domain where this distinction is much less clear-cut, and discuss our efforts to leverage both cognitive science and NLP to develop systematic ways to probe generalized world knowledge in text-based LLMs. Overall, the formal/functional competence framework clarifies the discourse around LLMs, helps develop targeted evaluations of their capabilities, and suggests ways for developing better models of real-life language use.

Anna Ivanova

Dec. 16, 2023, 7:30 a.m.

Dec. 16, 2023, 7:40 a.m.

The area of speech emotion recognition (SER) has seen significant advances with the wider availability of pre-trained models and embeddings, and the creation of larger publicly available corpora. In this talk we will touch upon some of the challenges that continue to riddle audio-based SER, such as domain adaptation, data augmentation and output generalization, and further discuss the advantages of a multi-view model approach, one that jointly learns from both categorical and dimensional affect labels.

Dimitra Emmanouilidou

Dec. 16, 2023, 7:45 a.m.

How do two-layer neural networks learn complex functions from data over time? In this talk, we shall delve into the interaction between batch size, number of iterations, and task complexity, shedding light on neural network adaptation to data features. I will particularly highlight three key findings:

i) The significant impact of a single gradient step on the feature learning, emphasizing the relationship between batch size and the target's information exponent (or complexity).

ii) The enhancement of the network's approximation ability over multiple gradient steps, enabling the learning of more intricate functions over time.

iii) The improvement in generalization compared to the basic random feature/kernel regime.

Our theoretical approach combines techniques from statistical physics, concentration of measure, projection-based conditioning, and Gaussian equivalence, which we believe holds standalone significance.

Based on joint work with Yatin Dandi, Bruno Loureiro, Luca Pesce, and Ludovic Stephan (

Florent Krzakala

Dec. 16, 2023, 7:45 a.m.

Yuzhe Yang

Invited Talk: Diane Larlus

Dec. 16, 2023, 7:45 a.m.

Invited Talk: Analog AI Accelerators

Dec. 16, 2023, 7:50 a.m.

Deep learning has irreversibly changed and drastically enhanced how we process information. The rapidly increasing computation time and energy costs required to train ever larger AI models make it evident that the future of artificial intelligence depends on realizing fast and energy-efficient processors. With the slowdown in transistor scaling and the diminishing returns expected from future CMOS, the concept of analog computing has been put forward as an alternative. Analog neural networks process information that is stored locally and in a fully-parallel manner in the analog domain using physical device properties instead of conventional Boolean arithmetic. This presentation will give an overview of analog neural network and the underlying device technologies to implement them.

Jesus del Alamo

Dec. 16, 2023, 8:05 a.m.

Invited Talk: Alyosha Efros

Dec. 16, 2023, 8:15 a.m.

Dec. 16, 2023, 8:20 a.m.

Mustafa Hajij

Dec. 16, 2023, 8:25 a.m.

Dec. 16, 2023, 8:30 a.m.

Abstract: The proliferation of large models based on Transformers has outpaced advances in hardware, resulting in an urgent need for the ability to distribute enormous models across multiple GPUs. Despite this increasing need, the absence of established best practices for selecting an optimal strategy persists, owing to the extensive expertise required in High-Performance Computing (HPC), Deep Learning (DL), and distributed systems. These challenges have motivated both AI and HPC developers to delve into pivotal questions: How can the training and inference efficiency of large models be enhanced to minimize costs? How can larger AI models be accommodated, even with limited resources? What measures can be taken to facilitate broader community access to large models and large-scale applications? In this talk, I will discuss potential solutions to these challenges by exploring hybrid parallelisms, heterogeneous memory management, and the design of user-friendly frameworks such as our open-source systemic solution: Colossal-AI (

Speaker's Bio: Yang You is a Presidential Young Professor at the National University of Singapore. He received his Ph.D. in Computer Science from UC Berkeley under Prof. James Demmel. Yang's research interests include Parallel/Distributed Algorithms, High Performance Computing, and Machine Learning. He is a winner of the IPDPS 2015 Best Paper Award (0.8%), ICPP 2018 Best Paper Award (0.3%), and ACM/IEEE George Michael HPC Fellowship. Yang is also a Siebel Scholar and a winner of the Lotfi A. Zadeh Prize. He also made the Forbes 30 Under 30 Asia list (2021) for young leaders and the IEEE-CS TCHPC early career award.

Dec. 16, 2023, 8:30 a.m.

Dec. 16, 2023, 8:30 a.m.

Recent progress in deep learning and deep reinforcement learning (DRL) has been truly remarkable, yet two problems remain: structural policy generalization and policy reuse. The first is about getting policies that generalize in a reliable way; the second is about getting policies that can be reused and combined in a flexible, goal-oriented manner. The two problems are studied in DRL but only experimentally, and the results are not clear and crisp. In our work, we have tackled these problems in a slightly different way, developing languages for expressing general policies, and methods for learning them using combinatorial and DRL approaches. We have also developed languages for expressing and learning general subgoal structures (sketches) and hierarchical polices which are based on the notion of planning width. In the talk, I'll present the main ideas and results.

This is joint work with Blai Bonet, Simon Ståhlberg, Dominik Drexler, and other members of the RLeap team.

Dec. 16, 2023, 8:35 a.m.

Dec. 16, 2023, 8:45 a.m.

Dec. 16, 2023, 8:45 a.m.

The algorithm of Equilibrium Propagation (EP) 1 is highly interesting for training physical systems as it extracts backprop-equivalent gradients directly from their convergence to a steady state 2,3. In my talk, I will show that it is an excellent starting point for building and training physical systems to perform classification tasks. I will first describe how we have used EP to train the hardware D-Wave Ising machine in a supervised way to recognize handwritten digits 4. I will then show that EP can unlock self-learning in spiking neural networks 5. Finally, I will explain how we can extend EP to unsupervised learning.

Julie Grollier

Invited talk: Audio Language Models

Dec. 16, 2023, 8:50 a.m.

Audio analysis and audio synthesis require modeling long-term, complex phenomena and have historically been tackled in an asymmetric fashion, with specific analysis models that differ from their synthesis counterpart. In this presentation, we will introduce the concept of audio language models, a recent innovation aimed at overcoming these limitations. By discretizing audio signals using a neural audio codec, we can frame both audio generation and audio comprehension as similar autoregressive sequence-to-sequence tasks, capitalizing on the well-established Transformer architecture commonly used in language modeling. This approach unlocks novel capabilities in areas such as textless speech modeling, zero-shot voice conversion, and even text-to-music generation. Furthermore, we will illustrate how the integration of analysis and synthesis within a single model enables the creation of versatile audio models capable of handling a wide range of tasks involving audio as inputs or outputs. We will conclude by highlighting the promising prospects offered by these models and discussing the key challenges that lie ahead in their development.

Neil Zeghidour

Dec. 16, 2023, 9 a.m.

The recent advent of diffusion models has led to significant progress in solving inverse problems, leveraging these models as effective generative priors. Nonetheless, challenges related to the ill- posed nature of such problems remain, often due to inherent ambiguities in measurements. Drawing inspiration from the human ability to resolve visual ambiguities through perceptual biases, here we introduce a novel latent diffusion inverse solver by incorporating regularization by texts (TReg). Specifically, TReg applies the textual description of the preconception of the solution during the reverse sampling phase, of which description is dynamically reinforced through null-text optimization for adaptive negation. Our comprehensive experimental results demonstrate that TReg successfully mitigates ambiguity in latent diffusion inverse solvers, enhancing their effectiveness and accuracy.

Dec. 16, 2023, 9 a.m.

Abstract: Deep Learning (DL) is driving unprecedented progress in a wide range of Artificial Intelligence domains, including natural language processing, vision, speech, and multimodal. However, sustaining this AI revolution requires practical solutions to the extreme demands of model scaling on the compute, memory, communication and storage components of modern computing hardware. To address this challenge, we created a deep learning optimization library called DeepSpeed to make distributed model training and inference efficient, effective, and easy on commodity hardware. This talk will focus on DeepSpeed training optimizations, particularly on ZeRO and DeepSpeed-MoE, which help to address the memory and compute requirements of extreme model scaling.

Speaker's Bio: Olatunji (Tunji) Ruwase is a co-founder and Principal Research Sciences Manager of the DeepSpeed project at Microsoft. His broad industry and research background spans compilers, operating systems, and hardware accelerators. He is currently interested in building systems and convergence optimizations, and frameworks for distributed training and inference of deep learning models. His research results on DL training, inference, and hyperparameter search are used in multiple Microsoft systems and products, such as Bing, Ads, HyperDrive, and Catapault.

Olatunji Ruwase

Invited Talk: Abhinav Gupta

Dec. 16, 2023, 9 a.m.

Dec. 16, 2023, 9 a.m.

Sequential tests and their implied confidence sequences, which are valid at arbitrary stopping times, promise flexible statistical inference and on-the-fly decision making. However, strong guarantees are limited to parametric sequential tests, which suffer high type-I error rates in practice because reality isn't parametric, or to concentration-bound-based sequences, which are overly conservative so we get wide intervals and take too long to detect effects. We consider a classic delayed-start normal-mixture sequential probability ratio test and provide the first asymptotic (in the delay) analysis under general non-parametric data generating processes. We guarantee type-I-error rates approach a user-specified α-level (primarily by leveraging a martingale strong invariance principle). Moreover, we show that the expected time-to-reject approaches the minimum possible among all α-level tests (primarily by leveraging an identity inspired by Itô's lemma). Together, our results establish these (ostensibly parametric) tests as general-purpose, non-parametric, and near-optimal. We illustrate this via numerical experiments and a retrospective re-analysis of A/B tests at Netflix.

Dec. 16, 2023, 9:05 a.m.

Sampling from a probability distribution is a ubiquitous challenge in machine learning, ranging from generative AI to approximate Bayesian inference. This talk will show how to leverage low-precision compute to accelerate Markov chain Monte Carlo (MCMC) sampling with theoretical guarantees on the convergence. First, I will introduce a general and theoretically grounded framework to enable low-precision sampling, with applications to Stochastic Gradient Langevin Dynamics and Stochastic Gradient Hamiltonian Monte Carlo. Then I will present an approach for binary sampling---operating at 1-bit precision. Finally, I will show the experimental results of low-precision sampling on various deep learning tasks.

Ruqi Zhang

Dec. 16, 2023, 9:15 a.m.

Modern theories explain children’s cognitive development mainly in term of Bayesian learning (with some innate priors in infancy). But learning cannot be the whole story or else children could learn anything at any age - which they cannot. They cannot because their capacities to experience and cognitively represent the world are structured by the human species’ evolved psychological architecture - inherited from ancient animal ancestors - and this architecture changes in significant ways over the first years of life. The main organizing principle is agency, including shared agency. The developmental proposal is that young infants (below 9 months) are goal-directed agents who cognitively represent and learn about actualities; toddlers are intentional agents who executively represent and learn also about causal, intentional, and logical possibilities; and preschoolers (over 3 years) are metacognitive agents who metacognitively represent and learn also about normative necessities. This agency-based model of cognitive development recognizes the important role of learning, but at the same time places it in the context of the overall agentive organization of children at particular developmental periods.

Dec. 16, 2023, 9:30 a.m.

Dec. 16, 2023, 9:30 a.m.

Music has a unique and complex structure which is challenging for both expert humans and existing AI systems to understand, and presents unique challenges relative to other forms of audio. We present LLARK, an instruction-tuned multimodal model for music understanding. We detail our process for dataset creation, which involves augmenting the annotations of diverse open-source music datasets and converting them to a unified instruction-tuning format. We propose a multimodal architecture for LLARK, integrating a pretrained generative model for music with a pretrained language model. In evaluations on three types of tasks (music understanding, captioning, and reasoning), we show that our model matches or outperforms existing baselines in zero-shot generalization for music understanding, and that humans show a high degree of agreement with the model’s responses in captioning and reasoning tasks.

Invited Talk: Chelsea Finn

Dec. 16, 2023, 9:30 a.m.

Dec. 16, 2023, 10:30 a.m.

Dec. 16, 2023, 11 a.m.

Dec. 16, 2023, 11:15 a.m.

In many applications, especially in the sciences, data and tasks have known invariances. Encoding such invariances directly into a machine learning model can improve learning outcomes, while it also poses challenges on efficient model design. In the first part of the talk, we will focus on the invariances relevant to eigenvectors and eigenspaces being inputs to a neural network. Such inputs are important, for instance, for graph representation learning or orthogonally equivariant learning. We will discuss targeted architectures that can universally express functions with the relevant invariances or equivariances - sign flips and changes of basis - and their theoretical and empirical benefits. Second, we will take a broader theoretical perspective. Empirically, it is known that encoding invariances into the machine learning model can reduce sample complexity. For the simplified setting of kernel ridge regression or random features, we will discuss new bounds that illustrate two ways in which invariances can reduce sample complexity. Our results hold for learning on manifolds and for invariances to a wide range of group actions.

This talk is based on joint work with Joshua Robinson, Derek Lim, Behrooz Tahmasebi, Lingxiao Zhao, Tess Smidt, Suvrit Sra and Haggai Maron.

Stefanie Jegelka

Dec. 16, 2023, 11:20 a.m.

Doris Tsao

Dec. 16, 2023, 11:20 a.m.

Dec. 16, 2023, 11:30 a.m.

Mihaela van der Schaar

Dec. 16, 2023, 11:30 a.m.

Adji Bousso Dieng

Dec. 16, 2023, 11:30 a.m.

The Marks lab is a new interdisciplinary lab dedicated to developing rigorous computational approaches to critical challenges in biomedical research, particularly on the interpretation of genetic variation and its impact on basic science and clinical medicine. To address this we develop algorithmic approaches to biological data aimed at teasing out causality from correlative observations, an approach that has been surprisingly successful to date on notoriously hard problems. In particular, we developed methods adapted from statistical physics and graphical modeling to disentangle true contacts from observed evolutionary correlations of residues in protein sequences. Remarkably, these evolutionary couplings, identified from sequence alone, supplied enough information to fold a protein sequence into 3D. The software and methods we developed is available to the biological community on a public server that is quick and easy for non-experts to use. In this evolutionary approach to accurately we have predicted the 3D structure of hundreds of proteins and large pharmaceutically relevant membrane proteins. Many of these were previously of unknown structure and had no homology to known sequences; two of the large membrane proteins have now been experimentally validated. We have now applied this approach genome wide to determine the 3D structure of all protein interactions that have sufficient sequences and can demonstrate the evolutionary signature of alternative conformations.

Debora Marks

Dec. 16, 2023, 11:30 a.m.

Temporal logics on finite traces (LTLf, LDLf, PPLTL, etc.) are increasingly attracting the interest of the scientific community. These logics are variants of temporal logics used for specifying dynamic properties in Formal Methods, but focussing on finite though unbounded traces. They are becoming popular in several areas, including AI planning for expressing temporally extended goals, reactive synthesis for automatically synthesizing interactive programs, reinforcement learning for expressing non-Markovian rewards and dynamics, and Business Process Modeling for declaratively specifying processes. These logics can express general safety and guarantee (reachability) properties, though they cannot talk about the behaviors at the infinitum as more traditional temporal logics on infinite traces. The key characteristic of these logics is that they can be reduced to equivalent regular automata, and in turn, automata, once determinized, into two-player games on graphs. This gives them unprecedented computational effectiveness and scalability. In this talk, we will look at these logics, their corresponding automata, and resulting games, and show their relevance in service composition. In particular, we show how they can be used for automatically synthesizing orchestrators for advanced forms of goal-oriented synthesis.

Giuseppe De Giacomo

Dec. 16, 2023, 11:30 a.m.

Accurately predicting human egocentric gaze events, such as saccades, fixations, and blinks, holds transformative potential for virtual reality (VR) applications. By using eye-trackers integrated into wearable head-mounted displays, it is possible to optimize runtime, improve user experience, and integrate gaze into downstream tasks. However, predicting gaze events remains challenging as the temporal dynamics of egocentric gaze events are multifaceted, for example, influenced by visual stimuli and task demands. While deep learning and machine learning offer promising avenues, the investigation of these approaches in gaze event prediction remains largely uncharted. In this talk, we present recent advances in recurrent time-to-event analysis for gaze event prediction, addressing the pressing challenges of temporal modeling and real-time prediction. This talk will discuss the potential implications, challenges, and techniques of accurate gaze event prediction in the context of egocentric vision with a focus on VR.

Dec. 16, 2023, 11:30 a.m.

Dec. 16, 2023, 11:45 a.m.

Foundation models when trained at scale have shown impressive capabilities to adapt to new tasks with few examples provided in context; however, there remains a gap between the ability of these models and requirements to successfully act in embodied domains. To close this gap with reinforcement learning, our agents have to be trained at scale as well. In this talk, I will present recipes towards this end and dive into the details of how we trained AdA, utilizing a vast open ended task space, to achieve human-timescale adaptation in a 3d embodied domain. The trained agent displays on-the-fly hypothesis-driven exploration, efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations.

Dec. 16, 2023, 11:50 a.m.

Dr. Ji is currently a Professor and Presidential Impact Fellow in the Department of Computer Science & Engineering, Texas A&M University, directing the Data Integration, Visualization, and Exploration (DIVE) Laboratory. His research interests are machine learning and AI for science (including AI for quantum, atomistic, and continuum systems). Dr. Ji received the National Science Foundation CAREER Award in 2014. Currently, he serves as an Associate Editor for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), ACM Transactions on Knowledge Discovery from Data (TKDD), and ACM Computing Surveys (CSUR). He regularly serves as an Area Chair or equivalent roles for AAAI Conference on Artificial Intelligence (AAAI), International Conference on Learning Representations (ICLR), International Conference on Machine Learning (ICML), International Joint Conference on Artificial Intelligence (IJCAI), ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), and Annual Conference on Neural Information Processing Systems (NeurIPS).

Shuiwang Ji

Dec. 16, 2023, 11:50 a.m.

More than a dozen excitatory cell types have been identified in the mouse primary visual cortex (V1) based on transcriptomic, morphological and in vitro electrophysiological features. However, little is known about the functional organization of visual cortex neurons and their responses properties beyond orientation selectivity. Here, we combined large-scale two-photon imaging and predictive modeling of neural responses to study the functional organization of mouse V1. We developed a rotation-equivariant model architecture, followed by a rotation-invariant clustering pipeline to map the landscape of neural function in V1. Clustering neurons based on their stimulus response function revealed a continuum with around 30 modes. Each mode represented a group of neurons that exhibited a specific combination of stimulus selectivity and nonlinear response properties such as cross-orientation inhibition, size-contrast tuning and surround suppression. Interestingly, these non-linear properties were expressed independently and all possible combinations were present in the population. Our study shows how building known symmetries into neural response models can reveal interesting insights about the organization of the visual system.

Dec. 16, 2023, noon

Abstract: Large models are shifting “what’s possible” with AI. Brute-force scaling of model parameter count increases model capacity, and when presented with enough training data, has shown remarkable results. However, the advantages of large-scale models come at a price of steep increase in system complexity and infrastructure cost. Training and serving these models is an engineering challenge and is very expensive. Even minor errors in model design or training procedure can result in significant waste of resources. At Cerebras we have trained our share of large language models and learned along the way how to train these models efficiently to get “the biggest bang for the buck”. In this talk we will share our experience and insights from training various LLMs. In addition to techniques for compute efficient training of dense models, we will look into benefits of sparse training and inference on Cerebras hardware, designed to take full advantage of all types of sparsity.

Speaker's Bio: Natalia Vassilieva is a Sr. Director of Product at Cerebras Systems, a computer systems company dedicated to accelerating deep learning. She leads the vision and strategy for Cerebras products, market, application, and algorithm analysis for machine learning use cases. Her focus is machine learning and artificial intelligence, analytics, and application-driven software-hardware optimization and co-design. Prior to joining Cerebras, Natalia was a Sr. Research Manager at Hewlett Packard Labs, where she led the Software and AI group and served as the head of HP Labs Russia from 2011 until 2015. Prior to Hewlett Packard, she was an Associate Professor at St. Petersburg State University in Russia and worked as a software engineer for several IT companies. Natalia holds a Ph.D. in computer science from St. Petersburg State University.

Natalia Vassilieva

Dec. 16, 2023, noon

Active learning consists in sequentially and adaptively constructing a data-set in the hope of improving the learning speed by avoiding useless data-points where the current model is already correct with large probability and by focusing on uncertainty regions. During this talk, I will give a short reminder on the potential benefits and pitfalls of active learning, especially in large and combinatorial models.

Dec. 16, 2023, noon

Dec. 16, 2023, 12:10 p.m.

The recent rise of Generative AI has led to a dramatic increase in the sizes and computational needs for AI models. This compute explosion has raised serious cost and sustainability concerns in both the training & deployment phases of these large models. Low-precision techniques, that lower the precision of the weights, activations, and gradients, have been successfully employed to reduce the training-precision from 32-bits down to 8-bits (FP8) and the inference-precision down to 4-bits (INT4). These advances have enabled more than a 10-fold improvement in compute efficiency over the past decade – however, it is expected that further gains may be limited. Recent developments in analog computational techniques offer the promise of achieving an additional 10-100X enhancement in crucial metrics, including energy efficiency and computational density. In this presentation, we will provide an overview of these significant recent breakthroughs, which are likely to play a pivotal role in advancing Generative AI and making it more sustainable and accessible to a wider audience.

Kailash Gopalakrishnan

Dec. 16, 2023, 12:10 p.m.

Professor Anandkumar's research interests are in the areas of large-scale machine learning, non-convex optimization and high-dimensional statistics. In particular, she has been spearheading the development and analysis of tensor algorithms for machine learning. Tensor decomposition methods are embarrassingly parallel and scalable to enormous datasets. They are guaranteed to converge to the global optimum and yield consistent estimates for many probabilistic models such as topic models, community models, and hidden Markov models. More generally, Professor Anandkumar has been investigating efficient techniques to speed up non-convex optimization such as escaping saddle points efficiently.

Anima Anandkumar

Anima Anandkumar is a Bren professor at Caltech. Her research spans both theoretical and practical aspects of large-scale machine learning. In particular, she has spearheaded research in neural operators, tensor-algebraic methods, non-convex optimization, probabilistic models and deep learning.

Anima is the recipient of several awards and honors such as the Bren named chair professorship at Caltech, Alfred. P. Sloan Fellowship, Young investigator awards from the Air Force and Army research offices, Faculty fellowships from Microsoft, Google and Adobe, and several best paper awards.

Anima received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a postdoctoral researcher at MIT from 2009 to 2010, a visiting researcher at Microsoft Research New England in 2012 and 2014, an assistant professor at U.C. Irvine between 2010 and 2016, an associate professor at U.C. Irvine between 2016 and 2017 and a principal scientist at Amazon Web Services between 2016 and 2018.

Dec. 16, 2023, 12:15 p.m.

Domain adaptation, transfer, multitask, meta, few-shots, or lifelong learning … these are all important recent directions in ML that all touch at the core of what we might mean by ‘AI’. As these directions all concern learning in heterogeneous and ever-changing environments, they all share a central question: what information a 'source' distribution may have about a 'target' distribution, or put differently, which measures of discrepancy between distributions properly model such information.

Our understanding of this central question is still rather fledgeling, with both positive and negative results. On one hand we show that traditional notions of distance and divergence between distributions (e.g., Wasserstein, TV, KL, Renyi) are in fact too conservative: a source may be 'far' from a target under such traditional notions, yet still admit much useful information about the target distribution. We then turn to the existence of 'adaptive' procedures, i.e., procedures which can optimally leverage such information in the source data without any prior distributional knowledge. Here the picture is quite nuanced: while various existing approaches turn out to be adaptive in usual settings with a single source and hypothesis class, no procedure can guarantee optimal rates adaptively in more general settings, e.g., settings with multiple source datasets (as in multitask learning), or settings with multiple hypothesis classes (as in model selection or hyper-parameter tuning).

Such negative results raise new questions, as they suggest that domain adaptation and related problems may benefit from more structure in practice than captured by current formalisms.

The talk is based on joint work with collaborators over the last few years, namely, G. Martinet, S. Hanneke, J. Suk, Y. Mahdaviyeh, N. Galbraith.

Samory Kpotufe

Invited Talk: Xinyun Chen

Dec. 16, 2023, 12:15 p.m.

Dec. 16, 2023, 12:15 p.m.

Dec. 16, 2023, 12:25 p.m.

In the last few years, various forms of information flow were found to be useful quantities for the characterization of the decision-making of agents, whether natural or artificial. We here especially consider one particular type of information flow, empowerment, which can be used as intrinsic motivation that is derived from the dynamical properties of the external perception-action loop. The present talk will discuss empowerment in the context of its evolutionary motivation, questions of agency as well as some insightful new links to Dynamical Systems theory.

Dec. 16, 2023, 12:30 p.m.

Good neural architectures are rooted in good inductive biases (a.k.a. priors). Equivariance under symmetries is a prime example of a successful physics inspired prior which sometimes dramatically reduces the number of examples needed to learn predictive models. Diffusion based models, one of the most successful generative models, are rooted in nonequilibrium statistical mechanics. Conversely, ML methods have recently been used to solve PDEs for example in weather prediction, and to accelerate MD simulations by learning the (quantum mechanical) interactions between atoms and electrons.

In this work we will try to extend this thinking to more flexible priors in the hidden variables of a neural network. In particular, we will impose wavelike dynamics in hidden variables under transformations of the inputs, which relaxes the stricter notion of equivariance. We find that under certain conditions, wavelike dynamics naturally arises in these hidden representations. We formalize this idea in a VAE-over-time architecture where the hidden dynamics is described by a Fokker-Planck (a.k.a. drift-diffusion) equation. This in turn leads to a new definition of a disentangled hidden representation of input states that can easily be manipulated to undergo transformations.

Max Welling

Dec. 16, 2023, 12:30 p.m.

Dec. 16, 2023, 12:30 p.m.

Smita Krishnaswamy is an Associate professor in Genetics and Computer Science. She is affiliated with the applied math program, computational biology program, Yale Center for Biomedical Data Science and Yale Cancer Center. Her lab works on the development of machine learning techniques to analyze high dimensional high throughput biomedical data. Her focus is on unsupervised machine learning methods, specifically manifold learning and deep learning techniques for detecting structure and patterns in data. She has developed algorithms for non-linear dimensionality reduction and visualization, learning data geometry, denoising, imputation, inference of multi-granular structure, and inference of feature networks from big data. Her group has applied these techniques to many data types such as single cell RNA-sequencing, mass cytometry, electronic health record, and connectomic data from a variety of systems. Specific application areas include immunology, immunotherapy, cancer, neuroscience, developmental biology and health outcomes. Smita has a Ph.D. in Computer Science and Engineering from the University of Michigan.

Dec. 16, 2023, 12:35 p.m.

Dec. 16, 2023, 1 p.m.

Invited Talk: Invited Talk (Su-In Lee)

Dec. 16, 2023, 1:05 p.m.

Dec. 16, 2023, 1:30 p.m.

Real-world experimentation often involves making complex tradeoffs between many outcomes or targeting noisy or long-term objectives that are difficult to measure. This problem can be particularly challenging in the context of continuous action spaces where an infinite number of tradeoffs can be achieved. I will discuss these problems in the context of real-world problems faced by experimenters at Meta. I will discuss solutions to the many objective problem via learning from human feedback. I will conclude with strategies for targeting long-term outcomes and open questions.

Dec. 16, 2023, 1:30 p.m.

Abstract: Training and inference of large transformer models is one of the most important computational challenges of modern AI. Systems for training these models must be highly scalable and run at extreme efficiency, because the amount of work necessary to converge a model can be extraordinarily large. Inference needs to be fast and accommodate different query sizes. In this talk, I'll discuss the work we have been doing at NVIDIA to optimize systems for Large Language Model training and inference on GPUs. I will present different parallelism techniques we are using in our LLM framework Megatron-LM and will discuss how parallelism techniques can be combined to maximize the training throughput of large models while retaining strict optimizer semantics. I will discuss optimizations techniques for inference and methods to accelerate inference and reduce memory fragmentation.

Speaker's Bio: Dr. Mohammad Shoeybi is the Director of Applied Research at NVIDIA. His team focuses on building large foundational models and improving them to downstream applications. His team has build Megatron-LM, a framework for efficiently training LLMs and used it to train several large scale models such as Megatron-Turing NLG with 530 billions of parameters. He received his PhD. from Stanford University in 2010. Prior to NVIDIA, he worked at DeepMind and Baidu USA leading efforts on bringing deep learning and reinforcement learning to applications.

Bryan Catanzaro

Mohammad Shoeybi

Dec. 16, 2023, 1:30 p.m.

David Krueger

Dec. 16, 2023, 1:30 p.m.

Michael Bernstein

Dec. 16, 2023, 1:30 p.m.

Differentiable digital signal processing (DDSP) allows us to constrain the outputs of a neural network to those of a known class of signal processor. This can help us train with limited data, reduce audio artefacts, infer parameters of signal models, and expose human interpretable controls. However, numerous failure modes still exist for certain important families of signal processor. This talk illustrates two such challenges, frequency parameter non-convexity and permutation symmetry, and introduces promising approaches to solving them.

Ben Hayes

Dec. 16, 2023, 1:35 p.m.

Invited Talk: Universal Jailbreaks

Dec. 16, 2023, 2 p.m.

Dec. 16, 2023, 2 p.m.

Much of social science is centered around terms like “ideology” or “power”, which generally elude precise definition, and whose contextual meanings are trapped in surrounding language. This talk explores the use of large language models (LLMs) to flexibly navigate the conceptual clutter inherent to social scientific measurement tasks. We rely on LLMs’ remarkable linguistic fluency to elicit ideological scales of both legislators and text, which accord closely to established methods and our own judgement. A key aspect of our approach is that we elicit such scores directly, instructing the LLM to furnish numeric scores itself. This approach is methodologically "dumb" and shouldn't "work" according to classical principles of measurement. We nevertheless find surprisingly compelling results, which we showcase through a variety of different case studies.

Dec. 16, 2023, 2 p.m.

Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning. Recently, transformers have been shown to learn new language or vision tasks without any weight updates from only a few examples, also referred to as in-context learning. However, the sequential decision-making setting poses additional challenges having a lower tolerance for errors since the environment's stochasticity or the agent's actions can lead to unseen, and sometimes unrecoverable, states. In this talk, I will show that naively applying transformers to this setting does not enable in-context learning of new tasks. I will then show how different design choices such as the model size, data diversity, environment stochasticity, and trajectory burstiness, affect in-context learning of sequential decision-making tasks. Finally, I will show that by training on large diverse offline datasets, transformers are able to learn entirely new tasks with unseen states, actions, dynamics, and rewards, using only a handful of demonstrations and no weight updates. I will end my talk with a discussion of the limitations of offline learning approaches in sequential decision-making and some directions for future work.

Roberta Raileanu

Dec. 16, 2023, 2 p.m.

Dec. 16, 2023, 2:05 p.m.

What is curiosity? Across disciplines, some scholars offer a range of definitions while others eschew definitions altogether. Is the field of curiosity studies simply too young? Should we, as has been argued in neuroscience, press forward in definition-less characterization? At this juncture in the field's history, we turn to an examination of curiosity styles, and ask: How has curiosity been practiced over the last two millennia and how is it practiced today? We exercise a recent historico-philosophical account to catalogue common styles of curiosity and test for their presence as humans browse Wikipedia. Next we consider leading theories from psychology and mechanics that could explain curiosity styles, and formalize those theories in the mathematics of network science. Such a formalization allows theories of curiosity to be explicitly tested in human behavioral data and for their relative mental affordances to be investigated. Moreover, the formalization allows us to train artificial agents to build in human-like curiosity styles through reinforcement learning. Finally, with styles and theories in hand, we expand out to a study of several million users of Wikipedia to understand how curiosity styles might or might not differ around the world and track factors of social inequality. Collectively, our findings support the notion that curiosity is practiced---differently across people---as unique styles of network building, thereby providing a connective counterpoint to the common acquisitional account of curiosity in humans.

Dec. 16, 2023, 2:10 p.m.

Reinforcement learning (RL) is famously powerful but difficult to wield, and until recently, had demonstrated impressive results on games, but little real world impact. I will start the talk with a discussion of RL for Large Language Models (LLMs), including scalable supervision techniques to better align models with human preferences (Constitutional AI / RLAIF). Next, I will discuss RL for chip floorplanning, one of the first examples of RL solving a real world engineering problem. This learning-based method can generate placements that are superhuman or comparable on modern accelerator chips in a matter of hours, whereas the strongest baselines require human experts in the loop and can take several weeks. This method was published in Nature and used in production to generate superhuman chip layouts for the last four generations of Google’s flagship AI accelerator (TPU), including the recently announced TPU v5p.

Dec. 16, 2023, 2:45 p.m.

Humans have developed technological repertoires that have enabled us to survive in virtually every habitat on Earth. However, it can be difficult to trace how these technologies came to be—folk histories of technological achievement often highlight a few brilliant individuals, while losing sight of the rest of the community’s contributions. In this talk, I will present work analyzing player behavior in One Hour One Life, a multiplayer online game where players can build technologically complex communities over many generations (N = 22,011 players, 2,700 communities, 428,255 lives lived, 127,768,267 social interactions detected). This dataset provides a unique opportunity to test how community dynamics shape technological development in an open-ended world: Players can form communities that endure for many generations, and they can combine thousands of unique materials to build vast technological repertoires. At a macroscopic level, we find that community characteristics—such as population size, interconnectedness, and specialization—predict the size and stability of a community’s technological repertoire. Zooming in, we find that individual players contribute their own, individual expertise to technological development—participants consistently perform similar jobs in different communities that they’re placed in, and acquire expertise in these jobs through social interaction. Our work tests theories of cultural evolution and economic complexity at scale and provides a methodological basis to study the interplay between individual expertise and community structures.

Dec. 16, 2023, 2:50 p.m.

Owain Evans

Dec. 16, 2023, 3 p.m.

Speech enhancement technology has made remarkable progress in recent years. While many single channel methods have been proposed, and their performance has improved, multi-channel speech enhancement technology remains important due to its high performance in estimating and retaining sound source spatial information. Many multi-channel processing methods have been proposed so far for cases where the sound source and noise positions are fixed. However, for real-world applications, it is necessary to consider sound source movement and improve robustness to moving sources. In this presentation, I will introduce multi-channel audio enhancement technologies for moving sources. First, I will present an extension of mask-based neural beamforming, which is widely used as an ASR front-end, to moving sound sources. This extension is achieved by integrating model-based array signal processing and data-driven deep learning approaches. Then, I will discuss model-based, unsupervised multi-channel source separation and extraction approaches, e.g., independent component/vector analysis (ICA/IVA). For multi-channel processing, in addition to dealing with moving sources, it is also essential to devise techniques that limit the increase in computational complexity as the number of microphones increases. To address this issue, I will introduce a fast online IVA algorithm for tracking a single moving source that achieves optimal time complexity and operates significantly faster than conventional approaches.

Shoko Araki

Shoko Araki is a Senior Research Scientist at NTT Communication Science Laboratories, NTT Corporation, Japan where she is currently leading the Signal Processing Research Group. Since joining NTT in 2000, she has been engaged in research on acoustic signal processing, microphone array signal processing, blind speech separation, meeting diarization, and auditory scene analysis. She was formerly a member of the IEEE SPS Audio and Acoustic Signal Processing Technical Committee (AASP-TC) (2014-2019) and currently serves as its Chair. She was a board member of the Acoustical Society of Japan (ASJ) (2017-2020), and she served as vice president of ASJ (2021-2022). She is an IEEE Fellow.