Dec. 7, 2021, 7 a.m.

Duolingo is the most popular way to learn languages in the world. With over half a billion exercises completed every day, we have the largest dataset of people learning languages ever amassed. In this talk I will describe the different ways in which we use AI to improve how well we teach and to keep our learners engaged.

Luis von Ahn

Luis von Ahn is an entrepreneur and former computer science professor at Carnegie Mellon University who is considered one of the pioneers of crowdsourcing. He is known for co-inventing CAPTCHAs, being a MacArthur Fellow, and selling two companies to Google in his 20s.

He is currently the co-founder and CEO of Duolingo (NASDAQ: DUOL), a language-learning platform created to bring free language education to the world. With over 500 million users, it is now the most popular way to learn languages and the most downloaded education app in the world.

Luis has been named one of the 10 Most Brilliant Scientists by Popular Science Magazine, one of the 50 Best Brains in Science by Discover, one of the Top Young Innovators Under 35 by MIT Technology Review, one of the 100 Most Innovative People in Business by Fast Company Magazine, and in 2018 won the Lemelson-MIT Prize.

Dec. 7, 2021, 3 p.m.

If data is power, this keynote asks what methodologies and frameworks, beyond measuring bias and fairness in ML, might best serve communities that are otherwise written off as inevitable ‘data gaps’. To address this question, the talk applies design justice principles articulated in 2020 by scholar Costanza-Chock to the case of community-based organizations (CBOs) serving marginalized Black and Latinx communities in North Carolina. These CBOs, part of an 8-month study of community healthcare work, have become pivotal conduits for COVID-19 health information and equitable vaccine access. As such, they create and collect the so-called ‘sparse data’ of marginalized groups often missing from healthcare analyses. How might health equity, a cornerstone of social justice, be better served by equipping CBOs to collect community-level data and set the agendas for what to share and learn from the people that they serve? The talk will open with an analysis of the limits of ML models that prioritize the efficiencies of scale over attention to just and inclusive sampling. It will then examine how undertheorized investments in measuring bias and fairness in data and decision-making systems distract us from considering the value of collecting data with rather than for communities. Outlining an early learning theory proposed by Russian psychologist Lev Vygotsky (1978), the presentation will argue that focusing on the demands of collecting community members’ data and observing the social interactions that are computationally hard to measure but qualitatively invaluable to see are necessary to advance socially-just ML. The talk will conclude with recommendations for how to reorient computer science and machine learning to a more explicit theory and practice of data power-sharing.

Mary L. Gray

Mary L. Gray is Senior Principal Researcher at Microsoft Research and Faculty Associate at Harvard University’s Berkman Klein Center for Internet and Society. She maintains a faculty position in the Luddy School of Informatics, Computing, and Engineering with affiliations in Anthropology and Gender Studies at Indiana University. Mary, an anthropologist and media scholar by training, focuses on how people’s everyday uses of technologies transform labor, identity, and human rights. Mary earned her PhD in Communication from the University of California at San Diego in 2004, under the direction of Susan Leigh Star. In 2020, Mary was named a MacArthur Fellow for her contributions to anthropology and the study of technology, digital economies, and society.

Mary’s work includes In Your Face: Stories from the Lives of Queer Youth (1999) and Out in the Country: Youth, Media, and Queer Visibility in Rural America (2009), which looked at how young people in rural Southeast Appalachia use media to negotiate identity, local belonging, and connections to broader, imagined queer communities. The book won the American Anthropological Association’s Ruth Benedict Prize and the American Sociological Association’s Sexualities Studies Book Award in 2009. And, with Colin Johnson and Brian Gilley, Mary co-edited Queering the Countryside: New Directions in Rural Queer Studies (2016), a 2016 Choice Academic Title.

In 2019, Mary co-authored (with computer scientist Siddharth Suri), Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. The book chronicles workers’ experiences of on-demand information service jobs—from content moderation and data-labeling to telehealth—work that is essential to the global growth of artificial intelligence and platform economies more broadly. It was named a Financial Times’ Critic’s Pick and awarded the McGannon Center for Communication Research Book Prize in 2019. The book was also awarded the 2020 Communication, Information Technologies, and Media Sociology section of the American Sociological Association (CITAMS) Book Award Honorable Mention. The book has been translated into Korean and Chinese.

Mary chairs the Microsoft Research Ethics Review Program—the only federally-registered institutional review board of its kind in Tech. She is recognized as a leading expert in the emerging field of AI and ethics, particularly research at the intersections of computer and social sciences. She sits on the editorial boards of Cultural Anthropology, Television and New Media, the International Journal of Communication, and Social Media + Society. Mary’s research has been covered by popular press venues, including The Guardian, El Pais, The New York Times, The Los Angeles Times, Nature, The Economist, Harvard Business Review, The Chronicle of Higher Education, and Forbes Magazine. She served on the Executive Board of the American Anthropological Association and was the Association’s Section Assembly Convenor from 2006-2010 as well as the co-chair of the Association’s 113th Annual Meeting. Mary currently sits on several boards, including the California Governor’s Council of Economic Advisors, Public Responsibility in Medicine and Research (PRIM&R), and Stanford University’s One-Hundred-Year Study on Artificial Intelligence (AI100) Standing Committee, commissioned to reflect on the future of AI and recommend directions for its policy implications.

Dec. 8, 2021, 11 p.m.

At the end of the 18th century, Gaspard Monge introduced the optimal transport problem to understand the most efficient way of transporting a distribution of material from one place to another to build fortifications. In the last 30 years, this theory has found various applications in many areas of mathematics. However, more recently, optimal transport has also become a very powerful tool in many areas of machine learning. In this talk, we will give an overview of optimal transport, with some selected applications.
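
As an illustrative aside that is not taken from the talk: in its simplest discrete form, Monge's problem asks for the one-to-one assignment of sources to destinations that minimises the total transport cost. The Python sketch below brute-forces that assignment for a few points on a line. The function name `monge_assignment`, the sample locations, and the squared-distance cost are all assumptions chosen for illustration.

```python
from itertools import permutations


def monge_assignment(sources, targets, cost=lambda x, y: (x - y) ** 2):
    """Brute-force the cheapest one-to-one assignment of sources to targets.

    Only feasible for a handful of points, since it tries all n! assignments.
    """
    if len(sources) != len(targets):
        raise ValueError("sources and targets must have the same length")
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(targets))):
        # perm[i] is the index of the target that receives source i
        total = sum(cost(sources[i], targets[j]) for i, j in enumerate(perm))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost


piles = [0.0, 1.0, 3.0]  # hypothetical locations of material
holes = [2.0, 5.0, 4.0]  # hypothetical locations to fill
plan, total_cost = monge_assignment(piles, holes)
print(plan, total_cost)  # (0, 2, 1) 17.0
```

For points on a line with a convex cost such as squared distance, the optimum is the order-preserving matching (sort both lists and pair them up), which is what the brute-force search recovers here.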

Alessio Figalli

Alessio Figalli earned his doctorate in 2007 under the supervision of Luigi Ambrosio at the Scuola Normale Superiore di Pisa and Cédric Villani at the École Normale Supérieure de Lyon. He was a faculty member at the University of Texas-Austin before moving to ETH Zürich in 2016 as a chaired professor. Since 2019 he has been the director of the “FIM-Institute for Mathematical Research” at ETH Zürich. In 2018 he won the Fields Medal for “his contributions to the theory of optimal transport, and its application to partial differential equations, metric geometry, and probability”.

Dec. 9, 2021, 11:15 a.m.

Raquel Urtasun

Dec. 9, 2021, 11:35 a.m.

Dec. 9, 2021, 12:25 p.m.

Dec. 9, 2021, 12:45 p.m.

Douwe Kiela

Dec. 9, 2021, 3 p.m.

In October 2021, X officially became an option for gender on US passports. What are the computational changes necessary to adapt to this more inclusive gender option? In this talk, Meredith Broussard investigates why large-scale computer systems are stuck using 1950s ideas about gender, and what is needed to update sociotechnical systems. She explores how allies can leverage public interest technology in order to think beyond the gender binary, interrogate and audit software systems, and create code for social good.

Meredith Broussard

Dec. 10, 2021, 7 a.m.

In nature, groups of thousands of individuals cooperate to create complex structure purely through local interactions – from cells that form complex organisms, to social insects like termites that build meter-high mounds and army ants that self-assemble entire nests, to the complex and mesmerizing motion of fish schools and bird flocks. What makes these systems so fascinating to scientists and engineers alike, is that even though each individual has limited ability, as a collective they achieve tremendous complexity.

What would it take to create our own artificial collectives of the scale and complexity that nature achieves? My lab investigates this question by using inspiration from biological collectives to create robotic systems, e.g. the Kilobot thousand robot swarm inspired by cells, and the Termes robots inspired by mound-building termites. In this talk, I will discuss a recent project in my group – Eciton robotica - to create a self-assembling swarm of soft climbing robots inspired by the living architectures of army ants. Our work spans soft robotics, new theoretical models of self-organized self-assembly, and new field experiments in biology. Most critically, our work derives from the collective intelligence of engineers and scientists working together.

Radhika Nagpal

Radhika Nagpal is currently the Kavli Professor of Computer Science at Harvard University and a founding faculty member of the Wyss Institute for Biologically Inspired Engineering. Starting January 2022, she will be moving to Princeton University to lead new robotics initiatives. Nagpal leads the Self-organizing Systems Research Group (SSR) and her research interests span computer science, robotics, and biology. Nagpal was chosen by the journal Nature as one of the top ten influential scientists and engineers of the year (Nature 10 award, Dec 2014). Other awards include the Microsoft New Faculty Fellowship (2005), NSF Career Award (2007), Borg Early Career Award (2010), Radcliffe Fellowship (2012), the McDonald Mentoring Award (2015), AAAI and ACM Fellow (2020), and being an invited TED speaker in 2017. Nagpal is the co-founder of ROOT Robotics, an educational robotics company aimed at democratizing AI and robotics through early education; her lab's Kilobots have been commercialized with over 8000 robots sold worldwide. Nagpal is also the author of a Scientific American blog article on tenure-track life ("the Awesomest 7-year Postdoc", 2013), and is dedicated to creating a diverse and inclusive culture in STEM and academia.


Dec. 13, 2021, 1:12 a.m.

By leveraging principles of health equity, I will discuss the use of causal models and machine learning to address realistic challenges of data collection and model use across environments. Examples include a domain adaptation approach that improves prediction in under-represented population sub-groups by leveraging invariant information across groups when possible, and an algorithmic fairness method which specifically incorporates structural factors to better account for and address sources of bias and disparities.

Rumi Chunara

Invited talk: Ying Wei

Dec. 13, 2021, 3:10 a.m.

Invited talk: Carlo Ciliberto

Dec. 13, 2021, 5 a.m.

Carlo Ciliberto

Invited talk: Invited talk #1

Dec. 13, 2021, 5:10 a.m.

Invited talk: Mihaela Van Der Schaar

Dec. 13, 2021, 5:30 a.m.

Mihaela van der Schaar

Invited talk: Invited talk #2

Dec. 13, 2021, 5:30 a.m.

Dec. 13, 2021, 6 a.m.

In this presentation I will discuss recent insights into both the time course of pragmatic processing and the key neural infrastructure for inferring speaker meaning from coded meaning. I will show why mirror neurons are not able to handle pragmatic information. In addition, I will present evidence for the role of the Theory of Mind (ToM) network in processing of pragmatic information.

Peter Hagoort

Peter Hagoort is director of the Max Planck Institute for Psycholinguistics (since November 2006), and the founding director of the Donders Institute, Centre for Cognitive Neuroimaging (DCCN, 1999), a cognitive neuroscience research centre at the Radboud University Nijmegen. In addition, he is professor in cognitive neuroscience at the Radboud University Nijmegen. His own research interests relate to the domain of the human language faculty and how it is instantiated in the brain. In his research he applies neuroimaging techniques such as ERP, MEG, PET and fMRI to investigate the language system and its impairments as in aphasia, dyslexia and autism.

For his scientific contributions, the Royal Netherlands Academy of Arts and Sciences (KNAW) awarded him the Hendrik Mullerprijs in 2003. In 2004 the Dutch Queen awarded him the "Knighthood of the Dutch Lion". In 2005 he received the NWO-Spinoza Prize (M€ 1.5). In 2007 the University of Glasgow awarded him an honorary doctorate in science for his contributions to the cognitive neuroscience of language. In 2008 he was awarded the Heymans Prize. In 2012 the KNAW recognized his career contribution to cognitive neuroscience with the Academy Professorship Prize (M€ 1.0).

Peter Hagoort is a member of the Royal Netherlands Academy of Arts and Sciences (KNAW), of the Koninklijke Hollandsche Maatschappij der Wetenschappen, and of the Academia Europaea. In 2018 he was elected an international member of the National Academy of Sciences and a Fellow of the Cognitive Science Society.

Dec. 13, 2021, 6 a.m.

Title: Machine Learning through Database Glasses


As we witness the data science revolution, each research community legitimately reflects on its relevance and place in this new landscape. The database research community has at least three reasons to feel empowered by this revolution. These have to do with the pervasiveness of relational data in data science, the widespread need for efficient data processing, and the new processing challenges posed by data science workloads beyond the classical database workloads. The first two reasons are widely acknowledged as core to the community's raison d'être. The third reason explains the longevity of the success of relational database management systems: whenever a new promising data-centric technology surfaces, research is under way to show that it can be captured naturally by variations or extensions of the existing relational techniques.

In this talk, I will make the case for a first-principles approach to machine learning over relational databases that guided our recent work and can dramatically improve the runtime performance of machine learning. This approach exploits the algebraic and combinatorial structure of relational data processing. It also relies on compilation for hybrid database and learning workloads and on computation sharing across aggregates in learning-specific batches.

This work is the outcome of extensive collaboration of the author with colleagues from RelationalAI, in particular Mahmoud Abo Khamis, Molham Aref, Hung Ngo, and XuanLong Nguyen, and from the FDB research project, in particular Ahmet Kara, Milos Nikolic, Maximilian Schleich, Amir Shaikhha, and Haozhe Zhang.

Dan Olteanu

Dec. 13, 2021, 6 a.m.

When we evaluate our sensory evidence to make decisions, we also evaluate its quality so that we can judge how likely we are to make correct inferences about it; that is, we judge our perceptual confidence. This is something that we want our artificial systems to be able to do as well, of course. One might think that an optimal inference strategy would be the obvious choice for the nervous system to evaluate its own sensory noise. But is this what the brain is doing? And when we say ‘optimal’, are we making a correct guess at what the cost function ought to be? In this talk I’ll present some evidence to suggest both how we can go about answering these difficult questions, and that the answer might be that the brain is evaluating its own sensory noise in ways that might seem surprising. I’ll close with some implications that these findings may have for our design of intelligent artificial agents.

Megan Peters

Dec. 13, 2021, 6:10 a.m.

Protein-based drugs are becoming some of the most important drugs of the 21st century. The typical mechanism of action of these drugs is a strong protein-protein interaction (PPI) between surfaces with complementary geometry and chemistry. Over the past three decades, large amounts of structural data on PPIs have been collected, creating opportunities for differentiable learning on the surface geometry and chemical properties of natural PPIs. Since the surface of these proteins has a non-Euclidean structure, it is a natural fit for geometric deep learning, a novel class of machine learning techniques generalising successful neural architectures to manifolds and graphs. In the talk, I will show how geometric deep learning methods can be used to address various problems in functional protein design such as interface site prediction, pocket classification, and search for surface motifs. These methods can potentially open new possibilities in designing novel drugs for "undruggable" targets.

Michael Bronstein

Dec. 13, 2021, 6:30 a.m.

Most current artificial reinforcement learning (RL) agents are trained under the assumption of repeatable trials, and are reset at the beginning of each trial. Humans, however, are never reset. Instead, they are allowed to discover computable patterns across trials, e.g.: in every third trial, go left to obtain reward, otherwise go right. General RL (sometimes called AGI) must assume a single lifelong trial which may or may not include identifiable sub-trials. General RL must also explicitly take into account that policy changes in early life may affect properties of later sub-trials and policy changes. In particular, General RL must take into account recursively that early meta-meta-learning is setting the stage for later meta-learning which is setting the stage for later learning etc. Most popular RL mechanisms, however, ignore such lifelong credit assignment chains. Exceptions are the success story algorithm (1990s), AIXI (2000s), and the mathematically optimal Gödel Machine (2003).

Jürgen Schmidhuber

Since age 15 or so, the main goal of professor Jürgen Schmidhuber has been to build a self-improving Artificial Intelligence (AI) smarter than himself, then retire. His lab's Deep Learning Neural Networks based on ideas published in the "Annus Mirabilis" 1990-1991 have revolutionised machine learning and AI. By the mid 2010s, they were on 3 billion devices, and used billions of times per day through users of the world's most valuable public companies, e.g., for greatly improved (CTC-LSTM-based) speech recognition on all Android phones, greatly improved machine translation through Google Translate and Facebook (over 4 billion LSTM-based translations per day), Apple's Siri and Quicktype on all iPhones, the answers of Amazon's Alexa, and numerous other applications. In 2011, his team was the first to win official computer vision contests through deep neural nets, with superhuman performance. In 2012, they had the first deep NN to win a medical imaging contest (on cancer detection). All of this attracted enormous interest from industry. His research group also established the fields of mathematically rigorous universal AI and recursive self-improvement in metalearning machines that learn to learn (since 1987). In 1990, he introduced unsupervised adversarial neural networks that fight each other in a minimax game to achieve artificial curiosity (GANs are a special case). In 1991, he introduced very deep learning through unsupervised pre-training, and neural fast weight programmers formally equivalent to what's now called linear Transformers. His formal theory of creativity & curiosity & fun explains art, science, music, and humor. He also generalized algorithmic information theory and the many-worlds theory of physics, and introduced the concept of Low-Complexity Art, the information age's extreme form of minimal art. 
He is the recipient of numerous awards, the author of over 350 peer-reviewed papers, and Chief Scientist of the company NNAISENSE, which aims at building the first practical general purpose AI. He is a frequent keynote speaker and advises various governments on AI strategies.

Invited talk: Invited talk #3

Dec. 13, 2021, 6:40 a.m.

Eric Tchetgen Tchetgen

Dec. 13, 2021, 6:45 a.m.

Abstract: One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today’s models learn from. In this talk, we will describe our work at Snorkel AI on labeling training data efficiently using our system, Snorkel, which allows users to programmatically label training data. Snorkel has been deployed by major technology companies like Google, Facebook and Intel, academic labs, and government agencies. Rather than hand-labeling training data, users write labeling functions which label data using heuristic strategies such as pattern matching, distant supervision, and other models. These labeling functions can have noisy, conflicting, and correlated outputs, which Snorkel models and combines into clean training labels. This allows training sets to be built in hours or days, rather than months or years.
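
To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It does not use the Snorkel library or its API, and it swaps in a simple majority vote where Snorkel models the accuracies and correlations of the labeling functions. The spam/ham task, the three heuristics, and every name below are hypothetical.

```python
import re
from collections import Counter

ABSTAIN, HAM, SPAM = -1, 0, 1  # hypothetical label scheme


def lf_contains_link(text):
    """Pattern-matching heuristic: messages containing a URL vote SPAM."""
    return SPAM if re.search(r"https?://", text) else ABSTAIN


def lf_mentions_prize(text):
    """Keyword heuristic: talk of prizes or winners votes SPAM."""
    return SPAM if re.search(r"\b(prize|winner)\b", text, re.I) else ABSTAIN


def lf_short_reply(text):
    """Length heuristic: messages of at most four words vote HAM."""
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def majority_label(text):
    """Combine noisy votes by majority; ties and all-abstain give ABSTAIN."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting votes with no clear majority
    return ranked[0][0]


examples = [
    "You are a winner! Claim your prize at http://example.com now",
    "See you tomorrow",
    "Thanks! http://example.com",
]
labels = [majority_label(t) for t in examples]
print(labels)  # [1, 0, -1]
```

The third example shows why a plain vote is only a stand-in: two heuristics conflict, and without an estimate of how reliable each one is, the sketch can only abstain.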

Paroma Varma

Invited talk: Invited talk #4

Dec. 13, 2021, 7 a.m.

Xiaohong Chen

Dec. 13, 2021, 7 a.m.

Progress on language generation has experienced a huge boost with the advent of large models trained on huge amounts of text. However, this kind of language modelling will only take us so far. Most natural language use is driven by communicative goals and is often grounded both in the conversational context and in extralinguistic information. Can we take inspiration from human production strategies in situated environments to drive forward natural language generation models? I will argue that yes, we can, and present a few examples of recent and ongoing research carried out within my group that follow this research programme.

Raquel Fernández

Dec. 13, 2021, 7:20 a.m.

Dec. 13, 2021, 7:30 a.m.

Independent component analysis provides a principled framework for unsupervised representation learning, with solid theory on the identifiability of the latent code that generated the data, given only observations of mixtures thereof. Unfortunately, when the mixing is nonlinear, the model is provably nonidentifiable, since statistical independence alone does not sufficiently constrain the problem. Identifiability can be recovered in settings where additional, typically observed variables are included in the generative process. We investigate an alternative path and consider instead including assumptions reflecting the principle of independent causal mechanisms exploited in the field of causality. Specifically, our approach is motivated by thinking of each source as independently influencing the mixing process. This gives rise to a framework which we term independent mechanism analysis. We provide theoretical and empirical evidence that our approach circumvents a number of nonidentifiability issues arising in nonlinear blind source separation. Reference: (accepted at: NeurIPS 2021)

Julius von Kügelgen

Dec. 13, 2021, 7:40 a.m.

Dec. 13, 2021, 7:50 a.m.

Frank Noe

Dec. 13, 2021, 7:50 a.m.

David Blei

David Blei is a Professor of Statistics and Computer Science at Columbia University, and a member of the Columbia Data Science Institute. His research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference algorithms for massive data. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data. David has received several awards for his research, including a Sloan Fellowship (2010), Office of Naval Research Young Investigator Award (2011), Presidential Early Career Award for Scientists and Engineers (2011), Blavatnik Faculty Award (2013), and ACM-Infosys Foundation Award (2013). He is a fellow of the ACM.

Dec. 13, 2021, 8 a.m.

The recent boom in ML/AI applications has brought into sharp focus the pressing need for tackling the concerns of scalability, usability, and manageability across the entire lifecycle of ML/AI applications. The ML/AI world has long studied the concerns of accuracy, automation, etc. from theoretical and algorithmic vantage points. But to truly democratize ML/AI, the vantage point of building and deploying practical systems is equally critical.

In this talk, I will make the case that it is high time to bridge the gap between the ML/AI world and a world that exemplifies successful democratization of data technology: databases. I will show how new bridges rooted in the principles, techniques, and tools of the database world are helping tackle the above pressing concerns and in turn, posing new research questions to the world of ML/AI. As case studies of such bridges, I will describe two lines of work from my group: query optimization for ML systems and benchmarking data preparation in AutoML platforms. I will conclude with my thoughts on community mechanisms to foster more such bridges between research worlds and between research and practice.

Arun Kumar

Arun Kumar is an Associate Professor in the Department of Computer Science and Engineering and the Halicioglu Data Science Institute and an HDSI Faculty Fellow at the University of California, San Diego. His primary research interests are in data management and systems for machine learning/artificial intelligence-based data analytics.

Dec. 13, 2021, 8:10 a.m.

Title: Control in Dialogue: When does it work?

Abstract: We describe various attempts to control dialogue models, including content, style, specificity, response-relatedness, and question-asking, as well as for controlling gender bias and safety. Overall, we observe success in controlling attributes when the controllable skill involves surface-level features, as measured by automatic metrics and human judgments. The challenge for the future, however, is how to have this same success for harder tasks.

Bio: Jason Weston is a research scientist at Facebook, NY and a Visiting Research Professor at NYU. He earned his PhD in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisors: Alex Gammerman, Volodya Vovk and Vladimir Vapnik) in 2000. From 2000 to 2001, he was a researcher at Biowulf technologies. From 2002 to 2003 he was a research scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2003 to 2009 he was a research staff member at NEC Labs America, Princeton. From 2009 to 2014 he was a research scientist at Google, NY. His interests lie in statistical machine learning, with a focus on reasoning, memory, perception, interaction and communication. Jason has published over 100 papers, including best paper awards at ICML and ECML, and a Test of Time Award for his work "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning", ICML 2008 (with Ronan Collobert). He was part of the YouTube team that won a National Academy of Television Arts & Sciences Emmy Award for Technology and Engineering for Personalized Recommendation Engines for Video Discovery. He was listed as the 16th most influential machine learning scholar at AMiner and one of the top 50 authors in Computer Science in Science.

Dec. 13, 2021, 8:15 a.m.

In a new environment, people identify, remember, and recognize where they can comfortably travel. This paper argues that a robot navigator too should learn and rely upon a mental model of unobstructed space. Extensive simulation of a controller for an industrial-strength robot demonstrates how metacognition applied to a model of unobstructed space resolves some engineering challenges and provides resilience in the face of others. The robot plans and learns quickly, considers alternative actions, takes novel shortcuts, and interrupts its own plans.

Susan L Epstein

Professor Epstein is Professor of Computer Science at Hunter College and The Graduate Center of The City University of New York. She studies how brains and minds solve problems, and how a computer can capitalize on that knowledge. Interdisciplinarity is key in her work in knowledge representation and machine learning. She is an Executive Councilor for the Association for the Advancement of Artificial Intelligence, a co-PI at the National Science Foundation's Center for Brains, Minds, and Machines, and has served as chair of The Cognitive Science Society.

Invited Talk: Ye Pu

Dec. 13, 2021, 8:15 a.m.

Ye Pu

Invited talk: Nan Rosemary Ke

Dec. 13, 2021, 8:20 a.m.

Nan Rosemary Ke

Dec. 13, 2021, 8:30 a.m.

Ricardo Silva

Dec. 13, 2021, 8:35 a.m.

Title: Disentangling faithfulness and extractiveness in abstractive summarization

Abstract: Despite recent progress in abstractive summarization, systems still suffer from faithfulness errors. While prior work has proposed various methods that improve faithfulness, it is unclear whether the improvement comes from an increased level of extractiveness of the model outputs (i.e. copying more words from the document) or truly better understanding of the document. In this talk, I will discuss the faithfulness-abstractiveness trade-off in summarization and a better method for evaluating faithfulness that accounts for the change in extractiveness. We then show that it is possible to mitigate the faithfulness-abstractiveness trade-off by controlling the level of extractiveness during generation.

Bio: He He is an assistant professor in the Center for Data Science and Courant Institute at New York University. Her research interests include robust language understanding, text generation and interactive NLP systems. She obtained her Ph.D. from University of Maryland, College Park and worked as a post-doc at Stanford University before joining NYU.

Dec. 13, 2021, 8:42 a.m.

As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this talk, I will present some of our recent research that sheds light on the vulnerabilities of popular post hoc explanation techniques such as LIME and SHAP, and also introduce novel methods to address some of these vulnerabilities. More specifically, I will first demonstrate that these methods are brittle, unstable, and vulnerable to a variety of adversarial attacks. Then, I will discuss two solutions to address some of the aforementioned vulnerabilities: (i) a Bayesian framework that captures the uncertainty associated with post hoc explanations and in turn allows us to generate explanations with user-specified levels of confidence, and (ii) a framework based on adversarial training that is designed to make post hoc explanations more stable and robust to shifts in the underlying data. I will conclude the talk by discussing our recent theoretical results, which shed light on the equivalence and robustness of state-of-the-art explanation methods.

Himabindu Lakkaraju

Hima Lakkaraju is an Assistant Professor at Harvard University focusing on explainability, fairness, and robustness of machine learning models. She has also been working with various domain experts in criminal justice and healthcare to understand the real world implications of explainable and fair ML. Hima has recently been named one of the 35 innovators under 35 by MIT Tech Review, and has received best paper awards at SIAM International Conference on Data Mining (SDM) and INFORMS. She has given invited workshop talks at ICML, NeurIPS, AAAI, and CVPR, and her research has also been covered by various popular media outlets including the New York Times, MIT Tech Review, TIME, and Forbes.

Dec. 13, 2021, 8:45 a.m.

Relational learning takes advantage of relational structure in its inputs, e.g., graphs, and its output, e.g., constraints. Building upon that, statistical relational learning (SRL) defines structure using first-order predicate logic and models probabilistic dependencies between outputs. The use of predicate logic provides a natural groundwork for SRL to take advantage of the relational theory used in modern databases. Despite this common basis, SRL frameworks still have many unexplored opportunities to use the methods developed by the database community.

Grounding, the process of enumerating all valid instantiations of structured tuples in the model, is one of the most computationally expensive components in SRL systems. In this talk, I explore the use of several concepts from database research to accelerate grounding. To improve grounding, we borrow from three well-known problems in the database community: query rewriting, query containment, and multi-query optimization. Although not exact matches, each of these problems appears in SRL grounding in a form analogous to its database counterpart. By recognizing the connection to well-researched database techniques, we are able to address these problems in a way that takes advantage of the structure provided by SRL and the existing research provided by the database community. We show that by implementing these techniques within an existing SRL system, we can achieve up to a 60% speedup in grounding.
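One way to picture why database techniques apply here is to view grounding a rule as a join over observed facts. The sketch below is a hypothetical illustration, not the SRL system evaluated in the talk: the rule `Friends(A, B) & Friends(B, C) -> Knows(A, C)`, the `people` and `friends` data, and both function names are assumptions invented for the example. It contrasts enumerating every candidate tuple with a hash join that only visits tuples that can match.

```python
from collections import defaultdict
from itertools import product

# Hypothetical observed facts for the illustrative predicate Friends(A, B).
people = ["alice", "bob", "carol", "dave"]
friends = {("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("bob", "dave")}


def ground_naive(people, friends):
    """Enumerate all |people|^3 candidate (a, b, c) tuples and filter them."""
    return {
        (a, b, c)
        for a, b, c in product(people, repeat=3)
        if (a, b) in friends and (b, c) in friends
    }


def ground_join(friends):
    """Ground the same rule as a hash join of Friends with itself on B."""
    # Build a hash index from the first argument to all second arguments.
    by_first = defaultdict(list)
    for b, c in friends:
        by_first[b].append(c)
    # Probe the index once per observed Friends(a, b) tuple.
    return {(a, b, c) for a, b in friends for c in by_first.get(b, [])}


naive_groundings = ground_naive(people, friends)
join_groundings = ground_join(friends)
```

Both functions return the same set of groundings, but the join version does work proportional to the observed facts and matches rather than to the cube of the domain size, which is the kind of saving a query optimizer provides.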

Eriq Augustine

Dec. 13, 2021, 8:45 a.m.

How can deep learning be extended to encompass the kind of high-level cognition and reasoning that humans enjoy and that seems to provide us with stronger out-of-distribution generalization than current state-of-the-art AI? Looking into neuroscience and cognitive science and translating these observations and theories into machine learning, we propose an initial set of inductive biases for representations, computations and probabilistic dependency structure. These strongly tie the notion of representation with that of actions, interventions and causality, possibly giving a key to stronger identifiability of latent causal structure and ensuing better sample complexity in and out of distribution, as well as meta-cognition abilities facilitating exploration that seeks to reduce epistemic uncertainty of the underlying causal understanding of the environment.

Yoshua Bengio

Yoshua Bengio is Full Professor in the computer science and operations research department at U. Montreal, scientific director and founder of Mila and of IVADO, Turing Award 2018 recipient, Canada Research Chair in Statistical Learning Algorithms, as well as a Canada AI CIFAR Chair. He pioneered deep learning and in 2018 received the most citations per day among all computer scientists worldwide. He is an officer of the Order of Canada, member of the Royal Society of Canada, was awarded the Killam Prize, the Marie-Victorin Prize and the Radio-Canada Scientist of the year in 2017, and he is a member of the NeurIPS advisory board and co-founder of the ICLR conference, as well as program director of the CIFAR program on Learning in Machines and Brains. His goal is to contribute to uncovering the principles giving rise to intelligence through learning, as well as favour the development of AI for the benefit of all.

Invited Talk: Aleksandra Faust

Dec. 13, 2021, 8:45 a.m.

Aleksandra Faust

Aleksandra Faust is a Senior Research Engineer at Google Brain, specializing in robot intelligence. Previously, Aleksandra led machine learning efforts for self-driving car planning and controls in Waymo and Google X, and was a researcher in Sandia National Laboratories, where she worked on satellites and other remote sensing applications. She earned a Ph.D. in Computer Science at the University of New Mexico (with distinction), a Master’s in Computer Science from University of Illinois at Urbana-Champaign, and a Bachelor’s in Mathematics from University of Belgrade, Serbia. Her research interests include reinforcement learning, adaptive motion planning, and machine learning for decision-making. Aleksandra won the Tom L. Popejoy Award for the best doctoral dissertation at the University of New Mexico in Engineering, Mathematics, and Sciences in the period of 2011-2014. She was also awarded the Best Paper in Service Robotics at ICRA 2018, Sandia National Laboratories’ Doctoral Studies Program and New Mexico Space Grant fellowships, as well as the Outstanding Graduate Student in Computer Science award. Her work has been featured in the New York Times.

Dec. 13, 2021, 8:50 a.m.

There is a deep connection between causal discovery and generative models, such as factor analysis, independent component analysis, and various unsupervised deep learning models. Two key concepts that emerge are identifiability and nonstationarity. In this talk, I will review this research, providing some historical perspectives as well as open questions for future research.

Aapo Hyvarinen

Dec. 13, 2021, 9 a.m.

Talk: Disentanglement for Controllable Image Generation

Abstract: When it comes to generating diverse and plausible complex visual scenes from interpretable interfaces using deep learning, unsupervised disentangled representation learning can be very helpful. These methods can automatically discover the semantically meaningful attributes of a dataset, and represent them in a human-interpretable low-dimensional representation which can be manipulated to generate a large range of new plausible visual scenes. Disentangled representations are also conducive to semantic analogy making and sample efficient language grounding, which allows diverse language-controlled image manipulation and rendering. In this talk we will cover the strengths and limitations of the current methods for disentangled representation learning, and touch on the frontiers of this line of research where radically new approaches are starting to emerge based on the causal, physics-inspired, geometric and contrastive frameworks.

Bio: Irina is a Staff Research Scientist at DeepMind, where she works in the Frontiers team. Her work aims to bring together insights from the fields of neuroscience and physics to advance general artificial intelligence through improved representation learning. Before joining DeepMind, Irina was a British Psychological Society Undergraduate Award winner for her achievements as an undergraduate student in Experimental Psychology at Westminster University, followed by a DPhil at the Oxford Center for Computational Neuroscience and Artificial Intelligence, where she focused on understanding the computational principles underlying speech processing in the auditory brain. During her DPhil, Irina also worked on developing poker AI, applying machine learning in the finance sector, and working on speech recognition at Google Research.

Irina Higgins

Dec. 13, 2021, 9:08 a.m.

I will talk about two ways of describing weighted or probabilistic relations:

First, mathematical notation for tensors with named axes, which removes the burden of keeping track of the order of axes and the purpose of each. It also makes it easy to extend operations on low-order tensors to higher-order ones (e.g., to extend an operation on images to minibatches of images, or extend the attention mechanism to multiple attention heads). Our notation builds on ideas from many previous papers and software libraries, and we hope its adoption may result in clearer papers and less bug-prone implementations.

Second, hyperedge replacement graph grammars for factor graphs, or factor graph grammars (FGGs) for short, generate sets of factor graphs and can describe a more general class of models than plate notation, dynamic graphical models, case-factor diagrams, and sum-product networks can. Moreover, inference can be done on FGGs without enumerating all the generated factor graphs. For finite variable domains (but possibly infinite sets of graphs), a generalization of variable elimination to FGGs allows exact and tractable inference in many situations.
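The first idea above, named axes, can be illustrated with a minimal sketch. This is not the notation or any library from the talk: the helper `named_contract` and the axis names `query_pos`, `key_pos`, `dim`, and `heads` are hypothetical, and NumPy's `einsum` is used only as a convenient backend. The point is that an operation written against axis names extends from single-head attention scores to multiple heads without changing the function.

```python
import string

import numpy as np


def named_contract(a, a_axes, b, b_axes, over):
    """Sum over the shared axis `over`; other shared axes are batched.

    Returns the result together with its list of axis names.
    """
    # Assign one einsum letter per distinct axis name, in first-seen order.
    names = list(dict.fromkeys(list(a_axes) + list(b_axes)))
    letter = {name: string.ascii_letters[i] for i, name in enumerate(names)}
    out_axes = [name for name in names if name != over]
    spec = "{},{}->{}".format(
        "".join(letter[n] for n in a_axes),
        "".join(letter[n] for n in b_axes),
        "".join(letter[n] for n in out_axes),
    )
    return np.einsum(spec, a, b), out_axes


rng = np.random.default_rng(0)

# Single-head attention scores: contract queries and keys over "dim".
q = rng.normal(size=(4, 8))
k = rng.normal(size=(5, 8))
scores, score_axes = named_contract(
    q, ("query_pos", "dim"), k, ("key_pos", "dim"), over="dim"
)

# Multi-head case: the same call works once a "heads" axis is added.
qh = rng.normal(size=(2, 4, 8))
kh = rng.normal(size=(2, 5, 8))
head_scores, head_axes = named_contract(
    qh, ("heads", "query_pos", "dim"), kh, ("heads", "key_pos", "dim"), over="dim"
)
```

Because the caller names the axis to sum over rather than its position, adding the `heads` axis does not require tracking which positional index now holds `dim`.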

David Chiang

Dec. 13, 2021, 9:10 a.m.

As humans, we spend much of our time going beyond the here and now. We dwell on the past, long for the future, and ponder how things could have turned out differently. In this talk, I will argue that people's knowledge of the world is organized around causally structured mental models, and that much of human thought can be understood as cognitive operations over these mental models. Specifically, I will highlight the pervasiveness of counterfactual thinking in human cognition. Counterfactuals are critical for how people make causal judgments, how they explain what happened, and how they hold others responsible for their actions.

Tobias Gerstenberg

Dec. 13, 2021, 9:10 a.m.

Ernest Mwebaze

PhD in Machine learning from Groningen University in the Netherlands. 10 years in academia in Makerere University in Uganda. Co-founded the Makerere AI Lab. Worked with UN Pulse Lab Kampala and with Google AI in Accra, Ghana. Working with a not for profit Sunbird AI where I am a founding director.

Invited Talk: Shie Mannor

Dec. 13, 2021, 9:15 a.m.

Dec. 13, 2021, 9:30 a.m.

Title: Neuro-Logic and Differentiable Controls

Abstract: The key challenge to neural language generation is that language models are essentially a mouth without a brain. In this talk, I’ll discuss how we can make better lemonades out of off-the-shelf neural language models via smarter decoding-time algorithms: discrete logic integration and differentiable reasoning.

Bio: Yejin Choi is Brett Helsel Professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington and also a senior research manager at AI2 overseeing the project Mosaic. Her research focuses on commonsense knowledge and reasoning, language grounding with vision and perception, and AI for social good. She is a co-recipient of the ACL Test of Time award in 2021, the CVPR Longuet-Higgins Prize (test of time award) in 2021, the AAAI Outstanding Paper Award (best paper award) in 2020, the Borg Early Career Award (BECA) in 2018, the inaugural Alexa Prize Challenge in 2017, IEEE AI's 10 to Watch in 2016, and the Marr Prize (best paper award) at ICCV 2013. She received her Ph.D. in Computer Science at Cornell University and BS in Computer Science and Engineering at Seoul National University in Korea.

Dec. 13, 2021, 9:30 a.m.

Recently, there has been a lot of excitement around ML-enhanced (or learned) algorithms and data structures. For example, there has been work on applying machine learning to improve query optimization, indexing, storage layouts, scheduling, log-structured merge trees, sorting, compression, sketches, among many other data management tasks. Arguably, the ideas behind these techniques are similar: machine learning is used to model the data and/or workload in order to derive a more efficient algorithm or data structure. Ultimately, what these techniques will allow us to build are “instance-optimized” systems; systems that self-adjust to a given workload and data distribution to provide unprecedented performance and avoid the need for tuning by an administrator.

In this talk, I will first provide an overview of the opportunities and limitations of current ML-enhanced algorithms and data structures, present initial results of SageDB, a first instance-optimized system we are building as part of DSAIL@CSAIL at MIT, and finally outline remaining challenges and future directions.
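As a concrete picture of "modeling the data to derive a more efficient data structure", the sketch below shows a toy learned index over a sorted array. It is an assumption-laden illustration, not SageDB or any code from the talk: the class name `LearnedIndex`, the single linear model, and the sample keys are all hypothetical, and the keys are assumed to be non-empty, numeric, and unique. The model predicts a key's position, and the largest prediction error seen at build time bounds a short local search.

```python
import bisect


class LearnedIndex:
    """Toy learned index: a linear model over sorted keys plus bounded search."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # Worst-case error over the stored keys bounds the search window.
        self.max_err = max(
            abs(i - self._predict(k)) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of `key` in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


sample_keys = [3, 8, 15, 16, 23, 42, 108, 4000]
index = LearnedIndex(sample_keys)
```

The better the model fits the key distribution, the smaller `max_err` becomes and the fewer elements each lookup inspects, which is the sense in which such a structure is optimized for a particular data instance.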

Tim Kraska

Dec. 13, 2021, 9:35 a.m.

Suchi Saria

Suchi Saria is an assistant professor of computer science, health policy and statistics at Johns Hopkins University. Her research interests are in statistical machine learning and computational healthcare. Specifically, her focus is in designing novel data-driven computing tools for optimizing decision-making. Her work is being used to drive electronic surveillance for reducing adverse events in the inpatient setting and individualize disease management in chronic diseases. She received her PhD from Stanford University with Prof. Daphne Koller.

Her work has received recognition in the form of two cover articles in Science Translational Medicine (2010, 2015), paper awards by the Association for Uncertainty in Artificial Intelligence (2007) and the American Medical Informatics Association (2011), an Annual Scientific Award by the Society of Critical Care Medicine (2014), a Rambus Fellowship (2004-2010), an NSF Computing Innovation fellowship (2011), and competitive awards from the Gordon and Betty Moore Foundation (2013), and Google Research (2014). In 2015, she was selected by IEEE Intelligent Systems to the “AI's 10 to Watch” list. In 2016, she was selected as a DARPA Young Faculty awardee and to Popular Science's “Brilliant 10”.

Invited talk: Luke Metz

Dec. 13, 2021, 10 a.m.

Dec. 13, 2021, 10 a.m.

Aleksander Madry

Aleksander Madry is the NBX Associate Professor of Computer Science in the MIT EECS Department and a principal investigator in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He received his PhD from MIT in 2011 and, prior to joining the MIT faculty, he spent some time at Microsoft Research New England and on the faculty of EPFL. Aleksander's research interests span algorithms, continuous optimization, science of deep learning and understanding machine learning from a robustness perspective. His work has been recognized with a number of awards, including an NSF CAREER Award, an Alfred P. Sloan Research Fellowship, an ACM Doctoral Dissertation Award Honorable Mention, and the 2018 Presburger Award.

Dec. 13, 2021, 10:10 a.m.

Jonathan D Stock

Dr. Jonathan Stock founded and runs the USGS National Innovation Center (NIC). The Center’s goal is to identify national scientific challenges, and to pursue them with technology partners using scarce federal dollars to best serve the public. NIC partners with industry, non-governmental organizations, other Federal Agencies and academia to improve the Nation’s ability to map, monitor and forecast its resources and hazards. Stock holds degrees from University of California, Santa Cruz, University of Washington, and University of California, Berkeley. The Center is located at the U.S. Geological Survey in Moffett Field, California, USA.

Dec. 13, 2021, 10:10 a.m.


Anima Anandkumar

Anima Anandkumar is a Bren professor at Caltech. Her research spans both theoretical and practical aspects of large-scale machine learning. In particular, she has spearheaded research in neural operators, tensor-algebraic methods, non-convex optimization, probabilistic models and deep learning.

Anima is the recipient of several awards and honors such as the Bren named chair professorship at Caltech, Alfred P. Sloan Fellowship, Young investigator awards from the Air Force and Army research offices, Faculty fellowships from Microsoft, Google and Adobe, and several best paper awards.

Anima received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a postdoctoral researcher at MIT from 2009 to 2010, a visiting researcher at Microsoft Research New England in 2012 and 2014, an assistant professor at U.C. Irvine between 2010 and 2016, an associate professor at U.C. Irvine between 2016 and 2017 and a principal scientist at Amazon Web Services between 2016 and 2018.

Invited talk: Eleni Triantafillou

Dec. 13, 2021, 10:30 a.m.

Dec. 13, 2021, 10:30 a.m.

In this talk, I'll argue that human-like language use in a variable and non-stationary social environment requires a more radical shift in our models of meaning. People not only rely on pragmatic reasoning to enrich static literal meanings, but flexibly create new literal meanings together to suit the task at hand. In other words, the central computational problem of communication is not simply transmission in context, as in classical formulations, but continual learning within and across social contexts. As a case study, I'll present a physical assembly task where pairs of human participants worked together to reconstruct block towers. We found that human participants rapidly coordinated on new, more abstract language that captured each scene’s underlying structure. Motivated by these findings, we extend recent hierarchical models of convention formation with a Bayesian program learning module. This model suggests a path toward more adaptive language models that are able to 'find the right words for the job' and collaborate with human partners in a wider variety of novel contexts.

Dec. 13, 2021, 10:45 a.m.

Computer systems have become increasingly complicated through increased system specialization and heterogeneity designed to meet an increasingly diverse set of system requirements across scale, performance, energy efficiency, reliability, and quality of results. With automated system optimization opportunities being driven by predictive models of system behavior, traditional strategies for manually developing predictive behavioral models have become more complicated and less precise with growing system complexity.

In this talk, I'll present DiffTune, a technique for learning neurosymbolic performance models of modern computer processors. Processor performance models are critical for many computer systems engineering tasks; however, due to the limits on our ability to introspect modern processors, these models must be inferred from behavioral measurements. Our system leverages deep learning to perform differentiable surrogate optimization of a CPU simulator to yield models that predict the performance of programs executed on modern Intel CPUs better than state-of-the-art, handcrafted techniques from LLVM.

Our approach demonstrates that behavioral models can be learned effectively from data and can be constructed to provide an interpretation of their predictions through behavioral traces grounded in the execution of a simulator.

Michael Carbin

Dec. 13, 2021, 10:45 a.m.

Thomas Icard

Dec. 13, 2021, 10:45 a.m.

Patrick Heimbach

Patrick Heimbach is a computational oceanographer, professor in the Jackson School of Geosciences, and W. A. “Tex” Moncrief, Jr., chair III in Simulation-Based Engineering and Sciences in the Oden Institute at the University of Texas at Austin. His research focuses on ocean and ice dynamics and their role in the global climate system. He is an expert on the use of inverse methods and automatic differentiation applied to ocean and sea ice model parameter and state estimation, uncertainty quantification and observing system design. Patrick earned his Ph.D. in 1998 from the Max-Planck-Institute for Meteorology and the University of Hamburg, Germany. Prior to joining UT, he spent 16 years at MIT. Among his professional activities, Patrick serves on the National Academy of Sciences’ Ocean Studies Board, NSF’s Advisory Committee for Cyberinfrastructure, the CLIVAR/CliC Northern Ocean Regional Panel, and the US CLIVAR Ocean Uncertainty Quantification working group.

Dec. 13, 2021, 11 a.m.

Elizabeth Tipton

Invited Talk: Caroline Uhler: TBA

Dec. 13, 2021, 11:05 a.m.

Invited Talk: Learnable Physics Models

Dec. 13, 2021, 11:15 a.m.

Karen Liu

Dec. 13, 2021, 11:25 a.m.

Nan Rosemary Ke

Dec. 13, 2021, 11:25 a.m.

Dec. 13, 2021, 11:30 a.m.

Everyday conversation comes with an important affordance: interaction. Amongst other forms of metacommunication, interaction allows for the use of other-initiated repair: where a receiver signals trouble in understanding a producer’s utterance, thereby prompting the producer to repeat or clarify. This phenomenon is ubiquitous in everyday conversation, but its affordance has largely been ignored in computational models of language use and language evolution. In this talk, I explore what happens when we add other-initiated repair to (i) a model of disambiguation in language use, and (ii) a model of the cultural evolution of compositional structure in language. In the first case study, we show that interactive repair may help outsource some of the computational resource demands of pragmatic reasoning to interaction (where disambiguation takes place across multiple turns). In the second case study, we show that interactive repair may play a role in ‘locking in’ compositional structure over generations in the cultural evolution of language.

Marieke Woensdregt

Dec. 13, 2021, noon

Previous work has sought to understand decision confidence as a prediction of the probability that a decision will be correct, leading to debate over whether these predictions are optimal, and whether they rely on the same decision variable as decisions themselves. This work has generally relied on idealized, low-dimensional modeling frameworks, such as signal detection theory or Bayesian inference, leaving open the question of how decision confidence operates in the domain of high-dimensional, naturalistic stimuli. To address this, we developed a deep neural network model optimized to assess decision confidence directly given high-dimensional inputs such as images. The model naturally accounts for a number of puzzling dissociations between decisions and confidence, suggests a novel explanation of these dissociations in terms of optimization for the statistics of sensory inputs, and makes the surprising prediction that, despite these dissociations, decisions and confidence depend on a common decision variable.

Hakwan Lau

Taylor Webb

Invited Talk: Rohin Shah

Dec. 13, 2021, noon

Rohin Shah

Dec. 13, 2021, 12:15 p.m.

Invited Talk: Causal World Models

Dec. 13, 2021, 12:30 p.m.

Bernhard Schölkopf

Bernhard Schölkopf received degrees in mathematics (London) and physics (Tübingen), and a doctorate in computer science from the Technical University Berlin. He has researched at AT&T Bell Labs, at GMD FIRST, Berlin, at the Australian National University, Canberra, and at Microsoft Research Cambridge (UK). In 2001, he was appointed scientific member of the Max Planck Society and director at the MPI for Biological Cybernetics; in 2010 he founded the Max Planck Institute for Intelligent Systems. For further information, see

Dec. 13, 2021, 12:30 p.m.

Sergey Ovchinnikov

Dec. 13, 2021, 12:30 p.m.

We will take a whirlwind, mile-high tour of the literature on the moment-to-moment processing of two simple quantity implicatures: scalar implicatures (avoidance of underinformative statements) and the inference that adjectives will be used contrastively (avoidance of overinformativity). On the basis of the scalars, I will propose that there are two routes by which implicatures are calculated: a slow bottom-up route and a top-down route that leads to the appearance of instantaneous implicature. This top-down route relies on the speaker’s conceptualization of the context in linguistically relevant terms. This analysis makes some novel predictions about the role of speaker modelling in the adjective inference. I’ll present unpublished data that support these new predictions.

Jesse Snedeker

Invited Talk: Angelique Taylor

Dec. 13, 2021, 12:30 p.m.

Angelique Taylor

I am a Visiting Research Scientist at Facebook Reality Labs Research. I will start my appointment as Assistant Professor at Cornell University in the Summer of 2022!

My research lies in the intersection of computer vision, robotics, healthcare, and artificial intelligence. My work aims to design intelligent systems that enable robots to interact and work with groups of people in safety-critical environments. I am also a National Science Foundation GRFP Fellow, Arthur J. Schmitt Presidential Fellow, GEM Fellow, Google Anita Borg Memorial Scholar, National Center for Women in Information Technology (NCWIT), Microsoft Dissertation Grant, and Grace Hopper Celebration of Women in Computing (GHC) Scholar.

I received my Ph.D. in Computer Science and Engineering from the University of California San Diego in 2021. Also, I received my B.S. in Electrical Engineering and Computer Engineering from the University of Missouri-Columbia in 2015 and my A.S. in Engineering Science from Saint Louis Community College in 2012.

Invited Talk: Ugo Rosolia

Dec. 13, 2021, 1 p.m.

Dec. 13, 2021, 1 p.m.

Harshitha Menon

Dec. 13, 2021, 1:30 p.m.

Task-oriented dialog is inherently about contextual action: users address the system from a specific context and the system must decide what to do in response. This talk will present some of the core principles of the Semantic Machines team's approach to conversational AI: program synthesis for action prediction, compositionality for handling complex tasks, metacomputation for reference and revision, error handling for dialog management, and dynamic generation for truthful output. I will also mention ways in which real-world constraints can help to inform the design of conversational systems.

Dec. 13, 2021, 1:50 p.m.

Title: Off the Beaten Path: Domain-Agnostic ML for Controllable Generation and Beyond

Abstract: In many fields of machine learning, the diversity of data domains studied by researchers is significantly more narrow than the diversity of domains in the real world. This has two disadvantages: 1) Existing methods are domain-specific, and fail to serve many impactful domains, including medical and scientific applications, and 2) Failure to examine a broader diversity of data makes it challenging to uncover broader principles underpinning the success of methods across domains. In this talk, I will discuss some of our work on developing machine learning techniques that operate on a wider diversity of data, including a new modeling framework (viewmaker networks) and benchmark (DABS) for self-supervised learning. I will then turn to controllable generation, discussing our work on controllable generation of molecular edits (C5T5), which leverages techniques from both the NLP and drug design communities. I will conclude by discussing future directions and opportunities for domain-agnostic ML in controllable generation and beyond.

Bio: Alex is a fourth-year PhD student in Computer Science at Stanford, advised by Noah Goodman and part of the Stanford NLP Group. His research focuses on better understanding, building, and controlling pretrained models, especially in domain-agnostic and multimodal settings. He is supported by an Open Philanthropy AI Fellowship, and has also spent time at Google Brain and Google Language.

Dec. 13, 2021, 2 p.m.

Very young children routinely solve causal problems that are still very challenging for machine learning systems. I will outline several exciting recent lines of work looking at young children’s causal reasoning and learning and comparing it to learning in various computational models. This includes work on the selection of relevant test variables, learning abstract and analogical relationships, and, most importantly, techniques for active learning and causal exploration.

Alison Gopnik

Alison Gopnik is a professor of psychology and affiliate professor of philosophy at the University of California at Berkeley. She received her BA from McGill University and her PhD from Oxford University. She is an internationally recognized leader in the study of children’s learning and development and was one of the founders of the field of “theory of mind”, an originator of the “theory theory” of children’s development and more recently introduced the idea that probabilistic models and Bayesian inference could be applied to children’s learning. She has held a Center for Advanced Studies in the Behavioral Sciences Fellowship, the Moore Distinguished Scholar fellowship at the California Institute of Technology, the All Souls College Distinguished Visiting Fellowship at Oxford, and King’s College Distinguished Visiting Fellowship at Cambridge. She is an elected member of the Society of Experimental Psychologists and the American Academy of Arts and Sciences and a fellow of the Cognitive Science Society. She has been continuously supported by the NSF and was PI on a 2.5 million dollar interdisciplinary collaborative grant on causal learning from the McDonnell Foundation.

She is the author or coauthor of over 100 journal articles and several books including “Words, thoughts and theories” MIT Press, 1997, and the bestselling and critically acclaimed popular books “The Scientist in the Crib” William Morrow, 1999, and “The Philosophical Baby; What children’s minds tell us about love, truth and the meaning of life”, which won the Cognitive Development Society Best Book Prize in 2011, and “The Gardener and the Carpenter”, Farrar, Straus and Giroux, 2009, 2016. She has also written widely about cognitive science and psychology for Science, The New York Times, Scientific American, The New Yorker, The Times Literary Supplement, The New York Review of Books, New Scientist and Slate, among others. Her TED talk on her work has been viewed more than 2.8 million times. She has frequently appeared on TV and radio, including “The Charlie Rose Show” and “The Colbert Report”. Since 2013 she has written the Mind and Matter column for the Wall Street Journal.

Dec. 13, 2021, 2:15 p.m.

Title: Generating and Editing Images Using StyleGAN and CLIP

Abstract: Recently, there has been an increased interest in leveraging the semantic power of large-scale Contrastive-Language-Image-Pre-training (CLIP) models. Specifically, combining the power of CLIP with the generative power of StyleGAN has led to novel text-driven methods with unprecedented generative performance.
In this talk, I will start by presenting StyleCLIP. I will show three approaches in which CLIP can be paired with StyleGAN to provide endless expressive power for image editing. Then I will present our recent follow-up work, StyleGAN-NADA, where CLIP facilitates shifting a trained StyleGAN to new domains without collecting even a single image from those domains.

Bio: Or Patashnik is a graduate student in the School of Computer Science at Tel Aviv University, under the supervision of Daniel Cohen-Or. Her research is about image generation tasks such as image-to-image translation, image editing, etc.

Dec. 13, 2021, 2:20 p.m.

A pervasive task found throughout the empirical sciences is to determine the effect of interventions from observational data. It is well-understood that assumptions are necessary to perform such causal inferences, an idea popularized through Cartwright’s motto: "no causes-in, no causes-out." One way of articulating these assumptions is through the use of causal diagrams, which are a special type of graphical model with causal semantics [Pearl, 2000]. The graphical approach has been applied successfully in many settings, but there are still challenges to its use, particularly in complex, high-dimensional domains. In this talk, I will introduce cluster causal diagrams (C-DAGs), a novel causal graphical model that allows for the partial specification of the relationships among variables. C-DAGs provide a simple yet effective way to partially abstract a grouping (cluster) of variables among which causal relationships are not fully understood while preserving consistency with the underlying causal system and the validity of causal identification tools. Reference:

Dec. 13, 2021, 2:40 p.m.

Victor Chernozhukov

Dec. 13, 2021, 3:50 p.m.

Dec. 13, 2021, 3:50 p.m.


Dec. 13, 2021, 4:15 p.m.

Chelsea Finn

Dec. 13, 2021, 4:20 p.m.

Title: Controllable Text Generation with Multiple Constraints

Abstract: Conditional language generation models produce highly fluent but often unreliable outputs. This motivated a surge of approaches to controlling various attributes of the text that models generate. However, the majority of existing approaches are focused on monolingual settings and on controlling for coarse-grained attributes of text (typically, only one binary attribute). This talk will propose to focus on finer-grained aspects of the generated texts, including in multilingual settings. I will present an algorithm for controllable inference from pretrained models, which aims at rewriting model outputs with multiple sentence-level, fine-grained, monolingual and cross-lingual constraints. I will conclude with discussion of future work.

Bio: Yulia Tsvetkov is an assistant professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Her research group works on NLP for social good, multilingual NLP, and language generation. The projects are motivated by a unified goal: to extend the capabilities of human language technology beyond individual populations and across language boundaries, thereby enabling NLP for diverse and disadvantaged users, the users that need it most. Prior to joining UW, Yulia was an assistant professor at Carnegie Mellon University and a postdoc at Stanford. Yulia is a recipient of the Okawa research award, Amazon machine learning research award, Google faculty research award, and multiple NSF awards.

Yulia Tsvetkov

Dec. 13, 2021, 4:30 p.m.

Search-based techniques have been shown to be effective in solving complex optimization problems that arise in domain-specific compilers for machine learning (ML). Unfortunately, deploying such techniques in production compilers is impeded by several limitations. In this talk, I will present an autotuner for production ML compilers that can tune both graph-level and subgraph-level optimizations at multiple compilation stages. The autotuner applies a flexible search methodology that defines a search formulation for joint optimizations by accurately modeling the interactions between different compiler passes. The autotuner tunes tensor layouts, operator fusion decisions, tile sizes, and code generation parameters in XLA, a production ML compiler, using various search strategies. We demonstrate how to incorporate machine learning techniques such as a learned cost model and various learning-based search strategies to reduce autotuning time. Our learned cost model has high accuracy and outperforms a heavily-optimized analytical performance model. In an evaluation across 150 ML training and inference models on Tensor Processing Units (TPUs), the autotuner offers up to 2.4x and an average 5% runtime speedup over the heavily-optimized XLA compiler. The autotuner has been deployed to automatically tune the most heavily-used production models in Google’s fleet every day.

Mangpo Phothilimthana

Dec. 13, 2021, 5:10 p.m.

Leveraging machine learning for system optimization can relieve researchers of designing manual heuristics, a time-consuming procedure. In this talk, we mainly discuss data-driven iterative refinement that models optimization as a sequential decision process: an initial solution to the optimization problem is iteratively improved until convergence. Each refinement step is controlled by an ML model learned from previous optimization trials, or from data collected so far in the current trial. We then introduce two examples in ML systems, Coda and N-Bref, that decompile assembly code back to its source code. In both cases, a coarse source program is first proposed and then refined by learned models to match the assembly. These approaches show strong performance compared to existing decompilation tools that rely upon human heuristics and domain knowledge.

Yuandong Tian

Dec. 14, 2021, 3:10 a.m.

Emtiyaz Khan, Dharmesh Tailor, Siddharth Swaroop

Dec. 14, 2021, 3:30 a.m.

Yee Whye Teh

Dec. 14, 2021, 4 a.m.

Transformers - the purely attention-based NN architecture - have emerged as a powerful tool in sequence processing. But how does a transformer think? When we discuss the computational power of RNNs, or consider a problem that they have solved, it is easy for us to think in terms of automata and their variants (such as counter machines and pushdown automata). But when it comes to transformers, no such intuitive model is available.

In this talk I will present a programming language, RASP (Restricted Access Sequence Processing), which we hope will serve the same purpose for transformers as finite state machines do for RNNs. In particular, we will identify the base computations of a transformer and abstract them into a small number of primitives, which are composed into a small programming language. We will go through some example programs in the language, and discuss how a given RASP program relates to the transformer architecture.

Gail Weiss

Dec. 14, 2021, 4 a.m.

Weinan E

Dec. 14, 2021, 5 a.m.

I will explore the boundaries between differentiable programming and logic, through the prism of the Curry-Howard correspondence. I will recall the latter and explain how automatic differentiation fits into each of its three facets: functions, proofs and programs. In particular, I will explain how backpropagation is identified with Gödel's Dialectica translation, a transformation of logical formulas historically used to prove consistency theorems and widely used in proof theory since then.


Invited Talk: Artificial what?

Dec. 14, 2021, 5:10 a.m.

Shane Legg

Dec. 14, 2021, 5:14 a.m.

Machine learning has proven highly successful in many real-world applications, ranging from information retrieval, data mining, and speech recognition to computer graphics, visualization, and human-computer interaction. However, most users treat the machine learning model as a “black box” because of its incomprehensible functions and unclear working mechanism. Without a clear understanding of how and why the model works, the development of high-performance models typically relies on a time-consuming trial-and-error procedure. This talk presents the major challenges of explainable machine learning and exemplifies the solutions with several visual analytics techniques and examples, including data quality diagnosis and model understanding and diagnosis.

Shixia Liu is a professor at Tsinghua University. Her research interests include explainable machine learning, visual text analytics, and text mining. Shixia was elevated to an IEEE Fellow in 2021 and inducted into the IEEE Visualization Academy in 2020. She is an associate editor-in-chief of IEEE Transactions on Visualization and Computer Graphics and is an associate editor of Artificial Intelligence, IEEE Transactions on Big Data, and ACM Transactions on Intelligent Systems and Technology. She was one of the Papers Co-Chairs of IEEE VIS (VAST) 2016 and 2017 and is on the steering committee of IEEE VIS (2020-2023).

Shixia Liu

Shixia Liu is a professor at Tsinghua University. Her research interests include explainable machine learning, visual text analytics, and text mining. Before joining Tsinghua University, she worked as a lead researcher at Microsoft Research Asia and a research staff member and research manager at IBM China Research Lab. Shixia was elevated to an IEEE Fellow in 2021 and inducted into the IEEE Visualization Academy in 2020. She is an associate editor-in-chief of IEEE Transactions on Visualization and Computer Graphics and is an associate editor of IEEE Transactions on Big Data and Artificial Intelligence. She was one of the Papers Co-Chairs of IEEE VIS (VAST) 2016 and 2017 and is on the steering committee of IEEE VIS (2020-2023) and IEEE VDS (2020-2021).

Dec. 14, 2021, 5:15 a.m.

Neha Yadav

Dec. 14, 2021, 5:30 a.m.

Atılım Güneş Baydin, Francesco Pinto

Dec. 14, 2021, 5:35 a.m.

We present the probabilistic numeric solver BayesCG, for solving linear systems with real symmetric positive definite coefficient matrices. BayesCG is an uncertainty-aware extension of the conjugate gradient (CG) method that performs solution-based inference with Gaussian distributions to capture the uncertainty in the solution due to early termination. Under a structure-exploiting "Krylov" prior, BayesCG produces the same iterates as CG. The Krylov posterior covariances have low rank, and are maintained in factored form to preserve symmetry and positive semi-definiteness. This allows efficient generation of accurate samples to probe uncertainty in subsequent computation.
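
For readers unfamiliar with the conjugate gradient method that BayesCG extends, here is a minimal pure-Python sketch of standard CG for a symmetric positive definite system. It is not BayesCG itself: it carries no prior or posterior covariance. The function name `conjugate_gradient`, its parameters, and the small example system are illustrative assumptions, not taken from the talk.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Product of a dense matrix (list of rows) with a vector."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-12, max_iter=None):
    """Plain CG for A x = b, where A is symmetric positive definite.

    Hypothetical helper for illustration; stopping early (small max_iter)
    leaves the kind of residual error that BayesCG aims to quantify.
    """
    n = len(b)
    if max_iter is None:
        max_iter = n
    x = [0.0] * n          # start from the zero vector
    r = list(b)            # residual b - A x for x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break          # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                       # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                            # direction update
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Example system chosen for illustration only.
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```

For the 2-by-2 example the exact solution is [1/11, 7/11], which CG reaches in at most two iterations in exact arithmetic.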

Speaker bio: Ilse C.F. Ipsen received a BS from the University of Kaiserslautern in Germany and a Ph.D. from Penn State, both in Computer Science. She is a Professor of Mathematics at NC State, with affiliate appointments in Statistics and the Institute for Advanced Analytics. Her research interests include numerical linear algebra, randomized algorithms, and probabilistic numerics. She is a Fellow of the AAAS and SIAM.

Dec. 14, 2021, 5:40 a.m.

Sven Wellmann

Dec. 14, 2021, 5:40 a.m.

Joelle Pineau

Joelle Pineau is an Associate Professor and William Dawson Scholar at McGill University where she co-directs the Reasoning and Learning Lab. She also leads the Facebook AI Research lab in Montreal, Canada. She holds a BASc in Engineering from the University of Waterloo, and an MSc and PhD in Robotics from Carnegie Mellon University. Dr. Pineau's research focuses on developing new models and algorithms for planning and learning in complex partially-observable domains. She also works on applying these algorithms to complex problems in robotics, health care, games and conversational agents. She serves on the editorial board of the Journal of Artificial Intelligence Research and the Journal of Machine Learning Research and is currently President of the International Machine Learning Society. She is a recipient of NSERC's E.W.R. Steacie Memorial Fellowship (2018), a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Senior Fellow of the Canadian Institute for Advanced Research (CIFAR) and in 2016 was named a member of the College of New Scholars, Artists and Scientists by the Royal Society of Canada.

Dec. 14, 2021, 5:50 a.m.

Danilo Rezende, Peter Wirnsberger

Dec. 14, 2021, 6 a.m.

In the modern world, we cooperate with and live side by side with strangers, who often look, act, and speak in ways very different to us. We work together on goals with culturally distant nations that span the globe. I'm recording this talk, but I could have given it to you in person. That's unusual in many respects. It's unusual from a cross-species perspective - comparing us to our closest primate cousins, a room full of strange chimps is a room full of dead chimps. It's unusual from a historical perspective - even a few hundred years ago, a stranger in our midst was a potential threat. And it's unusual from a geographic perspective - even today some places are safer and more cooperative than others. Cooperation varies in scale, intensity, and domain - some countries cooperate on healthcare, others on defence. Compounding the puzzle, the evolutionary mechanisms that explain cooperation undermine one another and can stabilize non-cooperative or even maladaptive behavior. I'll discuss the latest discoveries in the science of cultural evolution and human cooperation and how these might apply to the development of cooperative AI.

Michael Muthukrishna

Michael Muthukrishna is Associate Professor of Economic Psychology and STICERD Developmental Economics Group Affiliate at the London School of Economics, CIFAR Azrieli Global Scholar at the Canadian Institute for Advanced Research, and Technical Director of The Database of Religious History. His research focuses on human biological and cultural evolution and on how this understanding of human behavior and social change can improve innovation, reduce corruption, and increase cross-cultural cooperation. His work is featured in international and national news outlets including CNN, BBC, Wall Street Journal, The Economist, Scientific American, Nature News, and Science News, and in the UK in the Times, Telegraph, Mirror, Sun, and Guardian. Michael's research is informed by his educational background in engineering and psychology, with graduate training in evolutionary biology, economics, and statistics, and his personal background living in Sri Lanka, Botswana, Papua New Guinea, Australia, Canada, United States, and United Kingdom. He is currently working on a book to be published with MIT Press.

Dec. 14, 2021, 6 a.m.

If we want to build machines that think and learn like humans do, and that can learn and think with people, our best bet is to build machines that can learn to write programs expressing their thoughts in human-understandable code. These machines should also be able to learn from the kinds of data that humans naturally consume and produce: one or a few examples of program execution, and natural language descriptions of program goals or high-level structure. We are far from achieving this goal, but the last few years have seen intriguing first steps and opened up a new set of hard problems for future work. I will talk about some lessons learned: how we might best combine neural and symbolic approaches under the broad rubric of probabilistic inference in hierarchical generative models for code, and the synergies to be gained from looking at both execution examples and natural language as sources of data. I will also discuss promising near-term challenge domains that capture foundational human capacities for learning concepts, systems of concepts (or domain theories) and causal models, and where the next generation of program learning approaches could make important progress.

Josh Tenenbaum

Dec. 14, 2021, 6:05 a.m.

Michael Brudno

Dec. 14, 2021, 6:10 a.m.

Asja Fischer, Sina Däubener

Dec. 14, 2021, 6:30 a.m.

In this talk I will present some of our findings (in collaboration with the Bank of Canada) on using RL to approximate the policy rules of banks participating in a high-value payments system. The objective of the agents is to learn a policy function for the choice of amount of liquidity provided to the system at the beginning of the day. Individual choices have complex strategic effects precluding a closed form solution of the optimal policy, except in simple cases. We show that in a simplified two-agent setting, agents using reinforcement learning do learn the optimal policy that minimizes the cost of processing their individual payments. We also show that in more complex settings, both agents learn to reduce their liquidity costs. Our results show the applicability of RL to estimate best-response functions in real-world strategic games.

Dec. 14, 2021, 6:37 a.m.

AI is very successful at certain tasks, even exceeding human performance. Unfortunately, the most powerful methods suffer from both difficulty in explaining why a particular result was obtained and a lack of robustness. Our most powerful machine learning models are very sensitive to even small changes. Perturbations in the input data can have a dramatic impact on the output and lead to completely different results. This is of great importance in virtually all critical areas where we suffer from poor data quality, i.e., we do not have the expected i.i.d. data. Therefore, the use of AI in areas that impact human life (agriculture, climate, health, ...) has led to an increased demand for trustworthy AI. In sensitive areas where traceability, transparency and interpretability are required, explainability is now even mandatory due to regulatory requirements. One possible step to make AI more robust is to combine statistical learning with knowledge representations. For certain tasks, it may be beneficial to include a human in the loop. A human expert can - sometimes, of course, not always - bring experience, expertise and conceptual understanding to the AI pipeline. Such approaches are not only a solution from a legal perspective, but in many application areas, the "why" is often more important than a pure classification result. Consequently, both explainability and robustness can promote reliability and trust and ensure that humans remain in control, thus complementing human intelligence with artificial intelligence.

Andreas Holzinger

Andreas pioneered interactive machine learning with the human-in-the-loop. For his achievements he was elected a member of Academia Europaea, the European Academy of Science, in 2019. He has been a member of the European Laboratory for Learning and Intelligent Systems (ELLIS) since 2021. The use of AI in domains that impact human life (agriculture, climate, health, ...) has led to increased demand for trustworthy AI. Andreas fosters robustness and explainability as enablers of trusted AI and advocates a synergistic approach to put the human in control of AI, aligning AI with human values, ethical principles and legal requirements, ensuring privacy, security, and safety.

Dec. 14, 2021, 7:10 a.m.

Many of today’s most promising technological systems involve very large numbers of autonomous agents that influence each other and make strategic decisions within a network structure. Examples include opinion dynamics, targeted marketing in social networks, economic exchange and international trade, product adoption and social contagion.

While traditional tools for the analysis of these systems assumed that a social planner has full knowledge of the network of interactions, when we turn to very large networks two issues emerge. First, collecting data about the exact network of interactions becomes very expensive or not at all possible because of privacy concerns. Second, methods for designing optimal interventions that rely on the exact network structure typically do not scale well with the population size.

To obviate these issues, in this talk I will present a framework in which the social planner designs interventions based on probabilistic instead of exact information about agents’ interactions. I will introduce the tool of “graphon games” as a way to formally describe strategic interactions in this setting and I will illustrate how this tool can be exploited to design interventions. I will cover two main applications: targeted budget allocation and optimal seeding in contagion processes. I will illustrate how the graphon approach leads to interventions that are asymptotically optimal in terms of the population size and can be computed without requiring exact network data.

Francesca Parise

Dec. 14, 2021, 7:10 a.m.

Martin Riedmiller

Dec. 14, 2021, 7:20 a.m.

Katja Hofmann

Dr. Katja Hofmann is a Principal Researcher at the Game Intelligence group at Microsoft Research Cambridge, UK. There, she leads a research team that focuses on reinforcement learning with applications in modern video games. She and her team strongly believe that modern video games will drive a transformation of how we interact with AI technology. One of the projects developed by her team is Project Malmo, which uses the popular game Minecraft as an experimentation platform for developing intelligent technology. Katja's long-term goal is to develop AI systems that learn to collaborate with people, to empower their users and help solve complex real-world problems. Before joining Microsoft Research, Katja completed her PhD in Computer Science as part of the ILPS group at the University of Amsterdam. She worked with Maarten de Rijke and Shimon Whiteson on interactive machine learning algorithms for search engines.

Dec. 14, 2021, 7:30 a.m.

Shuran Song

Dec. 14, 2021, 7:30 a.m.

Philipp Grohs

Dec. 14, 2021, 7:38 a.m.

Learning algorithms are increasingly being deployed in a variety of real world systems with other autonomous decision processes and human decision-makers. Importantly, in many settings humans react to the decisions algorithms make. This calls into question the following classically held tenet in supervised machine learning: when it is arduous to model a phenomenon, observations thereof are representative samples from some static or otherwise independent distribution. Without taking such reactions into consideration at the time of design, machine learning algorithms are doomed to result in unintended consequences such as reinforcing institutional bias or incentivizing gaming or collusion. In this talk, we discuss several directions of research along which we have made progress towards closing the loop in ML including robustness to model misspecification in capturing strategic behavior, decision-dependent learning in the presence of competition ('multiplayer performative prediction'), and dynamic decision-dependent learning wherein the data distribution may drift in time. Open questions will be posed towards the end of the talk.

Lillian Ratliff

Dec. 14, 2021, 7:45 a.m.

Sortition is a storied paradigm of democracy built on the idea of choosing representatives through lotteries instead of elections. In recent years this idea has found renewed popularity in the form of citizens’ assemblies, which bring together randomly selected people from all walks of life to discuss key questions and deliver policy recommendations. A principled approach to sortition, however, must resolve the tension between two competing requirements: that the demographic composition of citizens’ assemblies reflect the general population and that every person be given a fair chance (literally) to participate. I will describe our work on designing, analyzing and implementing randomized participant selection algorithms that balance these two requirements. I will also discuss practical challenges in sortition based on experience with the adoption and deployment of our open-source system, Panelot.

Ariel Procaccia

Dec. 14, 2021, 7:50 a.m.

Nick Roy

Dec. 14, 2021, 8 a.m.

Adi Hanuka, Owen Convery

Dec. 14, 2021, 8:05 a.m.

Across a multitude of domains and applications, machine learning has become widespread as a tool for informing decisions about humans, and for humans. But most tools used in practice focus exclusively on mapping inputs to relevant outputs - and take no account of how humans respond to these outputs. This raises the question: how should we design learning systems when we know they will be used in social settings? The goal of this talk is to initiate discussion regarding this question and the paths we can take towards possible answers. Building on strategic classification as an appropriate first step, I will describe some of our work, both recent and current, that aims to extend strategic classification towards more realistic strategic settings that include more elaborate forms of economic modeling. Finally, I will argue for a broader view of how we can approach learning problems that lie just outside the scope of classic supervised learning.

Nir Rosenfeld

Dec. 14, 2021, 8:15 a.m.

Today I will be talking about the role of conventions in human-AI collaboration. Conventions are norms/equilibria we build through repeated interactions with each other. The idea of conventions has been well-studied in linguistics. We will start the talk by discussing the notion of linguistic conventions, and how we can build AI agents that can effectively build these conventions. We then extend the idea of linguistic conventions to conventions through actions. We discuss a modular approach to separate partner-specific conventions and rule-dependent representations. We then discuss how this can be done effectively when working with partners whose actions are high dimensional. Finally we extend the notion of conventions to larger scale systems beyond dyadic interactions. Specifically, we discuss what conventions/equilibria emerge in mixed-autonomy traffic networks and how that can be leveraged for better dynamic routing of vehicles.

Dec. 14, 2021, 8:20 a.m.

Daniel Tanis

Dec. 14, 2021, 9 a.m.

Aaron Roth

Dec. 14, 2021, 9:01 a.m.

Song-Chun Zhu

Dec. 14, 2021, 9:15 a.m.

Dec. 14, 2021, 9:26 a.m.

Himabindu Lakkaraju

Hima Lakkaraju is an Assistant Professor at Harvard University focusing on explainability, fairness, and robustness of machine learning models. She has also been working with various domain experts in criminal justice and healthcare to understand the real world implications of explainable and fair ML. Hima has recently been named one of the 35 innovators under 35 by MIT Tech Review, and has received best paper awards at SIAM International Conference on Data Mining (SDM) and INFORMS. She has given invited workshop talks at ICML, NeurIPS, AAAI, and CVPR, and her research has also been covered by various popular media outlets including the New York Times, MIT Tech Review, TIME, and Forbes.

Dec. 14, 2021, 9:31 a.m.

Mrinmaya Sachan

Dec. 14, 2021, 10 a.m.

Jack Cable

Invited Talk: Environment Capacity

Dec. 14, 2021, 10 a.m.

Benjamin Van Roy

Dec. 14, 2021, 10:15 a.m.

Rich Caruana

Dec. 14, 2021, 10:20 a.m.

Optimization is at the heart of machine learning, and gradient computation is central to many optimization techniques. Stochastic optimization, in particular, has taken center stage as the principal method of fitting many models, from deep neural networks to variational Bayesian posterior approximations. Generally, one uses data subsampling to efficiently construct unbiased gradient estimators for stochastic optimization, but this is only one possibility. In this talk, I will discuss an alternative approach to constructing unbiased gradient estimates in machine learning problems. We will revisit the Jacobian accumulation problem at the heart of automatic differentiation, observing that it is possible to collapse the linearized computational graph of, e.g., deep neural networks, in a randomized way such that less memory is used but little performance is lost. This is joint work with students Alex Beatson, Deniz Oktay, Joshua Aduol, and Nick McGreivy.
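
The abstract takes data subsampling as the usual route to unbiased gradient estimators. As a minimal sketch of that baseline only (not of the randomized Jacobian accumulation the talk proposes), the Python below averages the minibatch gradient over every possible batch of a fixed size and compares it with the full-data gradient. The one-parameter least-squares loss, the function names, and the data are all illustrative assumptions.

```python
from itertools import combinations


def per_example_gradient(w, x, y):
    """Gradient of the squared error (w * x - y) ** 2 with respect to w."""
    return 2.0 * (w * x - y) * x


def mean_gradient(w, examples):
    """Average gradient over a collection of (x, y) pairs."""
    return sum(per_example_gradient(w, x, y) for x, y in examples) / len(examples)


def expected_minibatch_gradient(w, data, batch_size):
    """Exact expectation of the minibatch gradient under uniform sampling
    without replacement, computed by enumerating every possible batch."""
    batches = list(combinations(data, batch_size))
    return sum(mean_gradient(w, batch) for batch in batches) / len(batches)


# Made-up data for illustration.
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 8.0)]
w = 0.5
full = mean_gradient(w, data)
expected = expected_minibatch_gradient(w, data, batch_size=2)
print(abs(full - expected) < 1e-12)
```

Each example appears in the same number of batches, so the batch average has the full gradient as its expectation; a single batch is noisy but not biased.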

Dec. 14, 2021, 10:30 a.m.

Warren Powell

Warren B. Powell is Professor Emeritus at Princeton University, where he taught for 39 years, and is currently the Chief Analytics Officer at Optimal Dynamics. He is the founder and director of CASTLE Labs, which spans contributions to models and algorithms in stochastic optimization, with applications to energy systems, transportation, health, e-commerce, and the laboratory sciences. He has pioneered the use of approximate dynamic programming for high-dimensional applications, and the knowledge gradient for active learning problems. His recent work has focused on developing a unified framework for sequential decision problems under uncertainty, spanning active learning to a wide range of dynamic resource allocation problems. He has authored books on Approximate Dynamic Programming and (with Ilya Ryzhov) Optimal Learning, and is nearing completion of a book Reinforcement Learning and Stochastic Optimization: A unified framework for sequential decisions.

Dec. 14, 2021, 10:40 a.m.

Bram Stieljes

Dec. 14, 2021, 10:45 a.m.

Jeremy Epstein

Jeremy Epstein is lead program officer for the NSF Secure and Trustworthy Cyberspace (SaTC) program, NSF’s flagship multi-disciplinary cybersecurity & privacy program. Prior to this role, he was Deputy Division Director of CISE/CNS, where he was responsible for research in a range of computer science programs, including cybersecurity, cyber physical systems, smart and connected communities, computer systems, networking, computer science education, technology transition, and other assorted topics.

Prior to (re)joining NSF in 2017, he was a program manager at DARPA I2O, and a program officer for NSF's Secure and Trustworthy Cyberspace (SaTC) program. He spent most of his career in industry, including at SRI International and webMethods. His areas of interest are in cybersecurity, particularly elections and voting security.

Jeremy is also chair of the Association for Computing Machinery US Technology Policy Committee, founder/director of ACSA Scholarships for Women Studying Information Security (SWSIS), and former associate editor-in-chief of IEEE Security and Privacy Magazine.

Dec. 14, 2021, 10:45 a.m.

When theorizing the causal effects that algorithmic decisions have on a population, an important modeling choice arises. We can model the change to a population in the aggregate, or we can model the response to a decision rule at the individual level. Standard economic microfoundations, for instance, ground the response in the utility-maximizing behavior of individuals.

Providing context from sociological and economic theory, I will argue why this methodological problem is of significant importance to machine learning. I will focus on the relationships and differences between two recent lines of work, called strategic classification and performative prediction. While performative prediction takes a macro-level perspective on distribution shifts induced by algorithmic predictions, strategic classification builds on standard economic microfoundations. Based on work with Meena Jagadeesan and Celestine Mendler-Dünner, I will discuss the serious shortcomings of standard microfoundations in the context of machine learning and speculate about the alternatives that we have.

Moritz Hardt

Dec. 14, 2021, 10:51 a.m.

Ascertaining that a deep network does not rely on an unknown spurious signal as a basis for its output, prior to deployment, is crucial in high stakes settings like healthcare. While many post hoc explanation methods have been shown to be useful for some end tasks, recent theoretical and empirical evidence suggests that these methods may not be faithful or useful. This leaves little guidance for a practitioner or a researcher using these methods in their decision process. In this talk, we will consider three classes of post hoc explanations (feature attribution, concept activation, and training point ranking) and ask whether these approaches can alert a practitioner as to a model's reliance on unknown spurious training signals.

Julius Adebayo

Julius Adebayo is a Ph.D. student at MIT working on developing and understanding approaches that seek to make machine learning-based systems reliable when deployed. More broadly, he is interested in rigorous approaches to help develop models that are robust to spurious associations and distribution shifts and that align with 'human' values.

Dec. 14, 2021, 11 a.m.

Almost twenty years ago, Thomas Minka nicely illustrated that Bayesian model averaging (BMA) is different from model combination. Model combination works by enriching the model space, because it considers all possible linear combinations of all the models in the model class, while BMA represents the inability to know which single model is best given a limited amount of data. However, twenty years later, this distinction is no longer so clear in the context of ensembles of deep neural networks: are deep ensembles performing a crude approximation of a highly multi-modal Bayesian posterior? Or are they exploiting an enriched model space, and should they consequently be interpreted in terms of model combination? In this talk, we will introduce recently published theoretical analyses that will shed some light on these questions. As you will see in this talk, whether your model is wrong or not plays a crucial role in the answers to these questions.

Speaker bio: Andres R. Masegosa is an associate professor at the Department of Computer Science at Aalborg University (Copenhagen Campus, Denmark). Previously, he was an assistant professor at the University of Almería (Spain). He got his PhD in Computer Science at the University of Granada in 2009. He is broadly interested in modelling intelligent agents that learn from experience using a probabilistic approach. He has published more than sixty papers in international journals and conferences in the field of machine learning.

Andres Masegosa

Dec. 14, 2021, 11 a.m.

This talk will give a gentle introduction to Dex, an experimental programming language. Dex is designed to combine the clarity and safety of high-level functional languages with the efficiency of low-level numerical languages. For example, Dex allows one to move much of the informal type and shape information normally contained in comments into compile-time checked types, while also omitting unambiguous details, to keep things terse. It also allows in-place updates and stateful, loopy code that can automatically take advantage of parallelism in a fine-grained way. We'll demonstrate these features on standard deep architectures like attention and graph neural nets.


David Duvenaud

David Duvenaud is an assistant professor in computer science at the University of Toronto. His research focuses on continuous-time models, latent-variable models, and deep learning. His postdoc was done at Harvard University, and his Ph.D. at the University of Cambridge. David also co-founded Invenia, an energy forecasting and trading company.

Dec. 14, 2021, 11:10 a.m.

Dragos Margineantu

Ph.D. (2001) - Machine Learning Technical Lead of AI Research at Boeing.

Dec. 14, 2021, 11:12 a.m.

Data-based decision-making must account for the manipulation of data by agents who are aware of how decisions are being made and want to affect their allocations. We study a framework in which, due to such manipulation, data becomes less informative when decisions depend more strongly on data. We formalize why and how a decision-maker should commit to underutilizing data. Doing so attenuates information loss and thereby improves allocation accuracy.

Dec. 14, 2021, 11:20 a.m.

Douwe Kiela

Dec. 14, 2021, 11:30 a.m.

Kristin E. Lauter

Kristin Estella Lauter is a mathematician and cryptographer who works at the interface of machine learning and cryptography. From 2008 to 2021, she was Partner Research Manager of the Cryptography and Privacy Research Group at Microsoft Research; her group developed SEAL, a leading OSS Homomorphic Encryption library. In April 2021, Lauter joined Facebook AI Research (FAIR) as the West Coast Director of Research Science. She was President of the Association for Women in Mathematics from 2015 to 2017. She is an elected Fellow of AAAS, AMS, SIAM, and AWM and an elected honorary member of the Royal Spanish Mathematical Society.

Dec. 14, 2021, 11:30 a.m.

Yarin Gal

Yarin leads the Oxford Applied and Theoretical Machine Learning (OATML) group. He is an Associate Professor of Machine Learning at the Computer Science department, University of Oxford. He is also the Tutorial Fellow in Computer Science at Christ Church, Oxford, and a Turing Fellow at the Alan Turing Institute, the UK’s national institute for data science and artificial intelligence. Prior to his move to Oxford he was a Research Fellow in Computer Science at St Catharine’s College at the University of Cambridge. He obtained his PhD from the Cambridge machine learning group, working with Prof Zoubin Ghahramani and funded by the Google Europe Doctoral Fellowship. He made substantial contributions to early work in modern Bayesian deep learning—quantifying uncertainty in deep learning—and developed ML/AI tools that can inform their users when the tools are “guessing at random”. These tools have been deployed widely in industry and academia, with the tools used in medical applications, robotics, computer vision, astronomy, in the sciences, and by NASA. Beyond his academic work, Yarin works with industry on deploying robust ML tools safely and responsibly. He co-chairs the NASA FDL AI committee, and is an advisor with Canadian medical imaging company Imagia, Japanese robotics company Preferred Networks, as well as numerous startups.

Dec. 14, 2021, 11:30 a.m.

Amy Zhang

Dec. 14, 2021, 11:40 a.m.

Strategic classification concerns the problem of training a classifier that will ultimately observe data generated according to strategic agents’ responses. The commonly adopted setting is that the agents are fully rational and can best respond to a classifier, and that the classifier aims to maximize its robustness to the strategic “manipulations”. This talk revisits a couple of concepts concerning dynamics in the above formulation. The first question we revisit is: are all changes undesirable? We observe that in many application settings, changes in agents’ profile X can lead to true improvement in their target variable Y [1,2]. This observation requires us to revisit the objective function of the learner, and to study the possibility of inducing an improved population from the agents. The second question we revisit is: do agents respond rationally? Inspired by evolutionary game theory, we introduce a dynamical agent response model using replicator dynamics to model agents’ potentially non-fully rational responses to a sequence of classifiers [3]. We characterize the dynamics of this model and offer observations on its fairness implications in such a long-term dynamical environment.


[1] Linear Classifiers that Encourage Constructive Adaptation, Yatong Chen, Jialu Wang and Yang Liu, 2021.

[2] Induced Domain Adaptation, Yang Liu, Yatong Chen, Jiaheng Wei, 2021.

[3] Unintended Selection: Persistent Qualification Rate Disparities and Interventions, Reilly Raab and Yang Liu, Neural Information Processing Systems (NeurIPS), 2021.

Yang Liu

Dec. 14, 2021, noon

Differential Inference is the use of differentiation to perform probabilistic inference. The technique itself is relatively straightforward and plays nicely with autodiff: it roughly just automates Bayes' rule the way autodiff automates the chain rule. However, there is still a tendency for students to get tied up in the knots of even elementary probabilistic inference. Inspired by polemics that shed light on automatic differentiation, this talk will be half a tutorial on the use of differential inference and half a demonstration of all the fun math that it can remove from your life.

Alexander Rush

Alexander "Sasha" Rush is an Associate Professor at Cornell Tech and a researcher at Hugging Face. His research interest is in the study of language models, with applications in controllable text generation, efficient inference, summarization, and information extraction. In addition to research, he has written several popular open-source software projects supporting NLP research, programming for deep learning, and virtual academic conferences. His projects have received paper and demo awards at major NLP, visualization, and hardware conferences, as well as an NSF Career Award and a Sloan Fellowship. He tweets at @srush_nlp.

Dec. 14, 2021, noon

Roy Perlis

Dec. 14, 2021, 12:10 p.m.

J. Zico Kolter

Zico Kolter is an Assistant Professor in the School of Computer Science at Carnegie Mellon University, and also serves as Chief Scientist of AI Research for the Bosch Center for Artificial Intelligence. His work focuses on the intersection of machine learning and optimization, with a large focus on developing more robust, explainable, and rigorous methods in deep learning. In addition, he has worked on a number of application areas, highlighted by work on sustainability and smart energy systems. He is the recipient of the DARPA Young Faculty Award, and best paper awards at KDD, IJCAI, and PESGM.

Dec. 14, 2021, 12:10 p.m.

Tom Griffiths

Dec. 14, 2021, 12:15 p.m.

Animashree Anandkumar

Anima Anandkumar is a Bren Professor at Caltech and Director of ML Research at NVIDIA. She was previously a Principal Scientist at Amazon Web Services. She has received several honors, such as the Alfred P. Sloan Fellowship, the NSF Career Award, Young Investigator Awards from the DoD, and Faculty Fellowships from Microsoft, Google, Facebook, and Adobe. She is part of the World Economic Forum's Expert Network. She is passionate about designing principled AI algorithms and applying them in interdisciplinary applications. Her research focus is on unsupervised AI, optimization, and tensor methods.

Dec. 14, 2021, 12:30 p.m.

Stochastic gradient algorithms are widely used for large-scale learning and inference problems. However, their use in practice is typically guided by heuristics and trial-and-error rather than rigorous, generalizable theory. We take a step toward better understanding the effect of the tuning parameters of these algorithms by characterizing the large-sample behavior of iterates of a very general class of preconditioned stochastic gradient algorithms with fixed step size, including stochastic gradient descent with and without additional Gaussian noise, momentum, and/or acceleration. We show that near a local optimum, the iterates converge weakly to paths of an Ornstein–Uhlenbeck process, and provide sufficient conditions for the stationary distributions of the finite-sample processes to converge weakly to that of the limiting process. In particular, we show that, with appropriate choices of tuning parameters, the limiting stationary covariance can match either the Bernstein–von Mises limit of the posterior, adjustments to the posterior for model misspecification, or the asymptotic distribution of the maximum likelihood estimate, and that with naive tuning the limit corresponds to none of these. Moreover, we argue that, in the large-sample regime, an essentially independent sample from the stationary distribution can be obtained after a fixed number of passes over the dataset. Our results show that properly tuned stochastic gradient algorithms offer a practical approach to obtaining inferences that are computationally efficient and statistically robust.
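As a rough illustration of the Ornstein–Uhlenbeck connection, the sketch below runs fixed-step SGD on a one-dimensional quadratic with Gaussian gradient noise and compares the empirical variance of the iterates with the stationary variance of the matching OU process. This is a toy case under assumed values for the step size, curvature, and noise level. It does not implement the general preconditioned class or the large-sample limit analyzed in the talk.

```python
import random


def sgd_stationary_variance(step, curvature, noise_sd,
                            n_steps=200_000, burn_in=20_000, seed=0):
    """Run fixed-step SGD on f(x) = curvature * x**2 / 2 with noisy
    gradients and return the empirical variance of the iterates
    collected after the burn-in period."""
    rng = random.Random(seed)
    x = 0.0
    total = 0.0
    total_sq = 0.0
    count = 0
    for t in range(n_steps):
        # Noisy gradient: exact gradient plus zero-mean Gaussian noise.
        grad = curvature * x + rng.gauss(0.0, noise_sd)
        x -= step * grad
        if t >= burn_in:
            total += x
            total_sq += x * x
            count += 1
    mean = total / count
    return total_sq / count - mean ** 2


def ou_stationary_variance(step, curvature, noise_sd):
    """Stationary variance of the OU process
    dX = -curvature * X dt + sqrt(step) * noise_sd dW,
    the small-step diffusion approximation of the recursion above."""
    return step * noise_sd ** 2 / (2 * curvature)


# Hypothetical tuning values chosen only for illustration.
empirical = sgd_stationary_variance(step=0.05, curvature=1.0, noise_sd=1.0)
limit = ou_stationary_variance(step=0.05, curvature=1.0, noise_sd=1.0)
print(f"empirical variance: {empirical:.5f}")
print(f"OU approximation:   {limit:.5f}")
```

The two numbers should agree up to a small discretization error that shrinks with the step size, which is the sense in which the step size acts as a tuning parameter for the spread of the iterates.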

Speaker Bio: Jonathan Huggins is an Assistant Professor in the Department of Mathematics & Statistics, a Data Science Faculty Fellow, and a Founding Member of the Faculty of Computing & Data Sciences at Boston University. Prior to joining BU, he was a Postdoctoral Research Fellow in the Department of Biostatistics at Harvard. He completed his Ph.D. in Computer Science at the Massachusetts Institute of Technology in 2018. Previously, he received a B.A. in Mathematics from Columbia University and an S.M. in Computer Science from the Massachusetts Institute of Technology. His research centers on the development of fast, trustworthy machine learning and AI methods that balance the need for computational efficiency and the desire for statistical optimality with the inherent imperfections that come from real-world problems, large datasets, and complex models. His current applied work is focused on methods to enable more effective scientific discovery from high-throughput and multi-modal genomic data.

Jonathan Huggins

Dec. 14, 2021, 12:37 p.m.

Despite major efforts in recent years to improve explainability of deep neural networks, the tools we use for communicating explanations have largely remained the same: visualizations of representative inputs, salient input regions, and local model approximations. But when humans describe complex decision rules, we often use a different explanatory tool: natural language. I'll describe recent work on explaining models for computer vision tasks by automatically constructing natural language descriptions of individual neurons. These descriptions ground prediction in meaningful perceptual and linguistic abstractions, and can be used to surface unexpected model behaviors, and identify and mitigate adversarial vulnerabilities. These results show that fine-grained, automatic annotation of deep network models is both possible and practical: rich, language-based explanations produced by automated annotation procedures can surface meaningful and actionable information about deep networks.

Jacob Andreas

Dec. 14, 2021, 12:40 p.m.

As algorithms are increasingly applied to screen applicants for high-stakes decisions in employment, education, lending, and other domains, concerns have been raised about the effects of "algorithmic monoculture", in which many decision-makers all rely on the same algorithm. This concern invokes analogies to agriculture, where a monocultural system runs the risk of severe harm from unexpected shocks. We present a set of basic models characterizing the potential risks from algorithmic monoculture, showing that monocultural convergence on a single algorithm by a group of decision-making agents, even when the algorithm is more accurate for any one agent in isolation, can reduce the overall quality of the decisions being made by the full collection of agents. Our results rely on minimal assumptions, and involve a combination of game-theoretic arguments about competing decision-makers with the development of a probabilistic framework for analyzing systems that use multiple noisy estimates of a set of alternatives. The talk is based on joint work with Manish Raghavan.

Jon Kleinberg

Dec. 14, 2021, 1:08 p.m.

Machine Learning algorithms often prompt individuals to strategically modify their observable attributes to receive more favorable predictions. As a result, the distribution the predictive model is trained on may differ from the one it operates on in deployment. While such distribution shifts, in general, hinder accurate predictions, our work identifies a unique opportunity associated with shifts due to strategic responses. We show that we can use strategic responses effectively to recover causal relationships between the observable features and outcomes we wish to predict. More specifically, we study a game-theoretic model in which a principal deploys a sequence of models to predict an outcome of interest (e.g., college GPA) for a sequence of strategic agents (e.g., college applicants). In response, strategic agents invest efforts and modify their features for better predictions. In such settings, unobserved confounding variables can influence both an agent's observable features (e.g., high school records) and outcomes. Therefore, standard regression methods generally produce biased estimators. To address this issue, our work establishes a novel connection between strategic responses to machine learning models and instrumental variable (IV) regression, by observing that the sequence of deployed models can be viewed as an instrument that affects agents' observable features but does not directly influence their outcomes. Therefore, two-stage least squares (2SLS) regression can recover the causal relationships between observable features and outcomes. Beyond causal recovery, we can build on our 2SLS method to address two additional relevant optimization objectives: agent outcome maximization and predictive risk minimization.

This work is joint with Keegan Harris, Daniel Ngo, Logan Stapleton, and Hoda Heidari.
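The instrumental-variable idea in the abstract can be sketched on simulated data. Everything below is an assumption made for illustration: a single scalar feature, a linear effort response to the deployed model weight `theta`, and the particular coefficients. The deployed weight plays the role of the instrument: it shifts the observed feature but affects the outcome only through that feature.

```python
import random


def simulate(n=50_000, seed=0):
    """Simulate rounds of (deployed weight, observed feature, outcome)."""
    rng = random.Random(seed)
    true_effect = 2.0  # assumed causal effect of the feature on the outcome
    z, x, y = [], [], []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)   # deployed model weight (instrument)
        u = rng.gauss(0.0, 1.0)       # unobserved confounder
        # Observed feature: confounded baseline plus strategic effort.
        feature = 0.8 * u + rng.gauss(0.0, 1.0) + 0.7 * theta
        outcome = true_effect * feature + 1.5 * u + rng.gauss(0.0, 0.1)
        z.append(theta)
        x.append(feature)
        y.append(outcome)
    return z, x, y, true_effect


def slope(a, b):
    """Ordinary least squares slope of b regressed on a (with intercept)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var = sum((ai - mean_a) ** 2 for ai in a)
    return cov / var


def two_stage_least_squares(z, x, y):
    """Stage 1: predict x from the instrument z.
    Stage 2: regress y on the stage-1 fitted values."""
    n = len(z)
    gamma = slope(z, x)
    mean_z = sum(z) / n
    mean_x = sum(x) / n
    x_hat = [mean_x + gamma * (zi - mean_z) for zi in z]
    return slope(x_hat, y)


z, x, y, true_effect = simulate()
ols_estimate = slope(x, y)
iv_estimate = two_stage_least_squares(z, x, y)
print(f"true effect: {true_effect:.2f}")
print(f"OLS:         {ols_estimate:.2f}")
print(f"2SLS:        {iv_estimate:.2f}")
```

In this simulation the plain regression absorbs the confounder into its slope, while the two-stage estimate uses only the variation in the feature that is induced by the deployed model.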

Steven Wu

I am an Assistant Professor in the School of Computer Science at Carnegie Mellon University. My broad research interests are in algorithms and machine learning. These days I am excited about:

- Foundations of responsible AI, with emphasis on privacy and fairness considerations.
- Interactive learning, including contextual bandits and reinforcement learning, and its interactions with causal inference and econometrics.
- Economic aspects of machine learning, with a focus on learning in the presence of strategic agents.

Dec. 14, 2021, 1:30 p.m.

To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed. The reasoning is sound: a reduction in variance due to more rapid sampling can outweigh the bias introduced. However, the inexactness creates new challenges for sampler and parameter selection, since standard measures of sample quality like effective sample size do not account for asymptotic bias. To address these challenges, I'll describe how Stein's method -- a tool developed to prove central limit theorems -- can be adapted to assess and improve the quality of practical inference procedures. Along the way, I’ll highlight applications to Markov chain Monte Carlo sampler selection, goodness-of-fit testing, and black-box importance sampling.

Speaker Bio: Lester Mackey is a Principal Researcher at Microsoft Research, where he develops machine learning methods, models, and theory for large-scale learning tasks driven by applications from climate forecasting, healthcare, and the social good. Lester moved to Microsoft from Stanford University, where he was an assistant professor of Statistics and (by courtesy) of Computer Science. He earned his PhD in Computer Science and MA in Statistics from UC Berkeley and his BSE in Computer Science from Princeton University. He co-organized the second place team in the Netflix Prize competition for collaborative filtering, won the Prize4Life ALS disease progression prediction challenge, won prizes for temperature and precipitation forecasting in the yearlong real-time Subseasonal Climate Forecast Rodeo, and received best paper and best student paper awards from the ACM Conference on Programming Language Design and Implementation and the International Conference on Machine Learning.

Lester Mackey

Dec. 14, 2021, 1:31 p.m.

Across multiple industries, new online platforms are interjecting themselves as digital intermediaries in previously direct business-to-consumer transactions. A reasonable concern is that these platforms, once they become dominant, can leverage their unique position to extract surplus from both sides participating in the transaction and lead to different welfare outcomes than platform participants expected (and experienced) when they first joined the platform. We study the effects that OpenTable (an online restaurant reservation platform) had on restaurants’ prices and their likelihood of survival in NYC, during a period in which the platform expanded to cover most restaurants in the city. We develop an analytical model to understand restaurants’ adoption decisions, and the effect of adoption on prices and consumer surplus. The model shows how the platform can induce a prisoner’s dilemma in which restaurants have incentives to join the platform to poach customers from competitors or to protect their own clientele from competitors. However, once all restaurants join, none of them will attract additional customers, and the costs of the platform will be passed down to diners through price increases. As the popularity of the platform grows, the platform can charge a higher fee to restaurants until it extracts all the benefits it creates. To test the predictions of the model, we create a dataset containing prices, survival, and OpenTable participation for over 5,000 restaurants in NYC between 2005 and 2016. Our analysis suggests that as the platform became prevalent, the costs of the platform were passed down to consumers through higher prices and restaurants saw no benefits in terms of survival.

Cristobal Cheyre

Dec. 14, 2021, 1:35 p.m.

Dec. 14, 2021, 2:05 p.m.

Variational inference has recently emerged as a popular alternative to Markov chain Monte Carlo (MCMC) in large-scale Bayesian inference. A core idea of variational inference is to trade statistical accuracy for computational efficiency. It aims to approximate the posterior, as opposed to targeting the exact posterior as in MCMC. Approximating the exact posterior by a restricted inferential model (a.k.a. variational approximating family) reduces computation costs but sacrifices statistical accuracy. In this work, we develop a theoretical characterization of this statistical-computational tradeoff in variational inference. We focus on a case study of Bayesian linear regression using inferential models with different degrees of flexibility. From a computational perspective, we find that less flexible variational families speed up computation. They reduce the variance in stochastic optimization and, in turn, accelerate convergence. From a statistical perspective, however, we find that less flexible families suffer in approximation quality, but provide better statistical generalization. This is joint work with Kush Bhatia, Nikki Kuang, and Yi-An Ma.
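The loss in approximation quality from a less flexible family can be seen in a small closed-form case. The sketch below assumes a two-feature Bayesian linear regression with known noise variance and an isotropic Gaussian prior; the data and hyperparameters are hypothetical. It relies on the standard result that the mean-field Gaussian minimizing KL(q || p) against a Gaussian posterior with precision matrix Λ has variances 1 / Λ_ii, which never exceed the exact marginal variances. It illustrates only the approximation side of the tradeoff, not the optimization or generalization results of the talk.

```python
import random


def posterior_precision(X, noise_var=1.0, prior_var=10.0):
    """Entries (a, b, d) of the 2x2 posterior precision matrix
    [[a, b], [b, d]] = X^T X / noise_var + I / prior_var."""
    a = sum(r[0] * r[0] for r in X) / noise_var + 1.0 / prior_var
    b = sum(r[0] * r[1] for r in X) / noise_var
    d = sum(r[1] * r[1] for r in X) / noise_var + 1.0 / prior_var
    return a, b, d


def exact_marginal_variances(a, b, d):
    """Diagonal of the inverse precision (exact posterior covariance)."""
    det = a * d - b * b
    return d / det, a / det


def mean_field_variances(a, b, d):
    """Variances of the factorized Gaussian minimizing KL(q || p)."""
    return 1.0 / a, 1.0 / d


# Hypothetical design matrix with two strongly correlated features.
rng = random.Random(0)
X = []
for _ in range(200):
    x1 = rng.gauss(0.0, 1.0)
    x2 = 0.9 * x1 + 0.3 * rng.gauss(0.0, 1.0)
    X.append((x1, x2))

a, b, d = posterior_precision(X)
exact = exact_marginal_variances(a, b, d)
mean_field = mean_field_variances(a, b, d)
print("exact marginal variances:", exact)
print("mean-field variances:    ", mean_field)
```

With correlated features the factorized family cannot represent the posterior correlation, so it reports much narrower marginals than the exact posterior.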

Speaker Bio: Yixin Wang is an LSA Collegiate Fellow in Statistics at the University of Michigan. She works in the fields of Bayesian statistics, machine learning, and causal inference. Previously, she was a postdoctoral researcher with Professor Michael Jordan at the University of California, Berkeley. She completed her PhD in statistics at Columbia, advised by Professor David Blei, and her undergraduate studies in mathematics and computer science at the Hong Kong University of Science and Technology. Her research has received several awards, including the INFORMS data mining best paper award, Blackwell-Rosenbluth Award from the junior section of ISBA, student paper awards from ASA Biometrics Section and Bayesian Statistics Section, and the ICSA conference young researcher award.

Yixin Wang

Invited Talk: Curtis Northcutt

Dec. 14, 2021, 2:45 p.m.

Curtis Northcutt

I completed my Ph.D. in Computer Science at MIT, where I was fortunate to work with Isaac Chuang. Before that, I was awarded the MIT Morris Joseph Levin Masters Thesis Award for my master's thesis work at MIT, the NSF Fellowship, and the MITx Digital Learning Research Fellowship. I also taught as a TA for MIT's graduate machine learning course (6.867). Before that, I graduated as valedictorian from Vanderbilt University (2009-2013), where I majored in mathematics and computer science and was awarded the Barry M. Goldwater National Scholarship.

My work spans the theory and applications of artificial intelligence including uncertainty quantification and augmenting human capabilities. I invented confident learning and cleanlab (1.5k+ stars on GitHub), the Python package for machine learning with noisy labels and finding label errors in datasets. Before that, I created the CAMEO cheating detection system used to validate certificates in MITx and HarvardX online course teams. I am grateful to have worked at many of the world's leading AI research groups, including Google AI, Oculus Research, Facebook AI Research, Amazon AI, Microsoft Research, NASA, MIT, and Harvard.

Working with Richard Newcombe, I created the first augmented reality dataset for multi-person conversational AI, EgoCom. Our associated T-PAMI paper uses the EgoCom dataset to predict turn-taking in conversations.

With friends from Harvard and MIT, I co-founded ChipBrain, an empathy AI company building digital brains. As CTO of ChipBrain, I lead our mission to build emotionally intelligent AI that helps anyone build better relationships and connect with their audience more deeply. We envision a world where people from different backgrounds can empathize with one another, whether it's solving an argument with a partner, selling a product to a customer, or asking for time off from your boss. You can learn more about ChipBrain in this interview.

In my spare time, I help researchers build affordable state-of-the-art deep learning machines and enjoy competitive mountaineering, hiking, and cycling.

My favorite rapper is PomDP the PhD rapper.

Dec. 14, 2021, 3:31 p.m.

Invited Talk: Anima Anandkumar

Dec. 14, 2021, 3:40 p.m.

Anima Anandkumar

Anima Anandkumar is a Bren professor at Caltech. Her research spans both theoretical and practical aspects of large-scale machine learning. In particular, she has spearheaded research in neural operators, tensor-algebraic methods, non-convex optimization, probabilistic models and deep learning.

Anima is the recipient of several awards and honors, such as the Bren named chair professorship at Caltech, the Alfred P. Sloan Fellowship, Young Investigator Awards from the Air Force and Army research offices, Faculty Fellowships from Microsoft, Google and Adobe, and several best paper awards.

Anima received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a postdoctoral researcher at MIT from 2009 to 2010, a visiting researcher at Microsoft Research New England in 2012 and 2014, an assistant professor at U.C. Irvine between 2010 and 2016, an associate professor at U.C. Irvine between 2016 and 2017 and a principal scientist at Amazon Web Services between 2016 and 2018.