Data-Centric AI for reliable and responsible AI: from theory to practice
Mihaela van der Schaar · Isabelle Guyon · Nabeel Seedat

Mon Dec 11 11:45 AM -- 02:15 PM (PST) @ Ballroom A - C

Data-Centric AI has recently emerged as an important paradigm shift in machine learning and AI, placing the previously undervalued “data work” at the center of AI development. This tutorial aims to illuminate the fundamentals of Data-Centric AI and articulate its transformative potential. We will explore the motivation behind the data-centric approach, highlighting its power to improve model performance and to engender more trustworthy, fair, and unbiased AI systems, and we will discuss benchmarking from a data-centric perspective. Our examination extends to standardized documentation frameworks, showing how they form the backbone of this new paradigm. The tutorial will cover state-of-the-art methodologies in these areas, which we will contextualize in the high-stakes setting of healthcare. A focus of this tutorial is providing participants with an interactive, hands-on experience. To this end, we provide coding/software tools and resources that enable practical engagement. The panel discussion, with experts spanning diverse industries, will provide a dynamic platform for discourse, enabling a nuanced understanding of the implications and limitations of Data-Centric AI across different contexts. Ultimately, our goal is for participants to gain a practical foundation in Data-Centric AI, such that they can use or contribute to Data-Centric AI research.

Author Information

Mihaela van der Schaar (University of Cambridge)
Isabelle Guyon (Google and ChaLearn)
Isabelle Guyon

Isabelle Guyon recently joined Google Brain as a research scientist. She is also a professor of artificial intelligence at Université Paris-Saclay (Orsay). Her areas of expertise include computer vision, bioinformatics, and power systems. She is best known for being a co-inventor of Support Vector Machines. Her recent interests are in automated machine learning, meta-learning, and data-centric AI. She has been a strong promoter of challenges and benchmarks, and is president of ChaLearn, a non-profit dedicated to organizing machine learning challenges. She is community lead of Codalab competitions, a challenge platform used in both academia and industry. She co-organized the “Challenges in Machine Learning Workshop” at NeurIPS between 2014 and 2019, launched the “NeurIPS challenge track” in 2017 while she was general chair, and pushed for the creation of the “NeurIPS datasets and benchmarks track” in 2021 as a NeurIPS board member.

Nabeel Seedat (University of Cambridge)

