Invited Talk
The Data-Centric Era: How ML is Becoming an Experimental Science
Isabelle Guyon

Thu Dec 01 07:30 AM -- 08:30 AM (PST) @ Hall H

NeurIPS has existed for more than three decades, each marked by a dominant trend. The pioneering years saw the burgeoning of back-prop nets; the coming-of-age years blossomed with convex optimization, regularization, Bayesian methods, boosting, and kernel methods, to name a few; and the junior years have been dominated by deep nets and big data. Recent analyses now conclude that using ever bigger data and deeper networks is not a sustainable way to progress. Meanwhile, other indicators show that machine learning increasingly relies on good data and benchmarks, not only to train more powerful or more compact models, but also to soundly evaluate new ideas and to stress-test models for reliability, fairness, and protection against various attacks, including privacy attacks.

In 2021, the NeurIPS Datasets and Benchmarks track was launched and, at the same time, the Data-Centric AI initiative was born. Together they kickstarted the "data-centric era", which is gaining momentum in response to the new needs of data scientists who, admittedly, spend more time understanding problems, designing experimental settings, and engineering datasets than designing and training ML models.

We will retrace the enormous collective efforts our community has made since the 1980s to share datasets and benchmarks, highlighting the important milestones that led to today's effervescence. We will pick a few hot topics that have raised controversy and engendered novel, thought-provoking contributions. Finally, we will highlight some of the most pressing issues that the community must address.

Author Information

Isabelle Guyon (Google and ChaLearn)

Isabelle Guyon recently joined Google Brain as a research scientist. She is also a professor of artificial intelligence at Université Paris-Saclay (Orsay). Her areas of expertise include computer vision, bioinformatics, and power systems. She is best known as a co-inventor of Support Vector Machines. Her recent interests are in automated machine learning, meta-learning, and data-centric AI. She has been a strong promoter of challenges and benchmarks, and is president of ChaLearn, a non-profit dedicated to organizing machine learning challenges. She is community lead of CodaLab Competitions, a challenge platform used in both academia and industry. She co-organized the "Challenges in Machine Learning Workshop" at NeurIPS between 2014 and 2019, launched the "NeurIPS challenge track" in 2017 while she was general chair, and pushed for the creation of the "NeurIPS Datasets and Benchmarks track" in 2021, as a NeurIPS board member.
