Benchmark datasets have played a crucial role in driving empirical progress in machine learning, creating an interesting dynamic between those on a quest for state-of-the-art performance and those creating new, challenging benchmarks. In this panel, we reflect on how benchmarks can lead to scientific progress, both through new algorithmic innovations and through improved scientific understanding. First, what qualities of a machine learning system should a good benchmark dataset seek to measure? How well can benchmarks assess performance in dynamic and novel environments, or in tasks with an open-ended set of acceptable answers? Benchmarks can also raise significant ethical concerns, including poor data collection practices, under- and misrepresentation of subjects, and misspecification of objectives. Second, even given high-quality, carefully constructed benchmarks, which research questions can we hope to answer by climbing leaderboards, and which ones are deprioritized or impossible to answer because of the limitations of the benchmark paradigm? Overall, we hope to deepen the community’s awareness of the important role benchmarks play in advancing the science of machine learning.
Author Information
Lora Aroyo (Google Research)
Samuel Bowman (New York University)
Isabelle Guyon (U. Paris-Saclay & ChaLearn)
Isabelle Guyon recently joined Google Brain as a research scientist. She is also a professor of artificial intelligence at Université Paris-Saclay (Orsay). Her areas of expertise include computer vision, bioinformatics, and power systems. She is best known as a co-inventor of Support Vector Machines. Her recent interests are in automated machine learning, meta-learning, and data-centric AI. She has been a strong promoter of challenges and benchmarks, and is president of ChaLearn, a non-profit dedicated to organizing machine learning challenges. She is community lead of CodaLab Competitions, a challenge platform used in both academia and industry. She co-organized the “Challenges in Machine Learning” workshop at NeurIPS from 2014 to 2019, launched the NeurIPS challenge track in 2017 while serving as general chair, and, as a NeurIPS board member, pushed for the creation of the NeurIPS Datasets and Benchmarks track in 2021.
Joaquin Vanschoren (Eindhoven University of Technology)
More from the Same Authors
2022: Fifteen-minute Competition Overview Video
Dustin Carrión-Ojeda · Ihsan Ullah · Sergio Escalera · Isabelle Guyon · Felix Mohr · Manh Hung Nguyen · Joaquin Vanschoren
2022: LOTUS: Learning to learn with Optimal Transport in Unsupervised Scenarios
prabhant singh · Joaquin Vanschoren
2022: Two-Turn Debate Does Not Help Humans Answer Hard Reading Comprehension Questions
Alicia Parrish · Harsh Trivedi · Nikita Nangia · Jason Phang · Vishakh Padmakumar · Amanpreet Singh Saimbhi · Samuel Bowman
2023 Poster: Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Miles Turpin · Julian Michael · Ethan Perez · Samuel Bowman
2023 Poster: DataPerf: Benchmarks for Data-Centric AI Development
Mark Mazumder · Colby Banbury · Xiaozhe Yao · Bojan Karlaš · William Gaviria Rojas · Sudnya Diamos · Greg Diamos · Lynn He · Alicia Parrish · Hannah Rose Kirk · Jessica Quaye · Charvi Rastogi · Douwe Kiela · David Jurado · David Kanter · Rafael Mosquera · Will Cukierski · Juan Ciro · Lora Aroyo · Bilge Acun · Lingjiao Chen · Mehul Raje · Max Bartolo · Evan Sabri Eyuboglu · Amirata Ghorbani · Emmett Goodman · Addison Howard · Oana Inel · Tariq Kane · Christine R. Kirkpatrick · D. Sculley · Tzu-Sheng Kuo · Jonas Mueller · Tristan Thrush · Joaquin Vanschoren · Margaret Warren · Adina Williams · Serena Yeung · Newsha Ardalani · Praveen Paritosh · Ce Zhang · James Zou · Carole-Jean Wu · Cody Coleman · Andrew Ng · Peter Mattson · Vijay Janapa Reddi
2023 Workshop: Socially Responsible Language Modelling Research (SoLaR)
Usman Anwar · David Krueger · Samuel Bowman · Jakob Foerster · Su Lin Blodgett · Roberta Raileanu · Alan Chan · Katherine Lee · Laura Ruis · Robert Kirk · Yawen Duan · Xin Chen · Kawin Ethayarajh
2023 Competition: NeurIPS 2023 Machine Unlearning Competition
Eleni Triantafillou · Fabian Pedregosa · Meghdad Kurmanji · Kairan ZHAO · Gintare Karolina Dziugaite · Peter Triantafillou · Ioannis Mitliagkas · Vincent Dumoulin · Lisheng Sun · Peter Kairouz · Julio C Jacques Junior · Jun Wan · Sergio Escalera · Isabelle Guyon
2022: Sam Bowman: What's the deal with AI safety?
Samuel Bowman
2022 Competition: Cross-Domain MetaDL: Any-Way Any-Shot Learning Competition with Novel Datasets from Practical Domains
Dustin Carrión-Ojeda · Ihsan Ullah · Sergio Escalera · Isabelle Guyon · Felix Mohr · Manh Hung Nguyen · Joaquin Vanschoren
2022 Workshop: Human Evaluation of Generative Models
Divyansh Kaushik · Jennifer Hsia · Jessica Huynh · Yonadav Shavit · Samuel Bowman · Ting-Hao Huang · Douwe Kiela · Zachary Lipton · Eric Michael Smith
2022 Workshop: NeurIPS 2022 Workshop on Meta-Learning
Huaxiu Yao · Eleni Triantafillou · Fabio Ferreira · Joaquin Vanschoren · Qi Lei
2022 Poster: Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification
Ihsan Ullah · Dustin Carrión-Ojeda · Sergio Escalera · Isabelle Guyon · Mike Huisman · Felix Mohr · Jan N. van Rijn · Haozhe Sun · Joaquin Vanschoren · Phan Anh Vu
2022: Isabelle Guyon
Isabelle Guyon
2022 Invited Talk: The Data-Centric Era: How ML is Becoming an Experimental Science
Isabelle Guyon
2021: Invited talk 9
Samuel Bowman
2021 Workshop: 5th Workshop on Meta-Learning
Erin Grant · Fábio Ferreira · Frank Hutter · Jonathan Richard Schwarz · Joaquin Vanschoren · Huaxiu Yao
2021: MetaDL: Few Shot Learning Competition with Novel Datasets from Practical Domains + Q&A
Adrian El Baz · Isabelle Guyon · Zhengying Liu · Jan N. Van Rijn · Haozhe Sun · Sébastien Treguer · Wei-Wei Tu · Ihsan Ullah · Joaquin Vanschoren · Phan Ahn Vu
2020 Poster: Deep Statistical Solvers
Balthazar Donon · Zhengying Liu · Wenzhuo LIU · Isabelle Guyon · Antoine Marot · Marc Schoenauer
2019: Welcome and Opening Remarks
Adrienne Mendrik · Wei-Wei Tu · Isabelle Guyon · Evelyne Viegas · Ming LI
2019 Poster: Can Unconditional Language Models Recover Arbitrary Sentences?
Nishant Subramani · Samuel Bowman · Kyunghyun Cho
2019 Poster: SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang · Yada Pruksachatkun · Nikita Nangia · Amanpreet Singh · Julian Michael · Felix Hill · Omer Levy · Samuel Bowman
2019 Spotlight: SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang · Yada Pruksachatkun · Nikita Nangia · Amanpreet Singh · Julian Michael · Felix Hill · Omer Levy · Samuel Bowman
2018: Afternoon Welcome - Isabelle Guyon and Evelyne Viegas
Isabelle Guyon
2018 Workshop: CiML 2018 - Machine Learning competitions "in the wild": Playing in the real world or in real time
Isabelle Guyon · Evelyne Viegas · Sergio Escalera · Jacob D Abernethy
2018: Datasets and Benchmarks for Causal Learning
Csaba Szepesvari · Isabelle Guyon · Nicolai Meinshausen · David Blei · Elias Bareinboim · Bernhard Schölkopf · Pietro Perona
2018: AutoML3 - LifeLong ML with concept drift Challenge: Overview and award ceremony
Hugo Jair Escalante · Isabelle Guyon · Daniel Silver · Evelyne Viegas · Wei-Wei Tu
2018: Evaluating Causation Coefficients
Isabelle Guyon
2017 Workshop: Machine Learning Challenges as a Research Tool
Isabelle Guyon · Evelyne Viegas · Sergio Escalera · Jacob D Abernethy
2017: Introduction - Isabelle Guyon and Evelyne Viegas
Isabelle Guyon
2016 Workshop: Machine Learning for Spatiotemporal Forecasting
Florin Popescu · Sergio Escalera · Xavier Baró · Stephane Ayache · Isabelle Guyon
2016: Gaming challenges and encouraging collaborations
Sergio Escalera · Isabelle Guyon
2016 Workshop: Challenges in Machine Learning: Gaming and Education
Isabelle Guyon · Evelyne Viegas · Balázs Kégl · Ben Hamner · Sergio Escalera
2016 Demonstration: Biometric applications of CNNs: get a job at "Impending Technologies"!
Sergio Escalera · Isabelle Guyon · Baiyu Chen · Marc Quintana · Umut Güçlü · Yağmur Güçlütürk · Xavier Baró · Rob van Lier · Carlos Andujar · Marcel A. J. van Gerven · Bernhard E Boser · Luke Wang
2015 Workshop: Challenges in Machine Learning (CiML 2015): "Open Innovation" and "Coopetitions"
Isabelle Guyon · Evelyne Viegas · Ben Hamner · Balázs Kégl
2014 Workshop: High-energy particle physics, machine learning, and the HiggsML data challenge (HEPML)
Glen Cowan · Balázs Kégl · Kyle Cranmer · Gábor Melis · Tim Salimans · Vladimir Vava Gligorov · Daniel Whiteson · Lester Mackey · Wojciech Kotlowski · Roberto Díaz Morales · Pierre Baldi · Cecile Germain · David Rousseau · Isabelle Guyon · Tianqi Chen
2014 Workshop: Challenges in Machine Learning workshop (CiML 2014)
Isabelle Guyon · Evelyne Viegas · Percy Liang · Olga Russakovsky · Rinat Sergeev · Gábor Melis · Michele Sebag · Gustavo Stolovitzky · Jaume Bacardit · Michael S Kim · Ben Hamner
2013 Workshop: NIPS 2013 Workshop on Causality: Large-scale Experiment Design and Inference of Causal Mechanisms
Isabelle Guyon · Leon Bottou · Bernhard Schölkopf · Alexander Statnikov · Evelyne Viegas · james m robins
2012 Demonstration: Gesture recognition with Kinect
Isabelle Guyon
2009 Workshop: Clustering: Science or art? Towards principled approaches
Margareta Ackerman · Shai Ben-David · Avrim Blum · Isabelle Guyon · Ulrike von Luxburg · Robert Williamson · Reza Zadeh
2009 Mini Symposium: Causality and Time Series Analysis
Florin Popescu · Isabelle Guyon · Guido Nolte
2009 Demonstration: Causality Workbench
Isabelle Guyon
2008 Workshop: Causality: objectives and assessment
Isabelle Guyon · Dominik Janzing · Bernhard Schölkopf
2007 Demonstration: CLOP: a Matlab Learning Object Package
Amir Reza Saffari Azar Alamdari · Isabelle Guyon · Hugo Jair Escalante · Gökhan H Bakir · Gavin Cawley
2006 Workshop: Multi-level Inference Workshop and Model Selection Game
Isabelle Guyon