The Datasets and Benchmarks track serves as a novel venue for high-quality publications, talks, and posters on highly valuable machine learning datasets and benchmarks, as well as a forum for discussions on how to improve dataset development. Datasets and benchmarks are crucial for the development of machine learning methods, but they also require their own publishing and reviewing guidelines. For instance, datasets often cannot be reviewed in a double-blind fashion, and hence full anonymization is not required. On the other hand, they do require additional specific checks, such as a proper description of how the data was collected, whether the data shows intrinsic bias, and whether it will remain accessible.
Wed 8:00 a.m. - 8:10 a.m. | A Large-Scale Database for Graph Representation Learning (Oral)
With the rapid emergence of graph representation learning, the construction of new large-scale datasets is necessary to distinguish model capabilities and accurately assess the strengths and weaknesses of each technique. By carefully analyzing existing graph databases, we identify three critical components for advancing the field of graph representation learning: (1) large graphs, (2) many graphs, and (3) class diversity. To date, no single graph database offers all of these desired properties. We introduce MalNet, the largest public graph database ever constructed, representing a large-scale ontology of malicious software function call graphs. MalNet contains over 1.2 million graphs, averaging over 15k nodes and 35k edges per graph, across a hierarchy of 47 types and 696 families. Compared to the popular REDDIT-12K database, MalNet offers 105x more graphs, 44x larger graphs on average, and 63x more classes. We provide a detailed analysis of MalNet, discussing its properties and provenance, along with an evaluation of state-of-the-art machine learning and graph neural network techniques. The unprecedented scale and diversity of MalNet offers exciting opportunities to advance the frontiers of graph representation learning, enabling new discoveries and research into imbalanced classification, explainability, and the impact of class hardness. The database is publicly available at www.mal-net.org.
Scott Freitas · Yuxiao Dong · Joshua Neil · Duen Horng Chau
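The full database is distributed from www.mal-net.org. As a quick, hedged illustration of working with this kind of data, the sketch below loads the small MalNetTiny subset that ships with PyTorch Geometric; the loader and batch fields are PyTorch Geometric conventions rather than part of the paper, and the root path is a placeholder.

# Illustrative sketch (not the authors' code): loading the MalNetTiny subset
# of MalNet via PyTorch Geometric. Assumes torch and torch_geometric are
# installed; the full 1.2M-graph database is downloaded from www.mal-net.org.
from torch_geometric.datasets import MalNetTiny
from torch_geometric.loader import DataLoader

dataset = MalNetTiny(root='data/malnet_tiny')  # placeholder root; downloads on first use
print(len(dataset), 'graphs,', dataset.num_classes, 'classes')

loader = DataLoader(dataset, batch_size=32, shuffle=True)
for batch in loader:
    # edge_index holds the function-call edges; y holds the malware class label
    print(batch.num_graphs, batch.edge_index.shape, batch.y[:5])
    break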
Wed 8:10 a.m. - 8:20 a.m. | WRENCH: A Comprehensive Benchmark for Weak Supervision (Oral)
Recent Weak Supervision (WS) approaches have had widespread success in easing the bottleneck of labeling training data for machine learning by synthesizing labels from multiple potentially noisy supervision sources. However, proper measurement and analysis of these approaches remain a challenge. First, datasets used in existing works are often private and/or custom, limiting standardization. Second, WS datasets with the same name and base data often vary in terms of the labels and weak supervision sources used, a significant "hidden" source of evaluation variance. Finally, WS studies often diverge in terms of the evaluation protocol and ablations used. To address these problems, we introduce a benchmark platform, WRENCH, for thorough and standardized evaluation of WS approaches. It consists of 22 varied real-world datasets for classification and sequence tagging; a range of real, synthetic, and procedurally-generated weak supervision sources; and a modular, extensible framework for WS evaluation, including implementations for popular WS methods. We use WRENCH to conduct extensive comparisons over more than 120 method variants to demonstrate its efficacy as a benchmark platform. The code is available at https://github.com/JieyuZ2/wrench.
Jieyu Zhang · Yue Yu · · Yujing Wang · Yaming Yang · Mao Yang · Alexander Ratner
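As context for how the platform is used, here is a minimal sketch following the patterns in the repository README (https://github.com/JieyuZ2/wrench). The dataset path and dataset name are placeholders, and the function signatures are assumptions based on that README.

# Illustrative sketch (assumed API, per the README at
# https://github.com/JieyuZ2/wrench): load a benchmark dataset, aggregate
# the weak supervision sources with a simple label model, and evaluate.
from wrench.dataset import load_dataset
from wrench.labelmodel import MajorityVoting

# 'path/to/datasets' and 'youtube' are placeholders for a local copy of the
# benchmark data and one of its 22 datasets.
train_data, valid_data, test_data = load_dataset(
    'path/to/datasets', 'youtube', extract_feature=False)

label_model = MajorityVoting()           # simplest baseline aggregator
label_model.fit(train_data, valid_data)  # fit on the weak label matrix
print('test accuracy:', label_model.test(test_data, 'acc'))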
Wed 8:20 a.m. - 8:30 a.m. | ATOM3D: Tasks on Molecules in Three Dimensions (Oral)
Computational methods that operate on three-dimensional molecular structure have the potential to answer important questions in biology and chemistry. In particular, deep neural networks have gained significant attention, but their widespread adoption in the biomolecular domain has been limited by a lack of either systematic performance benchmarks or a unified toolkit for interacting with molecular data. To address this, we present ATOM3D, a collection of both novel and existing benchmark datasets spanning several key classes of biomolecules. We implement several classes of three-dimensional molecular learning methods for each of these tasks and show that they consistently improve performance relative to methods based on one- and two-dimensional representations. The specific choice of architecture proves to be critical for performance, with three-dimensional convolutional networks excelling at tasks involving complex geometries, graph networks performing well on systems requiring detailed positional information, and the more recently developed equivariant networks showing significant promise. Our results indicate that many molecular problems stand to gain from three-dimensional molecular learning, and that there is potential for improvement on many tasks that remain underexplored. To lower the barrier to entry and facilitate further developments in the field, we also provide a comprehensive suite of tools for dataset processing, model training, and evaluation in our open-source atom3d Python package. All datasets are available for download from www.atom3d.ai.
Raphael Townshend · Martin Vögele · Patricia Suriana · Alex Derry · Alexander Powers · Yianni Laloudakis · Sidhika Balachandar · Bowen Jing · Brandon Anderson · Stephan Eismann · Risi Kondor · Russ Altman · Ron Dror
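To illustrate the toolkit, here is a minimal sketch of reading one of the released datasets with the atom3d package; the LMDB path is a placeholder, and the exact per-item fields vary by task.

# Illustrative sketch (not the authors' exact workflow): reading an ATOM3D
# dataset with the open-source atom3d package (pip install atom3d). Assumes
# an LMDB-format dataset was downloaded from www.atom3d.ai.
from atom3d.datasets import LMDBDataset

dataset = LMDBDataset('path/to/lmdb_dataset')  # placeholder path
print(len(dataset))    # number of molecular structures
item = dataset[0]      # a dict; typically includes an 'atoms' dataframe with
print(item.keys())     # 3D coordinates, plus task-specific label fields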
Wed 8:30 a.m. - 8:40 a.m. | Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research (Oral)
Benchmark datasets play a central role in the organization of machine learning research. They coordinate researchers around shared research problems and serve as a measure of progress towards shared goals. Despite the foundational role benchmarking practices play in the field, relatively little attention has been paid to the dynamics of benchmark dataset use and reuse within and across machine learning subcommunities. In this work we dig into these dynamics by studying how dataset usage patterns differ across machine learning subcommunities and over time from 2015 to 2020. We find increasing concentration on fewer and fewer datasets within task communities, significant adoption of datasets from other tasks, and concentration across the field on datasets introduced by researchers at a small number of elite institutions. Our results have implications for scientific evaluation, AI ethics, and equity and access within the field.
Bernard Koch · Emily Denton · Alex Hanna · Jacob G Foster
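The central quantity in this analysis is how concentrated usage is across datasets. As a hedged illustration (not the authors' pipeline), the sketch below computes a Gini coefficient over hypothetical per-dataset usage counts; values near 1 indicate that a few datasets dominate a task community.

# Illustrative sketch: measuring concentration of dataset usage with a
# Gini coefficient. `usage_counts` is hypothetical data, not from the paper.
import numpy as np

def gini(counts):
    """Gini coefficient: 0 = usage spread evenly, 1 = all usage on one dataset."""
    x = np.sort(np.asarray(counts, dtype=float))  # ascending order
    n = len(x)
    # Standard formula over the ordered values.
    return 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

usage_counts = [120, 40, 12, 6, 3, 2, 1, 1]  # hypothetical per-dataset paper counts
print(round(gini(usage_counts), 3))          # close to 1 => heavy concentration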
Wed 8:40 a.m. - 9:00 a.m. | Joint Q&A (Q&A)
Author Information
Joaquin Vanschoren (Eindhoven University of Technology)

Joaquin Vanschoren is Associate Professor in Machine Learning at the Eindhoven University of Technology. He holds a PhD from the Katholieke Universiteit Leuven, Belgium. His research focuses on understanding and automating machine learning, meta-learning, and continual learning. He founded and leads OpenML.org, a popular open science platform with over 250,000 users that facilitates the sharing and reuse of machine learning datasets and models. He is a founding member of the European AI networks ELLIS and CLAIRE, and an active member of MLCommons. He has received several awards, including an Amazon Research Award, an ECMLPKDD Best Demo award, and the Dutch Data Prize. He was a tutorial speaker at NeurIPS 2018 and AAAI 2021, and has given over 30 invited talks. He co-initiated the NeurIPS Datasets and Benchmarks track and was NeurIPS Datasets and Benchmarks Chair from 2021 to 2023. He also co-organized the AutoML workshop series at ICML and the Meta-Learning workshop series at NeurIPS. He is editor-in-chief of DMLR (part of JMLR), as well as an action editor for JMLR and a machine learning moderator for arXiv. He has authored or co-authored over 150 scientific papers, as well as reference books on Automated Machine Learning and Meta-learning.
Serena Yeung (Stanford University)
More from the Same Authors
- 2021 : OpenML Benchmarking Suites »
  Bernd Bischl · Giuseppe Casalicchio · Matthias Feurer · Pieter Gijsbers · Frank Hutter · Michel Lang · Rafael Gomes Mantovani · Jan van Rijn · Joaquin Vanschoren
- 2021 : Variational Task Encoders for Model-Agnostic Meta-Learning »
  Joaquin Vanschoren
- 2021 : Open-Ended Learning Strategies for Learning Complex Locomotion Skills »
  Joaquin Vanschoren
- 2022 : DrML: Diagnosing and Rectifying Vision Models using Language »
  Yuhui Zhang · Jeff Z. HaoChen · Shih-Cheng Huang · Kuan-Chieh Wang · James Zou · Serena Yeung
- 2022 : Fifteen-minute Competition Overview Video »
  Dustin Carrión-Ojeda · Ihsan Ullah · Sergio Escalera · Isabelle Guyon · Felix Mohr · Manh Hung Nguyen · Joaquin Vanschoren
- 2022 : LOTUS: Learning to learn with Optimal Transport in Unsupervised Scenarios »
  prabhant singh · Joaquin Vanschoren
- 2023 Poster: LOVM: Language-Only Vision Model Selection »
  Orr Zohar · Shih-Cheng Huang · Kuan-Chieh Wang · Serena Yeung
- 2023 Poster: DataPerf: Benchmarks for Data-Centric AI Development »
  Mark Mazumder · Colby Banbury · Xiaozhe Yao · Bojan Karlaš · William Gaviria Rojas · Sudnya Diamos · Greg Diamos · Lynn He · Alicia Parrish · Hannah Rose Kirk · Jessica Quaye · Charvi Rastogi · Douwe Kiela · David Jurado · David Kanter · Rafael Mosquera · Will Cukierski · Juan Ciro · Lora Aroyo · Bilge Acun · Lingjiao Chen · Mehul Raje · Max Bartolo · Evan Sabri Eyuboglu · Amirata Ghorbani · Emmett Goodman · Addison Howard · Oana Inel · Tariq Kane · Christine R. Kirkpatrick · D. Sculley · Tzu-Sheng Kuo · Jonas Mueller · Tristan Thrush · Joaquin Vanschoren · Margaret Warren · Adina Williams · Serena Yeung · Newsha Ardalani · Praveen Paritosh · Ce Zhang · James Zou · Carole-Jean Wu · Cody Coleman · Andrew Ng · Peter Mattson · Vijay Janapa Reddi
- 2023 Poster: INSPECT: A Multimodal Dataset for Patient Outcome Prediction of Pulmonary Embolisms »
  Shih-Cheng Huang · Zepeng Huo · Ethan Steinberg · Chia-Chun Chiang · Curtis Langlotz · Matthew Lungren · Serena Yeung · Nigam Shah · Jason Fries
- 2022 : Towards better benchmarks for AutoML, meta-learning and continual learning in computer vision »
  Joaquin Vanschoren
- 2022 Competition: Cross-Domain MetaDL: Any-Way Any-Shot Learning Competition with Novel Datasets from Practical Domains »
  Dustin Carrión-Ojeda · Ihsan Ullah · Sergio Escalera · Isabelle Guyon · Felix Mohr · Manh Hung Nguyen · Joaquin Vanschoren
- 2022 Workshop: NeurIPS 2022 Workshop on Meta-Learning »
  Huaxiu Yao · Eleni Triantafillou · Fabio Ferreira · Joaquin Vanschoren · Qi Lei
- 2022 Poster: Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification »
  Ihsan Ullah · Dustin Carrión-Ojeda · Sergio Escalera · Isabelle Guyon · Mike Huisman · Felix Mohr · Jan N. van Rijn · Haozhe Sun · Joaquin Vanschoren · Phan Anh Vu
- 2022 Poster: Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning »
  Victor Weixin Liang · Yuhui Zhang · Yongchan Kwon · Serena Yeung · James Zou
- 2021 Workshop: Data Centric AI »
  Andrew Ng · Lora Aroyo · Greg Diamos · Cody Coleman · Vijay Janapa Reddi · Joaquin Vanschoren · Carole-Jean Wu · Sharon Zhou · Lynn He
- 2021 Workshop: 5th Workshop on Meta-Learning »
  Erin Grant · Fábio Ferreira · Frank Hutter · Jonathan Richard Schwarz · Joaquin Vanschoren · Huaxiu Yao
- 2021 Datasets and Benchmarks: Dataset and Benchmark Poster Session 4 »
  Joaquin Vanschoren · Serena Yeung
- 2021 Datasets and Benchmarks: Dataset and Benchmark Track 3 »
  Joaquin Vanschoren · Serena Yeung
- 2021 Datasets and Benchmarks: Dataset and Benchmark Symposium »
  Joaquin Vanschoren · Serena Yeung
- 2021 Datasets and Benchmarks: Dataset and Benchmark Poster Session 3 »
  Joaquin Vanschoren · Serena Yeung
- 2021 Panel: The Role of Benchmarks in the Scientific Progress of Machine Learning »
  Lora Aroyo · Samuel Bowman · Isabelle Guyon · Joaquin Vanschoren
- 2021 : MetaDL: Few Shot Learning Competition with Novel Datasets from Practical Domains + Q&A »
  Adrian El Baz · Isabelle Guyon · Zhengying Liu · Jan N. Van Rijn · Haozhe Sun · Sébastien Treguer · Wei-Wei Tu · Ihsan Ullah · Joaquin Vanschoren · Phan Anh Vu
- 2021 Datasets and Benchmarks: Dataset and Benchmark Poster Session 2 »
  Joaquin Vanschoren · Serena Yeung
- 2021 Datasets and Benchmarks: Dataset and Benchmark Poster Session 1 »
  Joaquin Vanschoren · Serena Yeung
- 2021 Datasets and Benchmarks: Dataset and Benchmark Track 1 »
  Joaquin Vanschoren · Serena Yeung
- 2020 : Introduction for invited speaker, Louis Kirsch »
  Joaquin Vanschoren
- 2020 : Contributed Talk 1: Learning Hyperbolic Representations for Unsupervised 3D Segmentation »
  Joy Hsu · Jeffrey Gu · Serena Yeung
- 2020 Workshop: Meta-Learning »
  Jane Wang · Joaquin Vanschoren · Erin Grant · Jonathan Richard Schwarz · Francesco Visin · Jeff Clune · Roberto Calandra
- 2019 Workshop: Meta-Learning »
  Roberto Calandra · Ignasi Clavera Gilaberte · Frank Hutter · Joaquin Vanschoren · Jane Wang
- 2018 Workshop: NIPS 2018 Workshop on Meta-Learning »
  Joaquin Vanschoren · Frank Hutter · Sachin Ravi · Jane Wang · Erin Grant
- 2018 Tutorial: Automatic Machine Learning »
  Frank Hutter · Joaquin Vanschoren
- 2016 : OpenML in research and education »
  Joaquin Vanschoren