Moderator: Alice Oh
The Datasets and Benchmarks track serves as a novel venue for high-quality publications, talks, and posters on highly valuable machine learning datasets and benchmarks, as well as a forum for discussions on how to improve dataset development. Datasets and benchmarks are crucial for the development of machine learning methods, but they also require their own publishing and reviewing guidelines. For instance, datasets often cannot be reviewed in a double-blind fashion, and hence full anonymization will not be required. On the other hand, they do require additional specific checks, such as a proper description of how the data was collected, an assessment of intrinsic bias, and assurance that the data will remain accessible.
Q-Pain: A Question Answering Dataset to Measure Social Bias in Pain Management (Poster)
SlidesLive Video » Recent advances in Natural Language Processing (NLP), and specifically automated Question Answering (QA) systems, have demonstrated both impressive linguistic fluency and a pernicious tendency to reflect social biases. In this study, we introduce Q-Pain, a dataset for assessing bias in medical QA in the context of pain management, one of the most challenging forms of clinical decision-making. Along with the dataset, we propose a new, rigorous framework, including a sample experimental design, to measure the potential biases present when making treatment decisions. We demonstrate its use by assessing two reference Question-Answering systems, GPT-2 and GPT-3, and find statistically significant differences in treatment between intersectional race-gender subgroups, thus reaffirming the risks posed by AI in medical settings, and the need for datasets like ours to ensure safety before medical AI applications are deployed. |
Cécile Logé · Emily Ross · David Dadey · Saahil Jain · Adriel Saporta · Andrew Ng · Pranav Rajpurkar 🔗 |
Modeling Worlds in Text (Poster)
We provide a dataset that enables the creation of learning agents that can build knowledge graph-based world models of interactive narratives. Interactive narratives---or text-adventure games---are partially observable environments structured as long puzzles or quests in which an agent perceives and interacts with the world purely through textual natural language. Each individual game typically contains hundreds of locations, characters, and objects---each with their own unique descriptions---providing an opportunity to study the problem of giving language-based agents the structured memory necessary to operate in such worlds. Our dataset provides 24,198 mappings between rich natural language observations and: (1) knowledge graphs that reflect the world state in the form of a map; (2) natural language actions that are guaranteed to cause a change in that particular world state. The training data is collected across 27 games in multiple genres; the test set contains a further 7,836 held-out instances over 9 additional games. We further provide baseline models using rules-based, question-answering, and sequence learning approaches, in addition to an analysis of the data and corresponding learning tasks.
Prithviraj Ammanabrolu · Mark Riedl 🔗 |
OmniPrint: A Configurable Printed Character Synthesizer (Poster)
We introduce OmniPrint, a synthetic data generator of isolated printed characters, geared toward machine learning research. It draws inspiration from famous datasets such as MNIST, SVHN and Omniglot, but offers the capability of generating a wide variety of printed characters from various languages, fonts and styles, with customized distortions. We include 935 fonts from 27 scripts and many types of distortions. As a proof of concept, we show various use cases, including an example of a meta-learning dataset designed for the upcoming MetaDL NeurIPS 2021 competition. OmniPrint is available at https://github.com/SunHaozhe/OmniPrint.
Haozhe Sun · Wei-Wei Tu · Isabelle Guyon 🔗 |
Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics (Poster)
With the expanding attention of machine learning researchers and practitioners to fairness, there is still no common framework for analyzing and comparing the capabilities of proposed models in deep representation learning. In this paper, we evaluate different fairness methods trained with deep neural networks on a common synthetic dataset and a real-world dataset to obtain better insights into how these methods work. In particular, we train about 3000 different models in various setups, including imbalanced and correlated data configurations, to verify the limits of the current models and better understand in which setups they are subject to failure. Our results show that the bias of models increases as datasets become more imbalanced or dataset attributes become more correlated, that the level of dominance of correlated sensitive features impacts bias, and that sensitive information remains in the latent representation even when bias-mitigation algorithms are applied. Overall, we present a dataset, propose various challenging evaluation setups, rigorously evaluate recent promising bias-mitigation algorithms in a common framework, and publicly release this benchmark, hoping the research community will take it as a common entry point for fair deep learning.
Charan Reddy · Deepak Sharma · Soroush Mehri · Adriana Romero Soriano · Samira Shabanian · Sina Honari 🔗 |
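For readers new to the area, a minimal sketch of one of the standard group-fairness metrics that benchmarks of this kind report (the demographic parity gap) is shown below. This is an illustrative computation under a binary-prediction assumption, not code from the paper's benchmark.

```python
import numpy as np

def demographic_parity_gap(y_pred: np.ndarray, sensitive: np.ndarray) -> float:
    """Largest absolute difference in positive-prediction rates across groups.
    A standard group-fairness metric; illustrative only, not the paper's code."""
    rates = [y_pred[sensitive == g].mean() for g in np.unique(sensitive)]
    return float(max(rates) - min(rates))

# Example: positive predictions occur more often for group 0 than for group 1.
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_gap(y_pred, group))  # 0.5
```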
An Extensible Benchmark Suite for Learning to Simulate Physical Systems (Poster)
SlidesLive Video » Simulating physical systems is a core component of scientific computing, encompassing a wide range of physical domains and applications. Recently, there has been a surge in data-driven methods to complement traditional numerical simulation methods, motivated by the opportunity to reduce computational costs and/or learn new physical models leveraging access to large collections of data. However, the diversity of problem settings and applications has led to a plethora of approaches, each one evaluated on a different setup and with different evaluation metrics. We introduce a set of benchmark problems to take a step towards unified benchmarks and evaluation protocols. We propose four representative physical systems, as well as a collection of both widely used classical time integrators and representative data-driven methods (kernel-based, MLP, CNN, nearest neighbors). Our framework allows evaluating objectively and systematically the stability, accuracy, and computational efficiency of data-driven methods. Additionally, it is configurable to permit adjustments for accommodating other learning tasks and for establishing a foundation for future developments in machine learning for scientific computing. |
Karl Otness · Arvi Gjoka · Joan Bruna · Daniele Panozzo · Benjamin Peherstorfer · Teseo Schneider · Denis Zorin 🔗 |
The Multi-Agent Behavior Dataset: Mouse Dyadic Social Interactions (Poster)
SlidesLive Video » Multi-agent behavior modeling aims to understand the interactions that occur between agents. We present a multi-agent dataset from behavioral neuroscience, the Caltech Mouse Social Interactions (CalMS21) Dataset. Our dataset consists of trajectory data of social interactions, recorded from videos of freely behaving mice in a standard resident-intruder assay. To help accelerate behavioral studies, the CalMS21 dataset provides benchmarks to evaluate the performance of automated behavior classification methods in three settings: (1) for training on large behavioral datasets all annotated by a single annotator, (2) for style transfer to learn inter-annotator differences in behavior definitions, and (3) for learning of new behaviors of interest given limited training data. The dataset consists of 6 million frames of unlabeled tracked poses of interacting mice, as well as over 1 million frames with tracked poses and corresponding frame-level behavior annotations. The challenge of our dataset is to be able to classify behaviors accurately using both labeled and unlabeled tracking data, as well as being able to generalize to new settings. |
Jennifer J Sun · Tomomi Karigo · Dipam Chakraborty · Sharada Mohanty · Benjamin Wild · Quan Sun · Chen Chen · David Anderson · Pietro Perona · Yisong Yue · Ann Kennedy
Reinforcement Learning Benchmarks for Traffic Signal Control (Poster)
SlidesLive Video » We propose a toolkit for developing and comparing reinforcement learning (RL)-based traffic signal controllers. The toolkit includes implementation of state-of-the-art deep-RL algorithms for signal control along with benchmark control problems that are based on realistic traffic scenarios. Importantly, the toolkit allows a first-of-its-kind comparison between state-of-the-art RL-based signal controllers while providing benchmarks for future comparisons. Consequently, we compare and report the relative performance of current RL algorithms. The experimental results suggest that previous algorithms are not robust to varying sensing assumptions and non-stylized intersection layouts. When more realistic signal layouts and advanced sensing capabilities are assumed, a distributed deep-Q learning approach is shown to outperform previously reported state-of-the-art algorithms in many cases. |
James Ault · Guni Sharon 🔗 |
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research (Poster)
SlidesLive Video » Progress in deep reinforcement learning (RL) is heavily driven by the availability of challenging benchmarks used for training agents. However, benchmarks that are widely adopted by the community are not explicitly designed for evaluating specific capabilities of RL methods. While there exist environments for assessing particular open problems in RL (such as exploration, transfer learning, unsupervised environment design, or even language-assisted RL), it is generally difficult to extend these to richer, more complex environments once research goes beyond proof-of-concept results. We present MiniHack, a powerful sandbox framework for easily designing novel RL environments. MiniHack is a one-stop shop for RL experiments with environments ranging from small rooms to complex, procedurally generated worlds. By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack allows designing custom RL testbeds that are fast and convenient to use. With this sandbox framework, novel environments can be designed easily, either using a human-readable description language or a simple Python interface. In addition to a variety of RL tasks and baselines, MiniHack can wrap existing RL benchmarks and provide ways to seamlessly add additional complexity. |
Mikayel Samvelyan · Robert Kirk · Vitaly Kurin · Jack Parker-Holder · Minqi Jiang · Eric Hambro · Fabio Petroni · Heinrich Kuttler · Edward Grefenstette · Tim Rocktäschel 🔗 |
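MiniHack exposes its tasks through the Gym registry; a minimal interaction loop is sketched below. The task id and the observation_keys keyword follow the examples in the MiniHack documentation, but exact names may differ across releases, and the loop uses the pre-0.26 Gym reset/step signature.

```python
import gym
import minihack  # noqa: F401 -- importing registers the MiniHack environments with Gym

# Task id taken from the MiniHack examples; adjust if your installed version differs.
env = gym.make("MiniHack-Room-5x5-v0", observation_keys=("chars", "glyphs"))
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, done, info = env.step(action)
env.close()
```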
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks (Poster)
SlidesLive Video » Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we provide a systematic evaluation and comparison of three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, value decomposition) in a diverse range of cooperative multi-agent learning tasks. Our experiments serve as a reference for the expected performance of algorithms across different learning tasks, and we provide insights regarding the effectiveness of different learning approaches. We open-source EPyMARL, which extends the PyMARL codebase to include additional algorithms and allow for flexible configuration of algorithm implementation details such as parameter sharing. Finally, we open-source two environments for multi-agent research which focus on coordination under sparse rewards. |
Georgios Papoudakis · Filippos Christianos · Lukas Schäfer · Stefano Albrecht 🔗 |
Which priors matter? Benchmarking models for learning latent dynamics (Poster)
SlidesLive Video » Learning dynamics is at the heart of many important applications of machine learning (ML), such as robotics and autonomous driving. In these settings, ML algorithms typically need to reason about a physical system using high dimensional observations, such as images, without access to the underlying state. Recently, several methods have proposed to integrate priors from classical mechanics into ML models to address the challenge of physical reasoning from images. In this work, we take a sober look at the current capabilities of these models. To this end, we introduce a suite consisting of 17 datasets with visual observations based on physical systems exhibiting a wide range of dynamics. We conduct a thorough and detailed comparison of the major classes of physically inspired methods alongside several strong baselines. While models that incorporate physical priors can often learn latent spaces with desirable properties, our results demonstrate that these methods fail to significantly improve upon standard techniques. Nonetheless, we find that the use of continuous and time-reversible dynamics benefits models of all classes. |
Aleksandar Botev · Andrew Jaegle · Peter Wirnsberger · Daniel Hennes · Irina Higgins 🔗 |
The Neural MMO Platform for Massively Multiagent Research (Poster)
SlidesLive Video » Neural MMO is a computationally accessible research platform that combines large agent populations, long time horizons, open-ended tasks, and modular game systems. Existing environments feature subsets of these properties, but Neural MMO is the first to combine them all. We present Neural MMO as free and open source software with active support, ongoing development, documentation, and additional training, logging, and visualization tools to help users adapt to this new setting. Initial baselines on the platform demonstrate that agents trained in large populations explore more and learn a progression of skills. We raise other more difficult problems such as many-team cooperation as open research questions which Neural MMO is well-suited to answer. Finally, we discuss current limitations of the platform, potential mitigations, and plans for continued development. |
Joseph Suarez · Yilun Du · Clare Zhu · Igor Mordatch · Phillip Isola 🔗 |
A Procedural World Generation Framework for Systematic Evaluation of Continual Learning (Poster)
Several families of continual learning techniques have been proposed to alleviate catastrophic interference in deep neural network training on non-stationary data. However, a comprehensive comparison and analysis of limitations remains largely open due to the inaccessibility of suitable datasets. Empirical examination not only varies immensely between individual works; it also currently relies on contrived benchmarks composed by subdividing and concatenating various prevalent static vision datasets. In this work, our goal is to bridge this gap by introducing a computer graphics simulation framework that repeatedly renders only upcoming urban scene fragments in an endless real-time procedural world generation process. At its core lies a modular parametric generative model with adaptable generative factors. The latter can be used to flexibly compose data streams, which significantly facilitates a detailed analysis and allows for effortless investigation of various continual learning schemes.
Timm Hess · Martin Mundt · Iuliia Pliushch · Visvanathan Ramesh 🔗 |
Brax - A Differentiable Physics Engine for Large Scale Rigid Body Simulation (Poster)
We present Brax, an open source library for rigid body simulation with a focus on performance and parallelism on accelerators, written in JAX. We present results on a suite of tasks inspired by the existing reinforcement learning literature, but remade in our engine. Additionally, we provide reimplementations of PPO, SAC, ES, and direct policy optimization in JAX that compile alongside our environments, allowing the learning algorithm and the environment processing to occur on the same device, and to scale seamlessly on accelerators. Finally, we include notebooks that facilitate training of performant policies on common MuJoCo-like tasks in minutes.
Daniel Freeman · Erik Frey · Anton Raichuk · Sertan Girgin · Igor Mordatch · Olivier Bachem 🔗 |
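A minimal sketch of rolling out a Brax environment under JAX, following the original (v1-era) brax.envs interface; the API has changed in later releases, so treat the exact call names here as assumptions rather than a definitive usage pattern.

```python
import jax
import jax.numpy as jnp
from brax import envs  # v1-era API; later Brax versions moved/renamed these calls

env = envs.create(env_name="ant")
state = env.reset(rng=jax.random.PRNGKey(0))

step_fn = jax.jit(env.step)  # the environment step compiles alongside the policy
for _ in range(10):
    action = jnp.zeros((env.action_size,))  # zero-action placeholder policy
    state = step_fn(state, action)
print(state.reward)
```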
CCNLab: A Benchmarking Framework for Computational Cognitive Neuroscience (Poster)
CCNLab is a benchmark for evaluating computational cognitive neuroscience models on empirical data. As a starting point, its focus is classical conditioning, which studies how animals predict reward and punishment in the environment. CCNLab includes a collection of simulations of seminal experiments expressed under a common API, as well as tools for visualizing and comparing simulated data with empirical data. CCNLab is broad, incorporating representative experiments from different categories of phenomena; flexible, allowing the straightforward addition of new experiments; and easy to use, so researchers can focus on developing better models. We envision CCNLab as a testbed for unifying computational theories of learning in the brain. We also hope that it can broadly accelerate neuroscience research and facilitate interaction between the fields of neuroscience, psychology, and artificial intelligence.
Nikhil Bhattasali · Momchil Tomov · Samuel J Gershman 🔗 |
Addressing "Documentation Debt" in Machine Learning: A Retrospective Datasheet for BookCorpus (Poster)
This paper contributes a formal case study in retrospective dataset documentation and pinpoints several problems with the influential BookCorpus dataset. Recent work has underscored the importance of dataset documentation in machine learning research, including by addressing "documentation debt" for datasets that have been used widely but documented sparsely. BookCorpus is one such dataset. Researchers have used BookCorpus to train OpenAI's GPT-N models and Google's BERT models, but little to no documentation exists about the dataset's motivation, composition, collection process, etc. We offer a retrospective datasheet with key context and information about BookCorpus, including several notable deficiencies. In particular, we find evidence that (1) BookCorpus violates copyright restrictions for many books, (2) BookCorpus contains thousands of duplicated books, and (3) BookCorpus exhibits significant skews in genre representation. We also find hints of other potential deficiencies that call for future research, such as lopsided author contributions. While more work remains, this initial effort to provide a datasheet for BookCorpus offers a cautionary case study and adds to growing literature that urges more careful, systematic documentation of machine learning datasets.
John Bandy · Nicholas Vincent 🔗 |
Generating Datasets of 3D Garments with Sewing Patterns (Poster)
Garments are ubiquitous in both the real world and many virtual worlds. They are highly deformable objects that exhibit an immense variety of designs and shapes, and yet most garments are created from a set of regularly shaped flat pieces. Exploration of garment structure presents a peculiar case for an object structure estimation task and might prove useful for downstream tasks of neural 3D garment modeling and reconstruction by providing a strong prior on garment shapes. To facilitate research in these directions, we propose a method for generating large synthetic datasets of 3D garment designs and their sewing patterns. Our method consists of a flexible description structure for specifying parametric sewing pattern templates and an automatic generation pipeline that produces garment 3D models with little-to-no manual intervention. To add realism, the pipeline additionally creates corrupted versions of the final meshes that imitate artifacts of 3D scanning. With this pipeline, we created the first large-scale synthetic dataset of 3D garment models with their sewing patterns. The dataset contains more than 20,000 garment design variations produced from 19 different base types. Seven of these garment types are specifically designed to target evaluation of the generalization across garment sewing pattern topologies.
Maria Korosteleva · Sung-Hee Lee 🔗 |
Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing (Poster)
SlidesLive Video » Explainable Natural Language Processing (ExNLP) has increasingly focused on collecting human-annotated textual explanations. These explanations are used downstream in three ways: as data augmentation to improve performance on a predictive task, as supervision to train models to produce explanations for their predictions, and as a ground-truth to evaluate model-generated explanations. In this review, we identify 65 datasets with three predominant classes of textual explanations (highlights, free-text, and structured), organize the literature on annotating each type, identify strengths and shortcomings of existing collection methodologies, and give recommendations for collecting ExNLP datasets in the future. |
Sarah Wiegreffe · Ana Marasovic 🔗 |
B-Pref: Benchmarking Preference-Based Reinforcement Learning (Poster)
SlidesLive Video » Reinforcement learning (RL) requires access to a reward function that incentivizes the right behavior, but these are notoriously hard to specify for complex tasks. Preference-based RL provides an alternative: learning policies using a teacher's preferences without pre-defined rewards, thus overcoming concerns associated with reward engineering. However, it is difficult to quantify the progress in preference-based RL due to the lack of a commonly adopted benchmark. In this paper, we introduce B-Pref: a benchmark specially designed for preference-based RL. A key challenge with such a benchmark is providing the ability to evaluate candidate algorithms quickly, which makes relying on real human input for evaluation prohibitive. At the same time, simulating human input as giving perfect preferences for the ground truth reward function is unrealistic. B-Pref alleviates this by simulating teachers with a wide array of irrationalities, and proposes metrics not solely for performance but also for robustness to these potential irrationalities. We showcase the utility of B-Pref by using it to analyze algorithmic design choices, such as selecting informative queries, for state-of-the-art preference-based RL algorithms. We hope that B-Pref can serve as a common starting point to study preference-based RL more systematically. Source code is available at https://github.com/rll-research/B-Pref. |
Kimin Lee · Laura Smith · Anca Dragan · Pieter Abbeel 🔗 |
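To make the idea of "simulated teachers with irrationalities" concrete, here is a sketch of a Boltzmann-rational teacher with a lapse rate, comparing the returns of two trajectory segments. This is an illustration of the general recipe, not B-Pref's exact teacher models or parameterization.

```python
import numpy as np

def simulated_preference(return_a, return_b, beta=1.0, eps=0.1, rng=None):
    """Return 0 if segment A is preferred, 1 if segment B.
    Bradley-Terry choice with inverse temperature beta, plus a lapse rate eps
    under which the teacher answers uniformly at random (one 'irrationality')."""
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < eps:
        return int(rng.integers(2))
    p_b = 1.0 / (1.0 + np.exp(beta * (return_a - return_b)))
    return int(rng.random() < p_b)
```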
Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks (Poster)
SlidesLive Video » We identify label errors in the test sets of 10 of the most commonly-used computer vision, natural language, and audio datasets, and subsequently study the potential for these label errors to affect benchmark results. Errors in test sets are numerous and widespread: we estimate an average of at least 3.3% errors across the 10 datasets, where for example label errors comprise at least 6% of the ImageNet validation set. Putative label errors are identified using confident learning algorithms and then human-validated via crowdsourcing (51% of the algorithmically-flagged candidates are indeed erroneously labeled, on average across the datasets). Traditionally, machine learning practitioners choose which model to deploy based on test accuracy -- our findings advise caution here, proposing that judging models over correctly labeled test sets may be more useful, especially for noisy real-world datasets. Surprisingly, we find that lower capacity models may be practically more useful than higher capacity models in real-world datasets with high proportions of erroneously labeled data. For example, on ImageNet with corrected labels: ResNet-18 outperforms ResNet-50 if the prevalence of originally mislabeled test examples increases by just 6%. On CIFAR-10 with corrected labels: VGG-11 outperforms VGG-19 if the prevalence of originally mislabeled test examples increases by just 5%. Test set errors across the 10 datasets can be viewed at https://labelerrors.com and all label errors can be reproduced by https://github.com/cleanlab/label-errors. |
Curtis Northcutt · Anish Athalye · Jonas Mueller 🔗 |
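The candidate label errors were flagged with confident learning before crowdsourced validation. As a rough illustration of that idea (not the authors' implementation), the sketch below flags an example when its predicted probability for the given label falls below that class's average self-confidence.

```python
import numpy as np

def flag_label_issues(pred_probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Boolean mask of likely label errors (simplified confident-learning heuristic).

    pred_probs: (n_examples, n_classes) out-of-sample predicted probabilities.
    labels:     (n_examples,) given (possibly noisy) integer labels.
    """
    n_classes = pred_probs.shape[1]
    # Per-class threshold: average confidence on examples carrying that label.
    thresholds = np.array([
        pred_probs[labels == k, k].mean() if np.any(labels == k) else 1.0
        for k in range(n_classes)
    ])
    self_confidence = pred_probs[np.arange(len(labels)), labels]
    return self_confidence < thresholds[labels]
```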
CommonsenseQA 2.0: Exposing the Limits of AI through Gamification (Poster)
Constructing benchmarks that test the abilities of modern natural language understanding models is difficult: pre-trained language models exploit artifacts in benchmarks to achieve human parity, but still fail on adversarial examples and make errors that demonstrate a lack of common sense. In this work, we propose gamification as a framework for data construction. The goal of players in the game is to compose questions that mislead a rival AI while using specific phrases for extra points. The game environment leads to enhanced user engagement and simultaneously gives the game designer control over the collected data, allowing us to collect high-quality data at scale. Using our method we create CommonsenseQA 2.0, which includes 14,343 yes/no questions, and demonstrate its difficulty for models that are orders of magnitude larger than the AI used in the game itself. Our best baseline, the T5-based Unicorn with 11B parameters, achieves an accuracy of 70.2%, substantially higher than GPT-3 (52.9%) in a few-shot inference setup. Both score well below human performance, which is at 94.1%.
Alon Talmor · Ori Yoran · Ronan Le Bras · Chandra Bhagavatula · Yoav Goldberg · Yejin Choi · Jonathan Berant 🔗 |
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning (Poster)
We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety critical applications. Given the increasing interest in deploying learning-based methods, there has been a flurry of recent proposals for OPE method, leading to a need for standardized empirical analyses. Our work takes a strong focus on diversity of experimental design to enable stress testing of OPE methods. We provide a comprehensive benchmarking suite to study the interplay of different attributes on method performance. We distill the results into a summarized set of guidelines for OPE in practice. Our software package, the Caltech OPE Benchmarking Suite (COBS), is open-sourced and we invite interested researchers to further contribute to the benchmark. |
Cameron Voloshin · Hoang Le · Nan Jiang · Yisong Yue 🔗 |
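For context on what an OPE estimator computes, here is the textbook trajectory-wise importance-sampling baseline; COBS covers a much broader family of estimators, so this is background rather than the package's API.

```python
import numpy as np

def importance_sampling_ope(behavior_logps, target_logps, returns):
    """Ordinary importance-sampling estimate of the target policy's value.

    behavior_logps, target_logps: (n_traj, horizon) log-probabilities of the
    logged actions under the behavior and target policies.
    returns: (n_traj,) observed trajectory returns under the behavior policy.
    """
    log_ratios = np.sum(target_logps, axis=1) - np.sum(behavior_logps, axis=1)
    weights = np.exp(log_ratios)  # per-trajectory likelihood ratios
    return float(np.mean(weights * np.asarray(returns)))
```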
ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation (Poster)
We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. TDW enables the simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments. Unique properties include real-time near-photo-realistic image rendering; a library of objects and environments, and routines for their customization; generative procedures for efficiently building classes of new environments; high-fidelity audio rendering; realistic physical interactions for a variety of material types, including cloths, liquid, and deformable objects; customizable "avatars" that embody AI agents; and support for human interactions with VR devices. TDW's API enables multiple agents to interact within a simulation and returns a range of sensor and physics data representing the state of the world. We present initial experiments enabled by TDW in emerging research directions in computer vision, machine learning, and cognitive science, including multi-modal physical scene understanding, physical dynamics predictions, multi-agent interactions, models that 'learn like a child', and attention studies in humans and neural networks.
Chuang Gan · Jeremy Schwartz · Seth Alter · Damian Mrowca · Martin Schrimpf · James Traer · Julian De Freitas · Jonas Kubilius · Abhishek Bhandwaldar · Nick Haber · Megumi Sano · Kuno Kim · Elias Wang · Michael Lingelbach · Aidan Curtis · Kevin Feigelis · Daniel Bear · Dan Gutfreund · David Cox · Antonio Torralba · James J DiCarlo · Josh Tenenbaum · Josh McDermott · Dan Yamins
Physion: Evaluating Physical Prediction from Vision in Humans and Machines (Poster)
While current vision algorithms excel at many challenging tasks, it is unclear how well they understand the physical dynamics of real-world environments. Here we introduce Physion, a dataset and benchmark for rigorously evaluating the ability to predict how physical scenarios will evolve over time. Our dataset features realistic simulations of a wide range of physical phenomena, including rigid and soft-body collisions, stable multi-object configurations, rolling, sliding, and projectile motion, thus providing a more comprehensive challenge than previous benchmarks. We used Physion to benchmark a suite of models varying in their architecture, learning objective, input-output structure, and training data. In parallel, we obtained precise measurements of human prediction behavior on the same set of scenarios, allowing us to directly evaluate how well any model could approximate human behavior. We found that vision algorithms that learn object-centric representations generally outperform those that do not, yet still fall far short of human performance. On the other hand, graph neural networks with direct access to physical state information both perform substantially better and make predictions that are more similar to those made by humans. These results suggest that extracting physical representations of scenes is the main bottleneck to achieving human-level and human-like physical understanding in vision algorithms. We have publicly released all data and code to facilitate the use of Physion to benchmark additional models in a fully reproducible manner, enabling systematic evaluation of progress towards vision algorithms that understand physical environments as robustly as people do.
Daniel Bear · Elias Wang · Damian Mrowca · Felix Binder · Hsiao-Yu Tung · Pramod RT · Cameron Holdaway · Sirui Tao · Kevin Smith · Fan-Yun Sun · Fei-Fei Li · Nancy Kanwisher · Josh Tenenbaum · Dan Yamins · Judith Fan
CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms (Poster)
Counterfactual explanations provide means for prescriptive model explanations by suggesting actionable feature changes (e.g., increase income) that allow individuals to achieve favourable outcomes in the future (e.g., insurance approval). Choosing an appropriate method is a crucial aspect for meaningful counterfactual explanations. As documented in recent reviews, there exists a quickly growing literature with available methods. Yet, in the absence of widely available open-source implementations, the decision in favour of certain models is primarily based on what is readily available. Going forward -- to guarantee meaningful comparisons across explanation methods -- we present CARLA (Counterfactual And Recourse LibrAry), a Python library for benchmarking counterfactual explanation methods across both different data sets and different machine learning models. In summary, our work provides the following contributions: (i) an extensive benchmark of 11 popular counterfactual explanation methods, (ii) a benchmarking framework for research on future counterfactual explanation methods, and (iii) a standardized set of integrated evaluation measures and data sets for transparent and extensive comparisons of these methods. We have open-sourced CARLA and our experimental results on GitHub (https://github.com/indyfree/CARLA), making them available as competitive baselines. We welcome contributions from other research groups and practitioners.
Martin Pawelczyk · Sascha Bielawski · Johan Van den Heuvel · Tobias Richter · Gjergji Kasneci 🔗 |
It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks (Poster)
Risk assessment instrument (RAI) datasets, particularly ProPublica's COMPAS dataset, are commonly used in algorithmic fairness papers due to benchmarking practices of comparing algorithms on datasets used in prior work. In many cases, this data is used as a benchmark to demonstrate good performance without accounting for the complexities of criminal justice (CJ) processes. However, we show that pretrial RAI datasets can contain numerous measurement biases and errors, and due to disparities in discretion and deployment, algorithmic fairness applied to RAI datasets is limited in making claims about real-world outcomes. These reasons make the datasets a poor fit for benchmarking under assumptions of ground truth and real-world impact. Furthermore, conventional practices of simply replicating previous data experiments may implicitly inherit or edify normative positions without explicitly interrogating value-laden assumptions. Without context of how interdisciplinary fields have engaged in CJ research and context of how RAIs operate upstream and downstream, algorithmic fairness practices are misaligned for meaningful contribution in the context of CJ, and would benefit from transparent engagement with normative considerations and values related to fairness, justice, and equality. These factors prompt questions about whether benchmarks for intrinsically socio-technical systems like the CJ system can exist in a beneficial and ethical way.
Michelle Bao · Angela Zhou · Samantha Zottola · Brian Brubach · Sarah Desmarais · Aaron Horowitz · Kristian Lum · Suresh Venkatasubramanian 🔗 |
Automatic Construction of Evaluation Suites for Natural Language Generation Datasets (Poster)
SlidesLive Video » Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets are constructed as an i.i.d. sample from the overall data, this approach overly simplifies the complexity of language and encourages overfitting to the head of the data distribution. As such, rare language phenomena or text about underrepresented groups are not equally included in the evaluation. To encourage more in-depth model analyses, researchers have proposed the use of multiple test sets, also called challenge sets, that assess specific capabilities of a model. In this paper, we develop a framework based on this idea which is able to generate controlled perturbations and identify subsets in text-to-scalar, text-to-text, or data-to-text settings. By applying this framework to the GEM generation benchmark, we propose an evaluation suite made of 80 challenge sets, demonstrate the kinds of analyses that it enables and shed light onto the limits of current generation models. |
Simon Mille · Kaustubh Dhole · Saad Mahamood · Laura Perez-Beltrachini · Varun Prashant Gangal · Mihir Kale · Emiel van Miltenburg · Sebastian Gehrmann 🔗 |
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research (Poster)
SlidesLive Video » Benchmark datasets play a central role in the organization of machine learning research. They coordinate researchers around shared research problems and serve as a measure of progress towards shared goals. Despite the foundational role of benchmarking practices in this field, relatively little attention has been paid to the dynamics of benchmark dataset use and reuse, within or across machine learning subcommunities. In this paper, we dig into these dynamics. We study how dataset usage patterns differ across machine learning subcommunities and across time from 2015-2020. We find increasing concentration on fewer and fewer datasets within task communities, significant adoption of datasets from other tasks, and concentration across the field on datasets that have been introduced by researchers situated within a small number of elite institutions. Our results have implications for scientific evaluation, AI ethics, and equity/access within the field. |
Bernard Koch · Emily Denton · Alex Hanna · Jacob G Foster 🔗 |
Dynamic Environments with Deformable Objects (Poster)
We propose a set of environments with dynamic tasks that involve highly deformable, topologically non-trivial objects. These environments facilitate easy experimentation: they offer fast runtime, support large-scale parallel data generation, and are easy to connect to reinforcement learning frameworks via the OpenAI Gym API. We offer several types of benchmark tasks with varying levels of complexity, and provide variants with procedurally generated cloth objects and randomized material textures. Moreover, we allow users to customize the tasks: import custom objects and textures, and adjust the size and material properties of deformable objects. We prioritize dynamic aspects of the tasks: forgoing 2D tabletop manipulation in favor of 3D tasks, with gravity and inertia playing a non-negligible role. Such advanced challenges require insights from multiple fields: machine learning and computer vision to process high-dimensional inputs, methods from computer graphics and topology to inspire structured and interpretable representations, and insights from robotics to learn advanced control policies. We aim to help researchers from these fields contribute their insights and simplify establishing interdisciplinary collaborations.
Rika Antonova · peiyang shi · Hang Yin · Zehang Weng · Danica Kragic 🔗 |
An Empirical Investigation of Representation Learning for Imitation (Poster)
SlidesLive Video » Imitation learning often needs a large demonstration set in order to handle the full range of situations that an agent might find itself in during deployment. However, collecting expert demonstrations can be expensive. Recent work in vision, reinforcement learning, and NLP has shown that auxiliary representation learning objectives can reduce the need for large amounts of expensive, task-specific data. Our Empirical Investigation of Representation Learning for Imitation (EIRLI) investigates whether similar benefits apply to imitation learning. We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites. In the settings we evaluate, we find that existing algorithms for image-based representation learning provide limited value relative to a well-tuned baseline with image augmentations. To explain this result, we investigate differences between imitation learning and other settings where representation learning has provided significant benefit, such as image classification. Finally, we release a well-documented codebase which both replicates our findings and provides a modular framework for creating new representation learning algorithms out of reusable components. |
Cynthia Chen · Sam Toyer · Cody Wild · Scott Emmons · Ian Fischer · Kuang-Huei Lee · Neel Alex · Steven Wang · Ping Luo · Stuart Russell · Pieter Abbeel · Rohin Shah
OpenML Benchmarking Suites (Poster)
SlidesLive Video » Machine learning research depends on objectively interpretable, comparable, and reproducible algorithm benchmarks. We advocate the use of curated, comprehensive suites of machine learning tasks to standardize the setup, execution, and reporting of benchmarks. We enable this through software tools that help to create and leverage these benchmarking suites. These are seamlessly integrated into the OpenML platform, and accessible through interfaces in Python, Java, and R. OpenML benchmarking suites (a) are easy to use through standardized data formats, APIs, and client libraries; (b) come with extensive meta-information on the included datasets; and (c) allow benchmarks to be shared and reused in future studies. We then present a first, carefully curated and practical benchmarking suite for classification: the OpenML Curated Classification benchmarking suite 2018 (OpenML-CC18). Finally, we discuss use cases and applications which demonstrate the usefulness of OpenML benchmarking suites and the OpenML-CC18 in particular. |
Bernd Bischl · Giuseppe Casalicchio · Matthias Feurer · Pieter Gijsbers · Frank Hutter · Michel Lang · Rafael Gomes Mantovani · Jan van Rijn · Joaquin Vanschoren 🔗 |
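A minimal sketch of pulling the OpenML-CC18 suite through the openml Python client; the calls follow the client's documented interface, though argument names and return types may vary slightly between versions.

```python
import openml

suite = openml.study.get_suite("OpenML-CC18")   # resolve the suite by its alias
for task_id in suite.tasks[:3]:                 # first few tasks as a smoke test
    task = openml.tasks.get_task(task_id)
    X, y = task.get_X_and_y()
    print(task_id, task.get_dataset().name, X.shape)
```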
Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning (Poster)
Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e.g., number of nodes, sparsity, causal chain length, etc.). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning. |
Nan Rosemary Ke · Aniket Didolkar · Sarthak Mittal · Anirudh Goyal · Guillaume Lajoie · Stefan Bauer · Danilo Jimenez Rezende · Yoshua Bengio · Chris Pal · Michael Mozer 🔗 |
RB2: Robotic Manipulation Benchmarking with a Twist (Poster)
Benchmarks offer a scientific way to compare algorithms using objective performance metrics. Good benchmarks have two features: (a) they should be widely useful for many research groups; and (b) they should produce reproducible findings. In robotic manipulation research, there is a trade-off between reproducibility and broad accessibility. If the benchmark is kept restrictive (fixed hardware, objects), the numbers are reproducible but the setup becomes less general. On the other hand, a benchmark could be a loose set of protocols (e.g. object set), but the underlying variation in setups makes the results non-reproducible. In this paper, we re-imagine benchmarking for robotic manipulation as state-of-the-art algorithmic implementations, alongside the usual set of tasks and experimental protocols. The added baseline implementations will provide a way to easily recreate SOTA numbers in a new local robotic setup, thus providing credible relative rankings between existing approaches and new work. However, these "local rankings" could vary between different setups. To resolve this issue, we build a mechanism for pooling experimental data between labs, and thus we establish a single global ranking for existing (and proposed) SOTA algorithms. Our benchmark, called Ranking-Based Robotics Benchmark (RB2), is evaluated on tasks that are inspired by the clinically validated Southampton Hand Assessment Procedures. Our benchmark was run across two different labs and reveals several surprising findings. For example, extremely simple baselines such as open-loop behavior cloning outperform more complicated models (e.g. closed loop, RNN, Offline-RL, etc.) that are preferred by the field. We hope our fellow researchers will use RB2 to improve the quality and rigor of their research.
Sudeep Dasari · Jianren Wang · Joyce Hong · Shikhar Bahl · Yixin Lin · Austin Wang · Abitha Thankaraj · Karanbir Chahal · Berk Calli · Saurabh Gupta · David Held · Lerrel Pinto · Deepak Pathak · Vikash Kumar · Abhinav Gupta
Really Doing Great at Estimating CATE? A Critical Look at ML Benchmarking Practices in Treatment Effect Estimation (Poster)
SlidesLive Video » The machine learning (ML) toolbox for estimation of heterogeneous treatment effects from observational data is expanding rapidly, yet many of its algorithms have been evaluated only on a very limited set of semi-synthetic benchmark datasets. In this paper, we investigate current benchmarking practices for ML-based conditional average treatment effect (CATE) estimators, with special focus on empirical evaluation based on the popular semi-synthetic IHDP benchmark. We identify problems with current practice and highlight that semi-synthetic benchmark datasets, which (unlike real-world benchmarks used elsewhere in ML) do not necessarily reflect properties of real data, can systematically favor some algorithms over others -- a fact that is rarely acknowledged but of immense relevance for interpretation of empirical results. Further, we argue that current evaluation metrics evaluate performance only for a small subset of possible use cases of CATE estimators, and discuss alternative metrics relevant for applications in personalized medicine. Additionally, we discuss alternatives for current benchmark datasets, and implications of our findings for benchmarking in CATE estimation. |
Alicia Curth · David Svensson · Jim Weatherall · Mihaela van der Schaar 🔗 |
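The metric most often reported on IHDP-style semi-synthetic benchmarks is the PEHE, which is only computable because the true effects are simulated; a sketch follows as background to the critique above, not as code from the paper.

```python
import numpy as np

def root_pehe(tau_true: np.ndarray, tau_hat: np.ndarray) -> float:
    """Root Precision in Estimating Heterogeneous Effects: RMSE between the
    true and estimated conditional average treatment effects. Requires ground-
    truth effects, hence its reliance on (semi-)synthetic benchmark data."""
    tau_true, tau_hat = np.asarray(tau_true), np.asarray(tau_hat)
    return float(np.sqrt(np.mean((tau_true - tau_hat) ** 2)))
```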
Chest ImaGenome Dataset for Clinical Reasoning (Poster)
Despite the progress in automatic detection of radiologic findings from Chest X-Ray (CXR) images in recent years, a quantitative evaluation of the explainability of these models is hampered by the lack of locally labeled datasets for different findings. With the exception of a few expert-labeled small-scale datasets for specific findings, such as pneumonia and pneumothorax, most of the CXR deep learning models to date are trained on global "weak" labels extracted from text reports, or trained via a joint image and unstructured text learning strategy. Inspired by the Visual Genome effort in the computer vision community, we constructed the first Chest ImaGenome dataset with a scene graph data structure to describe 242,072 images. Local annotations are automatically produced using a joint rule-based natural language processing (NLP) and atlas-based bounding box detection pipeline. Through a radiologist-constructed CXR ontology, the annotations for each CXR are connected as an anatomy-centered scene graph, useful for image-level reasoning and multimodal fusion applications. Overall, we provide: (i) 1,256 combinations of relation annotations between 29 CXR anatomical locations (objects with bounding box coordinates) and their attributes, structured as a scene graph per image, (ii) over 670,000 localized comparison relations (for improved, worsened, or no change) between the anatomical locations across sequential exams, as well as (iii) a manually annotated gold standard scene graph dataset from 500 unique patients.
Joy T Wu · Nkechinyere Agu · Ismini Lourentzou · Arjun Sharma · Joseph Alexander Paguio · Jasper Seth Yao · Edward C Dee · William Mitchell · Satyananda Kashyap · Andrea Giovannini · Leo Anthony Celi · Mehdi Moradi
Mitigating dataset harms requires stewardship: Lessons from 1000 papers (Poster)
Machine learning datasets have elicited concerns about privacy, bias, and unethical applications, leading to the retraction of prominent datasets such as DukeMTMC, MS-Celeb-1M, and Tiny Images. In response, the machine learning community has called for higher ethical standards in dataset creation. To help inform these efforts, we studied three influential but ethically problematic face and person recognition datasets---Labeled Faces in the Wild (LFW), MS-Celeb-1M, and DukeMTMC---by analyzing nearly 1000 papers that cite them. We found that the creation of derivative datasets and models, broader technological and social change, the lack of clarity of licenses, and dataset management practices can introduce a wide range of ethical concerns. We conclude by suggesting a distributed approach to harm mitigation that considers the entire life cycle of a dataset.
Kenneth Peng · Arunesh Mathur · Arvind Narayanan 🔗 |
Artsheets for Art Datasets (Poster)
SlidesLive Video » Machine learning (ML) techniques are increasingly being employed within a variety of creative domains. For example, ML tools are being used to analyze the authenticity of artworks, to simulate artistic styles, and to augment human creative processes. While this progress has opened up new creative avenues, it has also paved the way for adverse downstream effects such as cultural appropriation (e.g., cultural misrepresentation, offense, and undervaluing) and representational harm. Many such concerning issues stem from the training data in ways that diligent evaluation can uncover, prevent, and mitigate. We posit that, when developing an arts-based dataset, it is essential to consider the social factors that influenced the process of conception and design, and the resulting gaps must be examined in order to maximize understanding of the dataset's meaning and future impact. Each dataset creator's decision produces opportunities, but also omissions. Each choice, moreover, builds on preexisting histories of the data's formation and handling across time by prior actors including, but not limited to, art collectors, galleries, libraries, archives, museums, and digital repositories. To illuminate the aforementioned aspects, we provide a checklist of questions customized for use with art datasets in order to help guide assessment of the ways that dataset design may either perpetuate or shift exclusions found in repositories of art data. The checklist is organized to address the dataset creator's motivation together with dataset provenance, composition, collection, pre-processing, cleaning, labeling, use (including data generation), distribution, and maintenance. Two case studies exemplify the value and application of our questionnaire. |
Ramya Srinivasan · Emily Denton · Jordan Famularo · Negar Rostamzadeh · Fernando Diaz · Beth Coleman 🔗 |
An Empirical Study of Graph Contrastive Learning (Poster)
Graph Contrastive Learning (GCL) establishes a new paradigm for learning graph representations without human annotations. Although remarkable progress has been witnessed recently, the success behind GCL still remains somewhat mysterious. In this work, we first identify several critical design considerations within a general GCL paradigm, including augmentation functions, contrasting modes, contrastive objectives, and negative mining strategies. Then, to understand the interplay of different GCL components, we conduct comprehensive, controlled experiments over benchmark tasks on datasets across various domains. Our empirical studies suggest a set of general recipes for effective GCL, e.g., simple topology augmentations that produce sparse graph views bring promising performance improvements; contrasting modes should be aligned with the granularities of end tasks. In addition, to foster future research and ease the implementation of GCL algorithms, we develop an easy-to-use library PyGCL, featuring modularized CL components, standardized evaluation, and experiment management. We envision this work to provide useful empirical evidence of effective GCL algorithms and offer several insights for future research.
Yanqiao Zhu · Yichen Xu · Qiang Liu · Shu Wu 🔗 |
Monash Time Series Forecasting Archive (Poster)
SlidesLive Video » Many businesses nowadays rely on large quantities of time series data making time series forecasting an important research area. Global forecasting models and multivariate models that are trained across sets of time series have shown huge potential in providing accurate forecasts compared with the traditional univariate forecasting models that work on isolated series. However, there are currently no comprehensive time series forecasting archives that contain datasets of time series from similar sources available for researchers to evaluate the performance of new global or multivariate forecasting algorithms over varied datasets. In this paper, we present such a comprehensive forecasting archive containing 25 publicly available time series datasets from varied domains, with different characteristics in terms of frequency, series lengths, and inclusion of missing values. We also characterise the datasets, and identify similarities and differences among them, by conducting a feature analysis. Furthermore, we present the performance of a set of standard baseline forecasting methods over all datasets across ten error metrics, for the benefit of researchers using the archive to benchmark their forecasting algorithms. |
Rakshitha W Godahewa · Christoph Bergmeir · Geoffrey Webb · Rob Hyndman · Pablo Montero-Manso 🔗 |
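One of the standard scale-free error metrics used for this kind of archive is MASE; a short reference implementation is sketched below as an illustration (the archive's own evaluation scripts may differ in detail, e.g., in how seasonality is chosen).

```python
import numpy as np

def mase(actual, forecast, insample, seasonality: int = 1) -> float:
    """Mean Absolute Scaled Error: forecast MAE scaled by the in-sample MAE of
    a seasonal naive forecast (seasonality=1 gives the plain naive forecast)."""
    actual, forecast, insample = map(np.asarray, (actual, forecast, insample))
    scale = np.mean(np.abs(insample[seasonality:] - insample[:-seasonality]))
    return float(np.mean(np.abs(actual - forecast)) / scale)
```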
Synthetic Benchmarks for Scientific Research in Explainable Machine Learning (Poster)
SlidesLive Video » As machine learning models grow more complex and their applications become more high-stakes, tools for explaining model predictions have become increasingly important. This has spurred a flurry of research in model explainability and has given rise to feature attribution methods such as LIME and SHAP. Despite their widespread use, evaluating and comparing different feature attribution methods remains challenging: evaluations ideally require human studies, and empirical evaluation metrics are often data-intensive or computationally prohibitive on real-world datasets. In this work, we address this issue by releasing XAI-BENCH: a suite of synthetic datasets along with a library for benchmarking feature attribution algorithms. Unlike real-world datasets, synthetic datasets allow the efficient computation of conditional expected values that are needed to evaluate ground-truth Shapley values and other metrics. The synthetic datasets we release offer a wide variety of parameters that can be configured to simulate real-world data. We demonstrate the power of our library by benchmarking popular explainability techniques across several evaluation metrics and across a variety of settings. The versatility and efficiency of our library will help researchers bring their explainability methods from development to deployment. Our code is available at https://github.com/abacusai/xai-bench. |
Yang Liu · Sujay Khandagale · Colin White · Willie Neiswanger 🔗 |
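The appeal of synthetic benchmarks is that ground-truth attributions can be computed analytically. As a worked special case: for a linear model with independent features, the Shapley value of feature i at input x is w_i * (x_i - E[x_i]). The sketch below implements that special case only; it is not the library's general evaluation code.

```python
import numpy as np

def linear_shapley_values(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Exact Shapley values for f(x) = w @ x + b with independent features:
    phi_i(x) = w_i * (x_i - E[x_i]). Returns an (n_samples, n_features) array,
    using the empirical feature means of X as the baseline expectation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) * np.asarray(w, dtype=float)
```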
A Toolbox for Construction and Analysis of Speech Datasets (Poster)
Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets. In this work, we examine common problems with speech data and introduce a toolbox for the construction and interactive error analysis of speech datasets. The construction tool is based on the work of Kürzinger et al., and, to the best of our knowledge, the dataset exploration tool is the world's first open-source tool of this kind. We demonstrate how to apply these tools to create a Russian speech dataset and analyze existing speech datasets (Multilingual LibriSpeech, Mozilla Common Voice). The tools are open sourced as a part of the NeMo framework.
Evelina Bakhturina · Vitaly Lavrukhin · Boris Ginsburg 🔗 |
Evaluating Bayes Error Estimators on Real-World Datasets with FeeBee (Poster)
The Bayes error rate (BER) is a fundamental concept in machine learning that quantifies the best possible accuracy any classifier can achieve on a fixed probability distribution. Despite years of research on building estimators of lower and upper bounds for the BER, these were usually compared only on synthetic datasets with known probability distributions, leaving two key questions unanswered: (1) How well do they perform on realistic, non-synthetic datasets?, and (2) How practical are they? Answering these is not trivial. Apart from the obvious challenge of an unknown BER for real-world datasets, there are two main aspects any BER estimator needs to overcome in order to be applicable in real-world settings: (1) the computational and sample complexity, and (2) the sensitivity and selection of hyper-parameters. In this work, we propose FeeBee, the first principled framework for analyzing and comparing BER estimators on modern real-world datasets with unknown probability distribution. We achieve this by injecting a controlled amount of label noise and performing multiple evaluations on a series of different noise levels, supported by a theoretical result which allows drawing conclusions about the evolution of the BER. By implementing and analyzing 7 multi-class BER estimators on 6 commonly used datasets of the computer vision and NLP domains, FeeBee allows a thorough study of these estimators, clearly identifying strengths and weaknesses of each, whilst being easily deployable on any future BER estimator.
Cedric Renggli · Luka Rimanic · Nora Hollenstein · Ce Zhang 🔗 |
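A hedged sketch of the core idea, not the FeeBee code itself: inject symmetric label noise at several rates and track how a classifier's test error evolves. For binary symmetric noise at rate ρ < 0.5, the Bayes error on noisy labels is ρ + (1 − 2ρ)·BER, so the shape of the noise-level curve carries information about the clean BER. The dataset and model below are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

def flip_labels(labels, rho):
    """Symmetric binary label noise: flip each label with probability rho."""
    flips = rng.random(len(labels)) < rho
    return np.where(flips, 1 - labels, labels)

for rho in [0.0, 0.1, 0.2, 0.3]:
    y_tr_noisy = flip_labels(y_tr, rho)
    y_te_noisy = flip_labels(y_te, rho)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr_noisy)
    err = 1.0 - clf.score(X_te, y_te_noisy)
    # Under symmetric noise the error should evolve roughly as
    # rho + (1 - 2 * rho) * err_at_rho_0.
    print(f"rho={rho:.1f}  noisy test error={err:.3f}")
```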
-
|
Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents
(
Poster
)
SlidesLive Video » There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled analysis. In the present work, we introduce a new benchmark for meta-RL research, emphasizing transparency and potential for in-depth analysis as well as structural richness. Alchemy is a 3D video game, implemented in Unity, which involves a latent causal structure that is resampled procedurally from episode to episode, affording structure learning, online inference, hypothesis testing and action sequencing based on abstract domain knowledge. We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents. Results clearly indicate a frank and specific failure of meta-learning, providing validation for Alchemy as a challenging benchmark for meta-RL. Concurrent with this report, we are releasing Alchemy as a public resource, together with a suite of analysis tools and sample agent trajectories. |
Jane Wang · Michael King · Nicolas Porcel · Zeb Kurth-Nelson · Tina Zhu · Charles Deck · Peter Choy · Mary Cassin · Malcolm Reynolds · Francis Song · Gavin Buttimore · David Reichert · Neil Rabinowitz · Loic Matthey · Demis Hassabis · Alexander Lerchner · Matt Botvinick
|
-
|
FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark
(
Poster
)
SlidesLive Video » The automatic generation of long and coherent medical reports given medical images (e.g. Chest X-ray and Fundus Fluorescein Angiography (FFA)) has great potential to support clinical practice. Researchers have explored advanced methods from computer vision and natural language processing to incorporate medical domain knowledge into the generation of readable medical reports. However, existing medical report generation (MRG) benchmarks lack both explainable annotations and reliable evaluation tools, hindering current research progress in two respects: firstly, existing methods can only predict reports without an accurate explanation, undermining the trustworthiness of the diagnostic methods; secondly, comparing the reports predicted by different MRG methods is unreliable when using only natural-language generation (NLG) evaluation metrics. To address these issues, in this paper we propose an explainable and reliable MRG benchmark based on FFA Images and Reports (FFA-IR). Specifically, FFA-IR is large, with 10,790 reports and 1,048,584 FFA images from clinical practice; it includes explainable annotations based on a schema of 46 categories of lesions; and it is bilingual, providing both English and Chinese reports for each case. In addition to the widely used NLG metrics, we propose a set of nine human evaluation criteria to evaluate the generated reports. We envision FFA-IR as a testbed for explainable and reliable medical report generation. We also hope that it can broadly accelerate medical imaging research and facilitate interaction between the fields of medical imaging, computer vision, and natural language processing. |
Mingjie Li · Wenjia Cai · Rui Liu · Yuetian Weng · Xiaoyun Zhao · Cong Wang · Xin Chen · Zhong Liu · Caineng Pan · Mengke Li · Yingfeng Zheng · Yizhi Liu · Flora Salim · Karin Verspoor · Xiaodan Liang · Xiaojun Chang
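For the NLG side of such an evaluation, a common starting point is corpus-level n-gram overlap between generated and reference reports. A minimal sketch using NLTK's BLEU implementation; the example reports are invented and the FFA-IR loading code is not shown.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Invented examples standing in for (reference report, generated report) pairs.
references = [
    [["late", "phase", "shows", "diffuse", "leakage", "in", "the", "macula"]],
    [["no", "obvious", "leakage", "or", "non-perfusion", "is", "observed"]],
]
hypotheses = [
    ["late", "phase", "shows", "leakage", "in", "the", "macula"],
    ["no", "leakage", "is", "observed"],
]

smooth = SmoothingFunction().method1
bleu4 = corpus_bleu(references, hypotheses, smoothing_function=smooth)
print(f"corpus BLEU-4 = {bleu4:.3f}")
```

Scores like this are exactly what the proposed human evaluation criteria are meant to complement, since n-gram overlap says little about clinical correctness.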
|
-
|
An Information Retrieval Approach to Building Datasets for Hate Speech Detection
(
Poster
)
link »
SlidesLive Video » Building a benchmark dataset for hate speech detection presents various challenges. Firstly, because hate speech is relatively rare, random sampling of tweets to annotate is very inefficient in finding hate speech. To address this, prior datasets often include only tweets matching known "hate words". However, restricting data to a pre-defined vocabulary may exclude portions of the real-world phenomenon we seek to model. A second challenge is that definitions of hate speech tend to be highly varying and subjective. Annotators having diverse prior notions of hate speech may not only disagree with one another but also struggle to conform to specified labeling guidelines. Our key insight is that the rarity and subjectivity of hate speech are akin to that of relevance in information retrieval (IR). This connection suggests that well-established methodologies for creating IR test collections can be usefully applied to create better benchmark datasets for hate speech. To intelligently and efficiently select which tweets to annotate, we apply the standard IR techniques of pooling and active learning. To improve both the consistency and the value of annotations, we apply task decomposition and annotator rationale techniques. We share a new benchmark dataset for hate speech detection on Twitter that provides broader coverage of hate than prior datasets. We also show a dramatic drop in accuracy of existing detection models when tested on these broader forms of hate. The annotator rationales we collect not only justify labeling decisions but also enable future work on dual supervision and/or explanation generation in modeling. Further details of our approach can be found in the supplementary materials. |
Md Mustafizur Rahman · Dinesh Balakrishnan · Dhiraj Murthy · Mucahid Kutlu · Matt Lease 🔗 |
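Pooling and active learning are standard IR/ML machinery; a hedged sketch of uncertainty sampling (not the authors' exact pipeline) on toy text features. The texts, labels, and threshold below are placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder labeled seed pool and unlabeled pool of tweets.
seed_texts = ["offensive attack on group x", "lovely weather today",
              "another offensive attack on group x", "great game last night"]
seed_labels = np.array([1, 0, 1, 0])          # 1 = hateful, 0 = not
unlabeled_texts = ["group x ruined everything", "my cat is adorable",
                   "offensive comment about group x", "new phone who dis"]

vec = TfidfVectorizer().fit(seed_texts + unlabeled_texts)
X_seed, X_pool = vec.transform(seed_texts), vec.transform(unlabeled_texts)

clf = LogisticRegression().fit(X_seed, seed_labels)

# Uncertainty sampling: send the tweets the model is least sure about to annotators.
probs = clf.predict_proba(X_pool)[:, 1]
uncertainty = np.abs(probs - 0.5)
for idx in np.argsort(uncertainty)[:2]:
    print(f"annotate next: {unlabeled_texts[idx]!r} (p_hate={probs[idx]:.2f})")
```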
-
|
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
(
Poster
)
SlidesLive Video » Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using data generated by a different policy. Because of its huge potential impact in practice, there has been growing research interest in this field. There is, however, no real-world public dataset that enables the evaluation of OPE, making its experimental studies unrealistic and irreproducible. With the goal of enabling realistic and reproducible OPE research, we present Open Bandit Dataset, a public logged bandit dataset collected on a large-scale fashion e-commerce platform, ZOZOTOWN. Our dataset is unique in that it contains a set of multiple logged bandit datasets collected by running different policies on the same platform. This enables experimental comparisons of different OPE estimators for the first time. We also develop Python software called Open Bandit Pipeline to streamline and standardize the implementation of batch bandit algorithms and OPE. Our open data and software will contribute to fair and transparent OPE research and help the community identify fruitful research directions. We provide extensive benchmark experiments of existing OPE estimators using our dataset and software. The results open up essential challenges and new avenues for future OPE research. |
Yuta Saito · Shunsuke Aihara · Megumi Matsutani · Yusuke Narita 🔗 |
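The inverse probability weighting (IPW) estimator at the heart of much OPE work can be written down directly from logged data that records the behavior policy's action propensities. The sketch below is generic NumPy on synthetic, context-free logs, not the Open Bandit Pipeline API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rounds, n_actions = 10_000, 5

# Synthetic logged bandit feedback (context-free for simplicity).
pi_b = np.full(n_actions, 1.0 / n_actions)            # uniform behavior policy
actions = rng.integers(n_actions, size=n_rounds)      # logged actions
base_rate = np.linspace(0.05, 0.25, n_actions)        # true reward probabilities
rewards = rng.binomial(1, base_rate[actions])

# Evaluation policy: deterministic, always picks the last action.
pi_e = np.zeros(n_actions)
pi_e[-1] = 1.0

# IPW estimate: V_hat = (1/n) * sum_i [pi_e(a_i) / pi_b(a_i)] * r_i
weights = pi_e[actions] / pi_b[actions]
v_ipw = np.mean(weights * rewards)

print(f"IPW estimate       : {v_ipw:.3f}")
print(f"true value of pi_e : {base_rate[-1]:.3f}")
```

Having multiple real logs collected under different policies, as in the dataset above, is what allows estimates like this to be checked against on-policy ground truth.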
-
|
ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations
(
Poster
)
SlidesLive Video » Object manipulation from 3D visual inputs poses many challenges for building generalizable perception and policy models. However, 3D assets in existing benchmarks mostly lack the diversity of shapes that aligns with real-world intra-class complexity in topology and geometry. Here we propose the SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator. 3D assets in ManiSkill include large intra-class topological and geometric variations. Tasks are carefully chosen to cover distinct types of manipulation challenges. Recent progress in 3D vision also motivates us to tailor the benchmark so that the challenge is inviting to researchers working on 3D deep learning. To this end, we simulate a moving panoramic camera that returns ego-centric point clouds or RGB-D images. In addition, we would like ManiSkill to serve a broad set of researchers interested in manipulation research. Besides supporting the learning of policies from interactions, we also support learning-from-demonstrations (LfD) methods by providing a large number of high-quality demonstrations (~36,000 successful trajectories, ~1.5M point cloud/RGB-D frames in total). We provide baselines using 3D deep learning and LfD algorithms. All code of our benchmark (simulator, environment, SDK, and baselines) is open-sourced at https://github.com/haosulab/ManiSkill, and a challenge open to interdisciplinary researchers will be held based on the benchmark. |
Tongzhou Mu · Zhan Ling · Fanbo Xiang · Derek Yang · Xuanlin Li · Stone Tao · Zhiao Huang · Zhiwei Jia · Hao Su 🔗 |
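The point-cloud observations mentioned above are typically obtained by back-projecting an RGB-D image through the camera intrinsics. A small self-contained sketch with a synthetic depth map; the camera parameters are made up and this is not ManiSkill code.

```python
import numpy as np

# Made-up pinhole intrinsics for a 64x64 depth image.
H, W = 64, 64
fx = fy = 50.0
cx, cy = (W - 1) / 2.0, (H - 1) / 2.0

# Synthetic depth map (meters): a flat plane 1 m away with a bump in the middle.
depth = np.full((H, W), 1.0)
depth[24:40, 24:40] = 0.8

# Back-project every pixel (u, v, d) to camera coordinates (x, y, z).
v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
z = depth
x = (u - cx) * z / fx
y = (v - cy) * z / fy
point_cloud = np.stack([x, y, z], axis=-1).reshape(-1, 3)   # (H*W, 3)

print(point_cloud.shape)   # (4096, 3)
print(point_cloud[:3])
```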
-
|
AI and the Everything in the Whole Wide World Benchmark
(
Poster
)
link »
SlidesLive Video » There is a tendency across different subfields in AI to see value in a small collection of influential benchmarks, which we term 'general' benchmarks. These benchmarks operate as stand-ins or abstractions for a range of anointed common problems that are frequently framed as foundational milestones on the path towards flexible and generalizable AI systems. State-of-the-art performance on these benchmarks is widely understood as indicative of progress towards these long-term goals. In this position paper, we explore how such benchmarks are designed, constructed and used in order to reveal key limitations of their framing as the functionally 'general' broad measures of progress they are set up to be. |
Deborah Raji · Emily Denton · Emily M. Bender · Alex Hanna · Amandalynne Paullada 🔗 |
-
|
Are We Learning Yet? A Meta Review of Evaluation Failures Across Machine Learning
(
Poster
)
SlidesLive Video » Many subfields of machine learning share a common stumbling block: evaluation. Advances in machine learning often evaporate under closer scrutiny or turn out to be less widely applicable than originally hoped. We conduct a meta-review of 107 survey papers from natural language processing, recommender systems, computer vision, reinforcement learning, computational biology, graph learning, and more, organizing the wide range of surprisingly consistent critique into a concrete taxonomy of observed failure modes. Inspired by measurement and evaluation theory, we divide failure modes into two categories: internal and external validity. Internal validity issues pertain to evaluation on a learning problem in isolation, such as improper comparisons to baselines or overfitting from test set re-use. External validity issues concern relationships between different learning problems, for instance whether progress on one learning problem translates to progress on seemingly related tasks. |
Thomas Liao · Rohan Taori · Deborah Raji · Ludwig Schmidt 🔗 |
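One internal-validity failure mode mentioned above, overfitting from test set re-use, is easy to demonstrate numerically: repeatedly selecting the "best" of many random models on a fixed test set inflates the reported accuracy well above chance. A self-contained toy simulation, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_test, n_candidate_models = 1_000, 200

# Every "model" guesses labels uniformly at random, so its true accuracy is 50%.
y_test = rng.integers(2, size=n_test)
best_reported = max(
    np.mean(rng.integers(2, size=n_test) == y_test)
    for _ in range(n_candidate_models)
)
print(f"best accuracy after selecting on the re-used test set: {best_reported:.3f}")
print("true accuracy of every candidate model:                 0.500")
```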
-
|
Isaac Gym: High Performance GPU Based Physics Simulation For Robot Learning
(
Poster
)
SlidesLive Video » Isaac Gym offers a high-performance learning platform to train policies for a wide variety of robotics tasks entirely on GPU. Both physics simulation and neural network policy training reside on GPU and communicate by directly passing data from physics buffers to PyTorch tensors without ever going through CPU bottlenecks. This leads to blazing fast training times for complex robotics tasks on a single GPU with 2-3 orders of magnitude improvements compared to conventional RL training that uses a CPU-based simulator and GPUs for neural networks. We host the results and videos at https://sites.google.com/view/isaacgym-nvidia and Isaac Gym can be downloaded at https://developer.nvidia.com/isaac-gym. The benchmark and environments are available at https://github.com/NVIDIA-Omniverse/IsaacGymEnvs. |
Viktor Makoviychuk · Lukasz Wawrzyniak · Yunrong Guo · Michelle Lu · Kier Storey · Miles Macklin · David Hoeller · Nikita Rudin · Arthur Allshire · Ankur Handa · Gavriel State
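The key engineering idea, keeping both simulation state and policy inputs in GPU tensors so no data crosses the CPU boundary, can be illustrated without the Isaac Gym API at all. Below is a toy vectorized "physics" step for thousands of environments written directly in PyTorch; the dynamics are invented purely for illustration.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
num_envs, obs_dim, act_dim = 4096, 8, 2

# All environment state lives in a single GPU tensor; there is no per-env Python object.
state = torch.zeros(num_envs, obs_dim, device=device)
policy = torch.nn.Sequential(
    torch.nn.Linear(obs_dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, act_dim)
).to(device)

@torch.no_grad()
def physics_step(state, action):
    """Toy dynamics: damped state plus the action applied to the first dims."""
    next_state = 0.99 * state
    next_state[:, :act_dim] += 0.01 * action
    reward = -next_state.pow(2).sum(dim=1)
    return next_state, reward

for _ in range(100):
    action = policy(state)                          # stays on GPU
    state, reward = physics_step(state, action)     # stays on GPU

print(reward.mean().item())   # only a scalar ever reaches the CPU
```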
|
-
|
Hardware Design and Accurate Simulation of Structured-Light Scanning for Benchmarking of 3D Reconstruction Algorithms
(
Poster
)
link »
SlidesLive Video » Images of a real scene taken with a camera commonly differ from synthetic images of a virtual replica of the same scene, despite advances in light transport simulation and calibration. By explicitly co-developing the Structured-Light Scanning (SLS) hardware and rendering pipeline we are able to achieve negligible per-pixel difference between the real image and the synthesized image on geometrically complex calibration objects with known material properties. This approach provides an ideal test-bed for developing and evaluating data-driven algorithms in the area of 3D reconstruction, as the synthetic data is indistinguishable from real data and can be generated at large scale by simulation. We propose three benchmark challenges using a combination of acquired and synthetic data generated with our system: (1) a denoising benchmark tailored to structured-light scanning, (2) a shape completion benchmark to fill in missing data, and (3) a benchmark for surface reconstruction from dense point clouds. In addition, we provide on our website a large collection of high-resolution scans that allow our system and benchmarks to be used without reproducing the hardware setup: https://geometryprocessing.github.io/scanner-sim |
Sebastian Koch · Yurii Piadyk · Markus Worchel · Marc Alexa · Claudio Silva · Denis Zorin · Daniele Panozzo 🔗 |
-
|
The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation
(
Poster
)
SlidesLive Video » The goal of understanding decision-making behaviours in clinical environments is of paramount importance if we are to bring the strengths of machine learning to ultimately improve patient outcomes. Mainstream development of algorithms is often geared towards optimal performance in tasks that do not necessarily translate well into the medical regime---due to several factors including the lack of public availability of realistic data, the intrinsically offline nature of the problem, as well as the complexity and variety of human behaviours. We therefore present a new benchmarking suite designed specifically for medical sequential decision modelling: the Medkit-Learn(ing) Environment, a publicly available Python package providing simple and easy access to high-fidelity synthetic medical data. While providing a standardised way to compare algorithms in a realistic medical setting, we employ a generating process that disentangles the policy and environment dynamics to allow for a range of customisations, thus enabling systematic evaluation of algorithms’ robustness against specific challenges prevalent in healthcare. |
Alex Chan · Ioana Bica · Alihan Hüyük · Daniel Jarrett · Mihaela van der Schaar 🔗 |
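The disentanglement of environment dynamics from the behavior policy can be sketched generically as a simulator that composes an arbitrary transition function with an arbitrary (possibly biased) clinician policy to emit batch trajectories. The code below is an illustrative toy, not the Medkit-Learn API; the state and action spaces are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def environment_step(state, action):
    """Toy patient dynamics: 2-d severity state, treatment pushes it down."""
    drift = np.array([0.05, 0.02])
    effect = np.array([-0.2, -0.1]) * action
    return np.clip(state + drift + effect + 0.05 * rng.normal(size=2), 0.0, 1.0)

def clinician_policy(state, treat_threshold=0.6):
    """Toy behaviour policy: treat (action=1) once severity exceeds a threshold."""
    return int(state[0] > treat_threshold)

def generate_trajectory(horizon=20):
    state, trajectory = rng.uniform(0.2, 0.5, size=2), []
    for _ in range(horizon):
        action = clinician_policy(state)
        next_state = environment_step(state, action)
        trajectory.append((state.copy(), action, next_state.copy()))
        state = next_state
    return trajectory

# Because dynamics and policy are separate callables, either can be swapped out
# to stress-test an imitation or inverse-RL method against specific behaviours.
batch = [generate_trajectory() for _ in range(100)]
print(len(batch), len(batch[0]))
```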
-
|
URLB: Unsupervised Reinforcement Learning Benchmark
(
Poster
)
SlidesLive Video » Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we introduce the Unsupervised Reinforcement Learning Benchmark (URLB). URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards. Building on the DeepMind Control Suite, we provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods. We find that the implemented baselines make progress but are not able to solve URLB and propose directions for future research. |
Misha Laskin · Denis Yarats · Hao Liu · Kimin Lee · Albert Zhan · Kevin Lu · Catherine Cang · Lerrel Pinto · Pieter Abbeel 🔗 |
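The two-phase protocol (reward-free pre-training followed by fine-tuning with extrinsic rewards) can be written down abstractly. The skeleton below uses placeholder agent and environment objects and is not the URLB codebase; the intrinsic reward is a made-up stand-in for an exploration bonus.

```python
import random

class ToyEnv:
    """Placeholder environment with a reward-free and a task-reward mode."""
    def reset(self):
        self.state = 0.0
        return self.state
    def step(self, action, use_task_reward):
        self.state += action
        extrinsic = -abs(self.state - 1.0) if use_task_reward else 0.0
        return self.state, extrinsic

class ToyAgent:
    """Placeholder agent with an intrinsic reward standing in for, e.g., curiosity."""
    def act(self, state):
        return random.uniform(-0.1, 0.1)
    def intrinsic_reward(self, state):
        return abs(state)            # toy stand-in for an exploration bonus
    def update(self, state, action, reward):
        pass                         # learning update omitted in this sketch

env, agent = ToyEnv(), ToyAgent()

# Phase 1: reward-free pre-training with intrinsic rewards only.
state = env.reset()
for _ in range(1000):
    action = agent.act(state)
    state, _ = env.step(action, use_task_reward=False)
    agent.update(state, action, agent.intrinsic_reward(state))

# Phase 2: fine-tuning on a downstream task with extrinsic rewards.
state = env.reset()
for _ in range(1000):
    action = agent.act(state)
    state, extrinsic = env.step(action, use_task_reward=True)
    agent.update(state, action, extrinsic)
```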
-
|
What Would Jiminy Cricket Do? Towards Agents That Behave Morally
(
Poster
)
SlidesLive Video » When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong, to behave morally. By contrast, artificial agents may behave immorally when trained on environments that ignore moral concerns, such as violent video games. With the advent of generally capable agents that pretrain on many environments, mitigating inherited biases towards immoral behavior will become necessary. However, prior work on aligning agents with human values and morals focuses on small-scale settings lacking in semantic complexity. To enable research in larger, more realistic settings, we introduce Jiminy Cricket, an environment suite of 25 text-based adventure games with thousands of semantically rich, morally salient scenarios. Via dense annotations for every possible action, Jiminy Cricket environments robustly evaluate whether agents can act morally while maximizing reward. To improve moral behavior, we leverage language models with commonsense moral knowledge and develop strategies to mediate this knowledge into actions. In extensive experiments, we find that our artificial conscience approach can steer agents towards moral behavior without sacrificing performance. |
Dan Hendrycks · Mantas Mazeika · Andy Zou · Sahil Patel · Christine Zhu · Jesus Navarro · Dawn Song · Bo Li · Jacob Steinhardt 🔗 |
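The "artificial conscience" idea of mediating a morality model's knowledge into action selection can be sketched as a simple reshaping of action scores: penalize candidate actions in proportion to a predicted immorality score. Everything below (the scoring function, weights, and candidate actions) is a made-up illustration, not the authors' implementation.

```python
def immorality_score(action_text: str) -> float:
    """Stand-in for a commonsense morality model; returns a value in [0, 1]."""
    bad_words = {"steal", "attack", "lie"}
    return 1.0 if any(w in action_text.split() for w in bad_words) else 0.0

def select_action(candidates, task_scores, morality_weight=2.0):
    """Pick the action maximizing task score minus a weighted immorality penalty."""
    shaped = [
        score - morality_weight * immorality_score(action)
        for action, score in zip(candidates, task_scores)
    ]
    return candidates[max(range(len(candidates)), key=shaped.__getitem__)]

candidates = ["open mailbox", "steal the jeweled egg", "go north"]
task_scores = [0.4, 0.9, 0.3]          # e.g. Q-values from a text-game agent
print(select_action(candidates, task_scores))   # -> 'open mailbox'
```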