NeurIPS 2021 Datasets and Benchmarks Track

The Datasets and Benchmarks track serves as a novel venue for publications, talks, and posters on highly valuable machine learning datasets and benchmarks, as well as a forum for discussions on how to improve dataset development. Datasets and benchmarks are crucial for the development of machine learning methods, but they also require their own publishing and reviewing guidelines. For instance, datasets often cannot be reviewed in a double-blind fashion, and hence full anonymization will not be required. On the other hand, they do require additional specific checks, such as a proper description of how the data was collected, whether it shows intrinsic bias, and whether it will remain accessible.

Submissions to this track will be reviewed according to a set of stringent criteria specifically designed for datasets and benchmarks, as described below. In addition to a scientific paper, authors should also submit supplementary materials such as details on how the data was collected and organized, what kind of information it contains, how it should be used ethically and responsibly, and how it will be made available and maintained.

Submissions to the track will be part of the NeurIPS conference, presented alongside the main conference papers. They will be officially published in an associated journal, yet separate from the official NeurIPS proceedings. There will be two deadlines this year, to allow near-continuous submission. Resubmission in the next round is allowed as long as all major issues raised in the previous round are addressed. It is also still possible to submit datasets and benchmarks to the main conference (under the usual review process), but dual submission to both is not allowed.

In addition to new datasets and benchmarks on new or existing datasets, we welcome submissions that detail advanced practices in data collection and curation that are of general interest even if the data itself cannot be shared. Data generators or reinforcement learning environments are also in scope. Frameworks for responsible dataset development, audits of existing datasets, or systematic analyses of existing systems on novel datasets that yield important new insight are also in scope.

Read our blog post for more about why we started this track.


Important dates (anywhere on earth)

  • June 4: Deadline for 1st round of submissions on OpenReview.

  • July 5-13: Author/reviewer discussions on OpenReview.

  • July 17: Author notification for the first round. 


  • August 27: Deadline for 2nd round of submissions on OpenReview.

  • September 24 - October 1: Author/reviewer discussions on OpenReview.

  • October 6: Author notification for the second round.



A submission consists of:

  • Papers are limited to 9 pages in NeurIPS format, plus the paper checklist (which does not count toward the 9-page limit).

    • Please carefully follow [the provided LaTeX template] when preparing submissions. We follow the NeurIPS format, but with the appropriate headings.

    • Papers should be submitted via [OpenReview].

    • Reviewing is single-blind, hence the paper should not be anonymized.

  • Submissions introducing new datasets must include the following in the supplementary materials:

    • Dataset documentation and intended uses. Recommended documentation frameworks include datasheets for datasets, dataset nutrition labels, data statements for NLP, and accountability frameworks.

    • URL to website/platform where the dataset/benchmark can be viewed and downloaded by the reviewers.

    • Author statement that they bear all responsibility in case of violation of rights, etc., and confirmation of the data license.

    • Hosting, licensing, and maintenance plan. The choice of hosting platform is yours, as long as you ensure access to the data (possibly through a curated interface) and will provide the necessary maintenance.

  • For benchmarks, the supplementary materials must ensure that all results are easily reproducible. Where possible, use a reproducibility framework such as the ML reproducibility checklist, or otherwise guarantee that all results can be easily reproduced, i.e., that all necessary datasets, code, and evaluation procedures are accessible and documented.

  • For papers introducing best practices in creating or curating datasets and benchmarks, the above supplementary materials are not required.

  • For all papers resubmitted from a previous round: a discussion must be included in the supplementary materials detailing how the new submission addresses all of the issues raised by reviewers in the previous round.



Reviewing will be single-blind. A datasets and benchmarks program committee will be formed, consisting of experts on machine learning, dataset curation, and ethics. We will ensure diversity in the program committee, both in terms of background as well as technical expertise (e.g., data, ML, data ethics, social science expertise). Each paper will be reviewed by the members of the committee. In select cases that are flagged by reviewers, an ethics review may be performed as well. The factors that will be considered when evaluating papers include:

  • Utility and quality of the submission: Impact, originality, novelty, and relevance to the NeurIPS community will all be considered.

  • Completeness of the relevant documentation: For datasets, sufficient detail must be provided on how the data was collected and organized, what kind of information it contains, how it should be used ethically and responsibly, as well as how it will be made available and maintained. For benchmarks, best practices on reproducibility should be followed.

  • Accessibility and accountability: For datasets, there should be a convincing hosting, licensing, and maintenance plan.

  • Ethics and responsible use: Any ethical implications should be addressed, and guidelines for responsible use should be provided where appropriate.

If you would like to become a reviewer for this track, please let us know at



The following committee will provide advice on the organization of the track over the coming years: Sergio Escalera, Isabelle Guyon, Neil Lawrence, Dina Machuve, Olga Russakovsky, Joaquin Vanschoren.



Joaquin Vanschoren, Eindhoven University of Technology

Serena Yeung, Stanford University