NeurIPS 2021 Datasets and Benchmarks Track

The Datasets and Benchmarks track serves as a novel venue for high-quality publications, talks, and posters on highly valuable machine learning datasets and benchmarks, as well as a forum for discussions on how to improve dataset development. Datasets and benchmarks are crucial for the development of machine learning methods, but they also require their own publishing and reviewing guidelines. For instance, datasets often cannot be reviewed in a double-blind fashion, so full anonymization will not be required. On the other hand, they do require additional specific checks, such as a proper description of how the data was collected, whether it shows intrinsic bias, and whether it will remain accessible.

CRITERIA. We are aiming for a review process as stringent as that of the main conference, yet better suited to datasets and benchmarks. Submissions to this track will be reviewed according to a set of criteria and best practices specifically designed for datasets and benchmarks, as described below. In addition to a scientific paper, authors should also submit supplementary materials such as details on how the data was collected and organized, what kind of information it contains, how it should be used ethically and responsibly, as well as how it will be made available and maintained.

RELATIONSHIP TO NEURIPS. Submissions to the track will be part of the main NeurIPS conference, presented alongside the main conference papers. Accepted papers will be officially published in associated proceedings clearly linked to, yet separate from, the NeurIPS proceedings. The proceedings will be called Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks and they will be hosted on the NeurIPS website next to the main NeurIPS proceedings. We will maintain a page on the NeurIPS website with all accepted datasets and additional information. 

SUBMISSIONS. There will be two deadlines this year, to allow near-continuous submission. You can freely submit to either of the two rounds. If you submit to the first round and are not selected, resubmission in the second round is allowed as long as all major issues raised in the previous round are addressed. It is also still possible to submit datasets and benchmarks to the main conference (under the usual review process), but dual submission to both is not allowed (unless you have retracted your paper from the main conference). Submission is single-blind, and the review process is open. However, only accepted papers will remain visible after the review phase, and the datasets themselves can be released at a later date.

SCOPE. In addition to new datasets, and benchmarks on new or existing datasets, we welcome submissions that detail advanced practices in data collection and curation that are of general interest even if the data itself cannot be shared. Data generators and reinforcement learning environments are also in scope, as are frameworks for responsible dataset development, audits of existing datasets, work identifying significant problems with existing datasets and their use, and systematic analyses of existing systems on novel datasets that yield important new insight.

Read our blog post for more about why we started this track.


Important dates (all deadlines are Anywhere on Earth)

  • June 7 (extended from June 4): Deadline for 1st round of submissions on OpenReview. 

  • July 6-14: Author/reviewer discussions on OpenReview.

  • July 28: Author notification for the first round.


  • August 27: Deadline for 2nd round of submissions on OpenReview.

  • September 24 - October 1: Author/reviewer discussions on OpenReview.

  • October 6: Author notification for the second round.


Note: You can freely and independently submit to either of the two rounds. Please see the call described above.
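For authors planning around the deadlines, the following is a minimal sketch of converting an "Anywhere on Earth" date to UTC. It assumes that a deadline falls at the end of the listed day (23:59:59) in UTC-12, the usual reading of Anywhere on Earth; the call itself does not state a time of day, and the function name `aoe_end_of_day_utc` is purely illustrative. Always check the time shown on OpenReview.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" is commonly taken to mean UTC-12.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_end_of_day_utc(year, month, day):
    """Return the UTC instant at which the given day ends Anywhere on Earth.

    Assumption: the deadline is the end of the listed day (23:59:59 AoE).
    """
    end_of_day_aoe = datetime(year, month, day, 23, 59, 59, tzinfo=AOE)
    return end_of_day_aoe.astimezone(timezone.utc)


# First-round and second-round submission deadlines from the list above.
first_round_utc = aoe_end_of_day_utc(2021, 6, 7)
second_round_utc = aoe_end_of_day_utc(2021, 8, 27)

print(first_round_utc.isoformat())
print(second_round_utc.isoformat())
```

Under this assumption, a deadline listed as June 7 ends at midday UTC on June 8.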



Q: My work is in scope for this track but possibly also for the main conference. Where should I submit it?

A: This is ultimately your choice. Consider the main contribution of the submission and how it should be reviewed. If the main contribution is a new dataset, benchmark, or other work that falls into the scope of the track (see above), then it is ideally reviewed accordingly. As discussed in our blog post, the reviewing procedures of the main conference are focused on algorithmic advances, analysis, and applications, while the reviewing in this track is equally stringent but designed to properly assess datasets and benchmarks. Other, more practical considerations are that this track has single-blind reviewing (since anonymization is often impossible for hosted datasets) and that its intended audience differs: submitting here makes your work more visible to people looking for datasets and benchmarks.

Q: How will papers accepted to this track be cited?

A: As detailed above, accepted papers will appear in official proceedings hosted on the NeurIPS website, next to (yet separate from) the main conference proceedings. The official name will be Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks. 

Q: Do I need to submit an abstract beforehand?

A: No, it is a single submission. The abstract and the rest of the submission need to be submitted together.

Q: To which round should I submit? 

A: This is your choice. The two rounds are entirely separate. The main benefit of submitting to the first round is that accepted papers will be announced earlier (more visibility). Also, if a first-round submission is rejected, you can still resolve any major issues and resubmit to the second round. Submitting to the first round is not required if you intend to submit to the second round.



A submission consists of:

  • Submissions are limited to 9 content pages in NeurIPS format, including all figures and tables; additional pages containing the paper checklist, references, and acknowledgements are allowed. If your submission is accepted, you will be allowed an additional content page for the camera-ready version.

    • Please carefully follow the LaTeX template for this track when preparing submissions. We follow the NeurIPS format, but with the appropriate headings, and without hiding the names of the authors. Download the template as a bundle here.

    • Papers should be submitted via OpenReview (click to start your submission)

    • Reviewing is single-blind, hence the paper should not be anonymized.

    • During submission, you can add a public link to the dataset or benchmark data. If the dataset can only be released later, you must include instructions for reviewers on how to access the dataset. This can only be done after the initial submission: once the paper is submitted, an 'add dataset or benchmark' button appears where you can leave information for reviewers. We highly recommend making the dataset publicly available immediately or before the start of the NeurIPS conference. In select cases, and only with solid motivation, the release date can be extended to up to a year after the submission deadline.

  • Submissions introducing new datasets must include the following in the supplementary materials (as a separate PDF):

    • Dataset documentation and intended uses. Recommended documentation frameworks include datasheets for datasets, dataset nutrition labels, data statements for NLP, and accountability frameworks.

    • URL to website/platform where the dataset/benchmark can be viewed and downloaded by the reviewers.

    • Author statement that they bear all responsibility in case of violation of rights, etc., and confirmation of the data license.

    • Hosting, licensing, and maintenance plan. The choice of hosting platform is yours, as long as you ensure access to the data (possibly through a curated interface) and will provide the necessary maintenance.

  • To ensure accessibility, we largely follow the NeurIPS guidelines for data submission, but also allow more freedom for non-static datasets. The supplementary materials for datasets must include the following:

    • Links to access the dataset and its metadata. This can be hidden upon submission if the dataset is not yet publicly available but must be added in the camera-ready version. In select cases, e.g. when the data can only be released at a later date, this can be added afterward. Simulation environments should link to (open source) code repositories.

    • The dataset itself should ideally use an open and widely used data format. Provide a detailed explanation on how the dataset can be read. For simulation environments, use existing frameworks or explain how they can be used.

    • Long-term preservation: It must be clear that the dataset will be available for a long time, either by uploading to a data repository or by explaining how the authors themselves will ensure this.

    • Explicit license: Authors must choose a license, ideally a CC license for datasets, or an open source license for code (e.g. RL environments). An overview of licenses can be found here: 

    • Add structured metadata to a dataset's metadata page using Web standards (such as DCAT): This allows it to be discovered and organized by anyone. A guide can be found here: If you use an existing data repository, this is often done automatically.

    • Highly recommended: a persistent dereferenceable identifier (e.g. a DOI minted by a data repository) for datasets, or a code repository (e.g. GitHub, GitLab, ...) for code. If this is not possible or useful, please explain why.

  • For benchmarks, the supplementary materials must ensure that all results are easily reproducible. Where possible, use a reproducibility framework such as the ML reproducibility checklist, or otherwise guarantee that all results can be easily reproduced, i.e. all necessary datasets, code, and evaluation procedures must be accessible and documented.

  • For papers introducing best practices in creating or curating datasets and benchmarks, the above supplementary materials are not required.

  • For all papers resubmitted from a previous round: a discussion must be included in the supplementary materials detailing how the new submission addresses all of the issues raised by reviewers in the previous round.

  • For papers resubmitted after being retracted from another venue: a brief discussion on the main concerns raised by previous reviewers and how you addressed them. You do not need to share the original reviews.
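As an illustration of the structured-metadata requirement listed above, the sketch below builds a JSON-LD description of a dataset that can be embedded in the dataset's web page. It assumes the schema.org `Dataset` vocabulary as the Web standard; this is one common choice, not something the call mandates. All names, URLs, and the DOI are hypothetical placeholders, and the helper functions are illustrative only.

```python
import json


def build_dataset_metadata(name, description, url, license_url, creators, doi=None):
    """Return a schema.org 'Dataset' description as a JSON-LD-ready dict."""
    metadata = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "creator": [{"@type": "Person", "name": c} for c in creators],
    }
    if doi is not None:
        # A persistent identifier, e.g. a DOI minted by a data repository.
        metadata["identifier"] = doi
    return metadata


def to_script_tag(metadata):
    """Wrap the metadata in the script element used to embed JSON-LD in HTML."""
    body = json.dumps(metadata, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical placeholder values, for illustration only.
example = build_dataset_metadata(
    name="Example Benchmark Dataset",
    description="A placeholder description of what the data contains.",
    url="https://example.org/datasets/example-benchmark",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creators=["Jane Doe", "John Doe"],
    doi="https://doi.org/10.0000/example",
)

print(to_script_tag(example))
```

If you host the data in an existing data repository, this kind of markup is often generated for you, so a hand-written version is mainly relevant for self-hosted dataset pages.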



Reviewing will be single-blind. A datasets and benchmarks program committee will be formed, consisting of experts on machine learning, dataset curation, and ethics. We will ensure diversity in the program committee, both in background and in technical expertise (e.g., data, ML, data ethics, social science expertise). Each paper will be reviewed by the members of the committee. In select cases that are flagged by reviewers, an ethics review may be performed as well.

The review process will be open: papers and reviews will be publicly visible during the review phase to allow community feedback. They will be hidden again after the review phase, unless the paper is accepted or the authors opt in. Authors can choose to keep the datasets themselves hidden until a later release date, as long as reviewers have access.

The factors that will be considered when evaluating papers include:

  • Utility and quality of the submission: Impact, originality, novelty, relevance to the NeurIPS community will all be considered. 

  • Completeness of the relevant documentation: For datasets, sufficient detail must be provided on how the data was collected and organized, what kind of information it contains, how it should be used ethically and responsibly, as well as how it will be made available and maintained. For benchmarks, best practices on reproducibility should be followed.

  • Accessibility and accountability: For datasets, there should be a convincing hosting, licensing, and maintenance plan.

  • Ethics and responsible use: Any ethical implications should be addressed and guidelines for responsible use should be provided where appropriate.

If you would like to become a reviewer for this track, please let us know at



The following committee will provide advice on the organization of the track over the coming years: Sergio Escalera, Isabelle Guyon, Neil Lawrence, Dina Machuve, Olga Russakovsky, Joaquin Vanschoren.



Joaquin Vanschoren, Eindhoven University of Technology

Serena Yeung, Stanford University

Maria Xenochristou, Stanford University (workflow master)