

NeurIPS 2024 Datasets and Benchmarks Track

 

If you'd like to become a reviewer for the track, or recommend someone, please use this form.

 

The Datasets and Benchmarks track serves as a venue for high-quality publications, talks, and posters on highly valuable machine learning datasets and benchmarks, as well as a forum for discussions on how to improve dataset development. Datasets and benchmarks are crucial for the development of machine learning methods, but they also require their own publishing and reviewing guidelines. For instance, datasets often cannot be reviewed in a double-blind fashion, and hence full anonymization will not be required. On the other hand, they do require additional specific checks, such as a proper description of how the data was collected, whether it shows intrinsic bias, and whether it will remain accessible.

The previous editions of the Datasets and Benchmarks track were highly successful; you can view the accepted papers from 2021, 2022, and 2023, and the winners of the best paper awards from 2021, 2022, and 2023.

CRITERIA. We are aiming for an equally stringent review as the main conference, yet better suited to datasets and benchmarks. Submissions to this track will be reviewed according to a set of criteria and best practices specifically designed for datasets and benchmarks, as described below. A key criterion is accessibility: datasets should be available and accessible, i.e., the data can be found and obtained without a personal request to the PI, and any required code should be open source. We encourage authors to use the Croissant format (https://mlcommons.org/working-groups/data/croissant/) to document their datasets in a machine-readable way. In addition to the scientific paper, authors should also submit supplementary materials such as details on how the data was collected and organised, what kind of information it contains, how it should be used ethically and responsibly, and how it will be made available and maintained.

RELATIONSHIP TO NeurIPS. Submissions to the track will be part of the main NeurIPS conference, presented alongside the main conference papers. Accepted papers will be officially published in the NeurIPS proceedings.

SUBMISSIONS. There will be one deadline this year. It is still possible to submit datasets and benchmarks to the main conference (under the usual review process), but dual submission to both is not allowed (unless you retract your paper from the main conference). We also cannot transfer papers from the main track to the D&B track. Authors can choose to submit either single-blind or double-blind: if the submission can be properly reviewed double-blind, i.e., reviewers do not need access to non-anonymous repositories to review the work, then authors may submit the work anonymously. Papers will not be publicly visible during the review process; only accepted papers will become visible afterward. The reviews themselves are not visible during the review phase but will be published after decisions have been made. The datasets themselves should be accessible to reviewers but can be publicly released at a later date (see below). New authors cannot be added after the abstract deadline, and all authors should have an OpenReview profile by the paper deadline. NeurIPS does not tolerate any collusion whereby authors secretly cooperate with reviewers, ACs, or SACs to obtain favourable reviews.

SCOPE. This track welcomes all work on data-centric machine learning research (DMLR), covering ML datasets and benchmarks as well as algorithms, tools, methods, and analyses for working with ML data. This includes but is not limited to:

  • New datasets, or carefully and thoughtfully designed (collections of) datasets based on previously available data.
  • Data generators and reinforcement learning environments.
  • Data-centric AI methods and tools, e.g. to measure and improve data quality or utility, or studies in data-centric AI that bring important new insight.
  • Advanced practices in data collection and curation that are of general interest even if the data itself cannot be shared.
  • Frameworks for responsible dataset development, audits of existing datasets, and identification of significant problems with existing datasets and their use.
  • Benchmarks on new or existing datasets, as well as benchmarking tools.
  • In-depth analyses of machine learning challenges and competitions (by organisers and/or participants) that yield important new insight.
  • Systematic analyses of existing systems on novel datasets yielding important new insight.

Read our original blog post for more about why we started this track.

 

Important dates

  • Abstract submission deadline: May 29, 2024
  • Full paper submission and co-author registration deadline: Jun 5, 2024
  • Supplementary materials submission deadline: Jun 12, 2024
  • Review deadline: Jul 24, 2024
  • Release of reviews and start of author discussions on OpenReview: Aug 07, 2024
  • End of author/reviewer discussions on OpenReview: Aug 31, 2024
  • Author notification: Sep 26, 2024
  • Camera-ready deadline: Oct 30, 2024 AOE

Note: The site will start accepting submissions on April 15, 2024.

 

FREQUENTLY ASKED QUESTIONS

Q: My work is in scope for this track but possibly also for the main conference. Where should I submit it?

A: This is ultimately your choice. Consider the main contribution of the submission and how it should be reviewed. If the main contribution is a new dataset, benchmark, or other work that falls into the scope of the track (see above), then it is ideally reviewed accordingly. As discussed in our blog post, the reviewing procedures of the main conference are focused on algorithmic advances, analysis, and applications, while the reviewing in this track is equally stringent but designed to properly assess datasets and benchmarks. Other, more practical considerations are that this track allows single-blind reviewing (since anonymization is often impossible for hosted datasets) and the intended audience: submitting here makes your work more visible to people looking for datasets and benchmarks.

Q: How will papers accepted to this track be cited?

A: Accepted papers will appear as part of the official NeurIPS proceedings.

Q: Do I need to submit an abstract beforehand?

A: Yes, please check the important dates section for more information.

Q: My dataset requires open credentialized access. Can I submit to this track?

A: This is possible on the condition that credentialized access is necessary for the public good (e.g. because of ethically sensitive medical data), and that an established credentialization procedure is in place that is 1) open to a large section of the public, 2) provides rapid response and access to the data, and 3) is guaranteed to be maintained for many years. A good example here is PhysioNet Credentialing, which requires users to first learn how to handle data involving human subjects, yet is open to anyone who has done so and agrees to the rules. This should be seen as an exceptional measure, and NOT as a way to limit access to data for other reasons (e.g. to shield data behind a Data Transfer Agreement). Misuse would be grounds for desk rejection. During submission, you can indicate that your dataset involves open credentialized access, in which case the necessity, openness, and efficiency of the credentialization process itself will also be checked.

 

SUBMISSION INSTRUCTIONS

A submission consists of:

  • Submissions are limited to 9 content pages in NeurIPS format, including all figures and tables; additional pages containing the required paper checklist (included in the template), references, and acknowledgements are allowed. If your submission is accepted, you will be allowed an additional content page for the camera-ready version.
    • Please carefully follow the LaTeX template for this track when preparing submissions. We follow the NeurIPS format, but with the appropriate headings, and without hiding the names of the authors. Download the template as a bundle here.
    • Papers should be submitted via OpenReview.
    • Reviewing is in principle single-blind, hence the paper should not be anonymized. In cases where the work can be reviewed equally well anonymously, anonymous submission is also allowed.
    • During submission, you can add a public link to the dataset or benchmark data. If the dataset can only be released later, you must include instructions for reviewers on how to access the dataset. This can only be done after the first submission, by sending an official note to the reviewers in OpenReview. We highly recommend making the dataset publicly available immediately, or at least before the start of the NeurIPS conference. In select cases, with solid motivation, the release date can be postponed up to a year after the submission deadline.
  • Submissions introducing new datasets must include the following in the supplementary materials (as a separate PDF):
    • Dataset documentation and intended uses. Recommended documentation frameworks include datasheets for datasets, dataset nutrition labels, data statements for NLP, data cards, and accountability frameworks.
    • URL to website/platform where the dataset/benchmark can be viewed and downloaded by the reviewers. URL to Croissant metadata record documenting the dataset/benchmark available for viewing and downloading by the reviewers. You can create your Croissant metadata using e.g. the Python library available here: https://github.com/mlcommons/croissant
    • Author statement that they bear all responsibility in case of violation of rights, etc., and confirmation of the data license.
    • Hosting, licensing, and maintenance plan. The choice of hosting platform is yours, as long as you ensure access to the data (possibly through a curated interface) and will provide the necessary maintenance.
  • To ensure accessibility, we largely follow the NeurIPS guidelines for data submission, while allowing more freedom for non-static datasets. The supplementary materials for datasets must include the following:
    • Links to access the dataset and its metadata. These can be hidden upon submission if the dataset is not yet publicly available, but must be added in the camera-ready version. In select cases, e.g., when the data can only be released at a later date, this can be added afterward (up to a year after the submission deadline). Simulation environments should link to open-source code repositories.
    • The dataset itself should ideally use an open and widely used data format. Provide a detailed explanation on how the dataset can be read. For simulation environments, use existing frameworks or explain how they can be used.
    • Long-term preservation: It must be clear that the dataset will be available for a long time, either by uploading it to a data repository or by explaining how the authors themselves will ensure this.
    • Explicit license: Authors must choose a license, ideally a CC license for datasets, or an open source license for code (e.g. RL environments). An overview of licenses can be found here: https://paperswithcode.com/datasets/license
    • Add structured metadata to a dataset's meta-data page using Web standards (like schema.org and DCAT): This allows it to be discovered and organized by anyone. A guide can be found here: https://developers.google.com/search/docs/data-types/dataset. If you use an existing data repository, this is often done automatically.
    • Highly recommended: a persistent dereferenceable identifier (e.g. a DOI minted by a data repository or a prefix on identifiers.org) for datasets, or a code repository (e.g. GitHub, GitLab,...) for code. If this is not possible or useful, please explain why.
  • For benchmarks, the supplementary materials must ensure that all results are easily reproducible. Where possible, use a reproducibility framework such as the ML reproducibility checklist, or otherwise guarantee that all results can be easily reproduced, i.e. all necessary datasets, code, and evaluation procedures must be accessible and documented.
  • For papers introducing best practices in creating or curating datasets and benchmarks, the above supplementary materials are not required.
  • For papers resubmitted after being retracted from another venue: a brief discussion on the main concerns raised by previous reviewers and how you addressed them. You do not need to share the original reviews.
  • The dual submission and archiving policy follows the NeurIPS main track paper guidelines.
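To make the Croissant requirement above more concrete, the sketch below hand-writes a minimal Croissant-style JSON-LD record for a toy CSV dataset using only the Python standard library. The dataset name and URLs are placeholders, and the exact layout is an assumption based on our reading of the Croissant 1.0 spec; in practice, generate and validate real records with the mlcommons/croissant tooling rather than writing them by hand.

```python
import json

# Illustrative, hand-written Croissant-style record for a hypothetical
# "example-reviews" dataset. Field names follow the Croissant 1.0 layout
# as we understand it; treat this as a sketch, not a validated record.
croissant = {
    "@context": {
        "@vocab": "https://schema.org/",
        "cr": "http://mlcommons.org/croissant/",
    },
    "@type": "Dataset",
    "conformsTo": "http://mlcommons.org/croissant/1.0",
    "name": "example-reviews",
    "description": "Toy dataset of product reviews.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": [
        {
            # One file-level entry per downloadable artifact.
            "@type": "cr:FileObject",
            "@id": "reviews.csv",
            "contentUrl": "https://example.org/reviews.csv",
            "encodingFormat": "text/csv",
        }
    ],
}

# Serialize to the croissant.json that would accompany a submission.
record = json.dumps(croissant, indent=2)
```

The resulting file is what reviewers would open via the Croissant metadata URL requested in the supplementary materials.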
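The schema.org structured metadata recommended above usually takes the form of a small JSON-LD block embedded in the dataset's web page. The sketch below builds such a block in Python so it is easy to check; all names and URLs are placeholders.

```python
import json

# Minimal schema.org/Dataset description, following Google's dataset
# structured-data guide. Embedded in a page as
# <script type="application/ld+json">...</script> so search engines and
# dataset search tools can discover and index the dataset.
jsonld = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "example-reviews",  # placeholder dataset name
    "description": "Toy dataset of product reviews.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Organization", "name": "Example Lab"},
    "distribution": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.org/reviews.csv",
            "encodingFormat": "text/csv",
        }
    ],
}

# The tag you would paste into the dataset's landing page.
snippet = ('<script type="application/ld+json">'
           + json.dumps(jsonld)
           + "</script>")
```

If you host the dataset on an established data repository (Zenodo, Hugging Face, OpenML, etc.), this markup is typically generated for you automatically.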

Use of Large Language Models (LLMs): We welcome authors to use any tool that is suitable for preparing high-quality papers and research. However, we ask authors to keep in mind two important criteria. First, we expect papers to fully describe their methodology; any tool that is important to that methodology, including the use of LLMs, should also be described. For example, authors should mention tools (including LLMs) that were used for data processing or filtering, visualization, facilitating or running experiments, and proving theorems. It may also be advisable to describe the use of LLMs in implementing the method (if this corresponds to an important, original, or non-standard component of the approach). Second, authors are responsible for the entire content of the paper, including all text and figures, so while authors are welcome to use any tool they wish for writing the paper, they must ensure that all text is correct and original.

 

REVIEWING AND SELECTION PROCESS

Reviewing will be single-blind, although authors can also submit anonymously if the submission allows that. A datasets and benchmarks program committee will be formed, consisting of experts on machine learning, dataset curation, and ethics. We will ensure diversity in the program committee, both in terms of background as well as technical expertise (e.g., data, ML, data ethics, social science expertise). Each paper will be reviewed by the members of the committee. In select cases where ethical concerns are flagged by reviewers, an ethics review may be performed as well.

Papers will not be publicly visible during the review process. Only accepted papers will become visible afterward. The reviews themselves are also not visible during the review phase but will be published after decisions have been made. Authors can choose to keep the datasets themselves hidden until a later release date, as long as reviewers have access.

The factors that will be considered when evaluating papers include:

  • All submissions:
    • Utility and quality of the submission: Impact, originality, novelty, and relevance to the NeurIPS community will all be considered.
    • Reproducibility: All submissions should be accompanied by sufficient information to reproduce the results described, i.e., all necessary datasets, code, and evaluation procedures must be accessible and documented. We encourage the use of a reproducibility framework such as the ML reproducibility checklist to guarantee that all results can be easily reproduced. Benchmark submissions in particular should take care to provide sufficient detail to ensure reproducibility. If submissions include code, please refer to the NeurIPS code submission guidelines.
    • Code: Reviewers will consider whether code was provided (e.g. in the supplementary material) and whether it was useful in guiding their review.
    • Ethics: Any ethical implications of the work should be addressed. Authors should rely on NeurIPS ethics guidelines as guidance for understanding ethical concerns.  
  • Dataset submissions:
    • Completeness of the relevant documentation: Per NeurIPS ethics guidelines, datasets must be accompanied by documentation communicating the details of the dataset as part of the submission, via structured templates (e.g., the documentation frameworks recommended above). Sufficient detail must be provided on how the data was collected and organized, what kind of information it contains, how it should be used ethically and responsibly, and how it will be made available and maintained.
    • Licensing and access: Per NeurIPS ethics guidelines, authors should provide licenses for any datasets released. These licenses should consider the intended use and limitations of the dataset, and the terms of use should prevent misuse or inappropriate use.
    • Consent and privacy: Per NeurIPS ethics guidelines, datasets should minimize the exposure of any personally identifiable information, unless informed consent from those individuals is provided. Any paper that creates a dataset with real data of real people should ask for the explicit consent of participants, or explain why the authors were unable to do so.
    • Ethics and responsible use: Any ethical implications of new datasets should be addressed and guidelines for responsible use should be provided where appropriate. Note that, if your submission includes publicly available datasets (e.g. as part of a larger benchmark), you should also check these datasets for ethical issues. You remain responsible for the ethical implications of including existing datasets or other data sources in your work.
    • Legal compliance: For datasets, authors should ensure awareness and compliance with regional legal requirements.

 

ADVISORY COMMITTEE

The following committee will provide advice on the organization of the track over the coming years: Sergio Escalera, Isabelle Guyon, Neil Lawrence, Dina Machuve, Olga Russakovsky, Joaquin Vanschoren, Serena Yeung.

 

DATASETS AND BENCHMARKS CHAIRS

Lora Aroyo, Google
Francesco Locatello, Institute of Science and Technology Austria
Lingjuan Lyu, Sony AI

 

Contact: datasetsbenchmarks@neurips.cc