NeurIPS 2026 Evaluations & Datasets FAQ
Updated 7 April 2026
This FAQ will be continually updated. Please bookmark this page and review it before submitting any questions.
Note: Authors are also advised to consult the NeurIPS Main Track handbook, as general policies apply to ED submissions as well.
General FAQs
Will accepted papers in the Evaluations & Datasets Track appear in exactly the same proceedings as the main track papers?
Yes, accepted papers will be published in the NeurIPS proceedings and presented at the conference alongside main track papers.
What is the LaTeX template for the ED track?
It’s the same as the main track template; see “Paper Formatting Instructions” in the NeurIPS Main Track handbook.
Are there guidelines for submissions originating from the 2024/2025 Competitions track, e.g., reporting on competition results?
No, there are no special guidelines. Please follow the ED CFP and data hosting guidelines. Your submission will be reviewed according to the same standards alongside all other ED track submissions. We suggest you review the revised scope of the E&D track carefully when framing your work.
Are dataset/code submissions due on May 6 (the full paper deadline)?
Yes. We follow the Main Track timeline, so the full paper — including all required materials — must be submitted by May 6, 2026 (AOE). For the ED track, datasets and code are not considered supplementary materials. If your submission includes data and/or code, they must be submitted in their final form by May 6, 2026 (AOE), together with the full paper.
What is the LaTeX configuration for a single-blind submission?
Please use \usepackage[eandd, nonanonymous]{neurips_2026} if you wish to make your submission single-blind for the ED track.
How do I choose between tracks?
When reading the CfPs, base your decision on the main contribution of your work. For instance, suppose your paper’s main contribution is to show that a certain architecture is more performant, with secondary contributions in the form of new evaluations. In this case, the novel architecture is the primary contribution and should determine your track choice.
If a case seems truly ambiguous, authors should select their track based on how they would like their paper to be evaluated (see CfPs). The framing of the paper should match that of the track. For example, consider:
A submission evaluates LLMs for legal tasks such as statutory interpretation, case outcome prediction, and contract analysis using new stress tests and expert lawyer assessments. If the primary contribution is the evaluation framework and empirical insights into LLM reliability in law, the authors may select ED. If these findings instead motivate and validate a novel law-aware LLM adaptation that advances legal reasoning performance, the authors may select main track / use-inspired. The boundary shifts when evaluation is the core intellectual contribution versus when it serves as evidence for a domain-driven modeling advance.
Please note that there will be no possibility to switch tracks or types and that papers cannot be submitted to multiple tracks or types simultaneously. Irrelevant or duplicate papers risk desk rejection from all tracks. It is the authors’ responsibility to carefully read the relevant CfPs and identify the most appropriate track.
We also provide the following examples* – some straightforward, some more ambiguous – to give authors more guidance on the different tracks:
- ImageNet: A large-scale hierarchical image database (Deng et al., 2009): A dataset for computer vision applications with a demonstration of its value in three tasks. The primary contribution relates to the dataset → ED.
- Inherent Trade-Offs in the Fair Determination of Risk Scores (Kleinberg et al., 2016): This paper investigates the impossibility of satisfying multiple fairness criteria simultaneously. While it relies heavily on a theoretical framing, its main contribution is a rigorously proven, surprising negative result that is not established via empirical evaluations → main track / negative result.
- Learning skillful medium-range global weather forecasting (Lam et al., 2023): Using graph neural networks (GNNs) for 10-day weather predictions. This is a novel application of GNNs to a specific domain that includes domain-specific metrics (e.g., prediction of extreme events). The work is neither methodology- nor evaluation-focused (though it includes both aspects) and clearly advances a real-world use case → main track / use-case inspired.
- The Illusion of Readiness in Health AI (Gu et al., 2025): A focus on evaluations for a use-case-inspired application in healthcare, providing negative results through experimentation. While this is a use-case-inspired negative result, the primary focus of the paper is on experimental evaluations → ED.
- Fairness Through Awareness (Dwork et al., 2011): The definition and implementation of individual fairness, with a secondary contribution of an algorithm to improve on this metric. The paper relies heavily on a theoretical framework. While the authors could consider other options (main/general due to algorithm development, or main/theory), we believe the ED track is most appropriate given that the main contribution is the definition of a new fairness metric → ED.
*Please note this is not an endorsement or assessment of the quality of the paper’s contribution.
The main contribution of my paper is use-case inspired and includes evaluations. Should I choose the main track (use-inspired) or ED?
If the paper’s primary contribution is to define new methodologies for evaluations of this use case or highlight surprising negative results obtained from empirical evaluations, ED would be a suitable track. On the other hand, if evaluations are part of the work but not the primary focus (e.g. a novel method has been defined for the use case and is thoroughly evaluated), the main track might be more suitable. See the examples in the question above.
In all cases, the authors should select their track based on how they would like their paper to be evaluated (see CfPs). The framing of the paper should match that of the track.
As a reminder, there is no possibility to switch tracks or types, papers cannot be submitted to multiple tracks or types simultaneously, and irrelevant or duplicate papers risk desk rejection from all tracks.
My paper highlights a negative result. Is ED a suitable track?
Negative results are welcome in the ED track as long as they bring new insights and are thoroughly demonstrated via empirical evaluations. A non-exhaustive list includes failure modes of current benchmarks and failure modes of AI systems in deployment and/or in human-computer interactions. If the main contribution is a theoretical demonstration of a negative result (e.g., an impossibility theorem or counter-examples), authors can consider the main track / negative result topic instead. Please see the guidance above on how to choose between tracks for ambiguous cases.
My main contribution is a training dataset. Does it still fit the scope of E&D?
Yes. Training datasets are welcome as long as the work clearly demonstrates their value in improving (downstream) evaluations, e.g., task performance, robustness, fairness, privacy, and alignment. The metric(s) and task(s) the dataset is designed to improve upon should be clearly stated, along with any assumptions and limitations. Submissions that propose a dataset with the “potential” for machine learning or task improvement without this demonstration are not in scope.
How should I include code in my submission?
You will be asked to provide a URL to a hosting platform (e.g., GitHub, Bitbucket). All code should be documented and executable. If your submission is double-blind, you can use an anonymization service or another method to submit your code anonymously.
My submission is a benchmark consisting of an environment for evaluation only/audits an existing benchmark using publicly available data/is a theoretical framework for comparing evaluation designs. Do I need to follow the data-hosting guidelines?
No. If your submission does not introduce new data, you do not need to follow data-hosting guidelines. You do need to follow code-hosting guidelines if your submission includes new code or tools. The dataset-hosting and Croissant requirements apply only to submissions that introduce new datasets.
Dataset hosting FAQs
The Croissant format can’t handle the file type(s) in my dataset submission. What should I do?
You should still submit a Croissant file. You can choose to provide only dataset-level metadata and a description of the resources in the dataset (FileObject and FileSet). You can omit RecordSets in this scenario. The recommended Croissant-compatible data hosting platforms should handle this gracefully for you, but you will need to address it manually if you decide to self-host your dataset.
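For file types that Croissant cannot describe at the record level, the Croissant file can still carry dataset-level metadata plus a FileObject listing, with RecordSets omitted. The stdlib-only Python sketch below writes such a minimal file; the field names, dataset name, and URLs are illustrative assumptions, so check them against the current Croissant specification before submitting.

```python
import json

# Minimal Croissant-style metadata: dataset-level information plus a
# FileObject description, with RecordSets omitted. Field names are
# illustrative; verify them against the current Croissant specification.
metadata = {
    "@type": "sc:Dataset",
    "conformsTo": "http://mlcommons.org/croissant/1.0",
    "name": "my-dataset",  # hypothetical dataset name
    "description": "Raw sensor recordings in a custom binary format.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "url": "https://example.org/my-dataset",  # placeholder URL
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "recordings.bin",
            "name": "recordings.bin",
            "contentUrl": "https://example.org/my-dataset/recordings.bin",
            "encodingFormat": "application/octet-stream",
            "sha256": "0" * 64,  # replace with the real checksum
        }
    ],
    # No "recordSet" key: record-level structure is omitted because the
    # file type cannot be expressed as Croissant RecordSets.
}

with open("croissant.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

Self-hosting authors can adapt this directly; the recommended hosting platforms generate the equivalent metadata for you.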
I have a submission consisting of multiple datasets. How do I submit the Croissant files?
You should submit a Croissant file for every dataset (and check whether they are all valid). You can combine the .json files into a .zip folder and upload that during submission. In the dataset URL, refer to a webpage that documents the collection of datasets as a whole. The URLs for the individual datasets must be included in the Croissant files.
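The bundling step can be scripted. A minimal sketch (file names and contents are placeholders; in practice these are the real, individually validated Croissant files):

```python
import json
import zipfile
from pathlib import Path

# Hypothetical Croissant files, one per dataset in the submission.
croissant_files = ["dataset_a.json", "dataset_b.json"]

# For this sketch, create placeholder files; replace with your real ones.
for name in croissant_files:
    Path(name).write_text(json.dumps({"@type": "sc:Dataset", "name": name}))

# Sanity-check that each file parses as JSON before bundling.
for name in croissant_files:
    json.loads(Path(name).read_text())

# Bundle all per-dataset Croissant files into one zip for the submission form.
with zipfile.ZipFile("croissant_files.zip", "w") as zf:
    for name in croissant_files:
        zf.write(name)
```

The JSON parse check only guards against corrupted files; full Croissant validation of each file should still be done separately with the official validator.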
How do we handle a submission that includes a hold-out set we wish to keep private and unreleased, e.g., to avoid potential contamination?
You should mention that you have a private hold-out set and describe it in your paper, but the main contribution of your paper should be the publicly released portion of your dataset. The publicly released portion of your dataset needs to conform to the data hosting guidelines. It may also contain a public validation and test set collected using the same protocol as the private one.
My submission includes a synthetic dataset. Does it need to be documented and hosted in the same way?
Yes. All data hosting guidelines apply to synthetic datasets as well.
I don’t want to make my dataset publicly accessible at the time of submission. What are my options?
Both the Harvard Dataverse and Kaggle platforms offer private URL preview link sharing. This means your dataset is accessible only to those who have the special URL, e.g., reviewers. Note that you will be required to make your dataset public by the camera-ready deadline. Failure to do so may result in removal from the conference and proceedings.
Can I make changes to my dataset after I have made my submission to Open Review?
You can make changes until the submission deadline. After the submission deadline, we will perform automated verification checks of your dataset to assist in streamlining and standardizing reviews. If the dataset changes in a way that invalidates the original reviews at any time between the submission deadline and the camera-ready deadline (or the publication of the proceedings), we reserve the right to remove it from the conference or proceedings.
I am experiencing problems with the platform I am using to release my dataset. What should I do?
We have worked with maintainers of the dataset hosting platforms to identify the appropriate contact information for authors to use for support in case of issues or help with workarounds for storage quotas, etc. Find this contact information in the ED data hosting guidelines.
I need to require credentialized (AKA gated) access to my dataset. Is this possible?
This will be possible on the condition that credentialization is necessary for the public good (e.g., because of ethically sensitive medical data), and that an established credentialization procedure is in place that is 1) open to a large section of the public; 2) provides rapid response and access to the data; and 3) is guaranteed to be maintained for many years. A good example is PhysioNet Credentialing, which requires users to first learn how to handle human-subjects data, yet is open to anyone who completes that training and agrees to the rules.
This should be seen as an exceptional measure, and NOT as a way to limit access to data for other reasons (e.g., to shield data behind a Data Transfer Agreement). Misuse would be grounds for desk rejection. During submission, you can indicate that your dataset involves open credentialized access, in which case the necessity, openness, and efficiency of the credentialization process itself will also be checked.
Our dataset requires credentialized access. How do we preserve single-blind review, i.e., ensure the identities of reviewers aren’t shared with authors?
If it’s possible to share a private preview link rather than requiring credentials, try that first. Alternatively, you can create an account, give it view access to the dataset, and share the login details with reviewers: after submission, you can send a private message visible only to reviewers on Open Review.
I have an extremely large dataset. How do I allow reviewers to properly evaluate it?
Please make sure that the full dataset is available at submission time. You can *in addition* provide ways to help reviewers explore your dataset. This could be a notebook that downloads a portion of the data and helps reviewers explore it, or a bespoke solution appropriate for your dataset.
We also generally require large datasets (> 4GB) to provide a smaller data sample (ideally hosted in the same way). If you make a sample, also explain how you created that sample.
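One simple, documentable way to build such a sample is single-pass reservoir sampling with a fixed seed, so the exact procedure can be stated alongside the sample. A sketch, assuming a hypothetical CSV layout and file name:

```python
import csv
import random

def reservoir_sample(path, k, seed=0):
    """Draw k data rows uniformly at random from a CSV in a single pass,
    with a fixed seed so the sampling procedure is reproducible."""
    rng = random.Random(seed)
    sample = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for i, row in enumerate(reader):
            if i < k:
                sample.append(row)          # fill the reservoir
            else:
                j = rng.randrange(i + 1)    # replace with decreasing probability
                if j < k:
                    sample[j] = row
    return header, sample

# Toy stand-in for a large dataset file (hypothetical path and schema).
with open("full_dataset.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["id", "label"])
    w.writerows([i, i % 2] for i in range(10_000))

header, sample = reservoir_sample("full_dataset.csv", k=100, seed=42)
```

Recording the seed, the sample size, and the sampling method (here: uniform reservoir sampling) is usually enough to satisfy the "explain how you created that sample" requirement.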
Our submission involves using existing public datasets. Do we need to host these in accordance with the data hosting guidelines?
No, but any code used to modify or otherwise process the public datasets (e.g., to build a new benchmark you are submitting) should be accessible and executable, which means providing publicly accessible links to the data sources used. You also should not claim the existing public datasets as part of your submission.
The online app for checking the validity of Croissant files runs for a long time and times out. What should I do?
This can happen when you have a dataset on Hugging Face. The app may be rate-limited, which causes an error and automatic restarts. If this happens, we recommend validating your Croissant file locally. You can click the three dots at the top right of the app to get code to run it locally, or clone the repository and run it in your own HF Space.
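Before or while setting up the validator locally, a quick offline pre-check can catch gross problems without any network calls. A minimal stdlib sketch; the required-field list is an assumption to verify against the Croissant specification, and this is not a substitute for the official validator:

```python
import json

def precheck_croissant(path):
    """Offline sanity check of a Croissant file. Catches only gross
    problems (malformed JSON, missing top-level fields); run the official
    validator for real conformance checking."""
    with open(path) as f:
        meta = json.load(f)  # raises on malformed JSON
    required = ("@context", "@type", "name", "description", "distribution")
    return [f"missing top-level field: {k}" for k in required if k not in meta]

# Demo on a deliberately incomplete file (hypothetical content).
with open("croissant.json", "w") as f:
    json.dump({"@context": {}, "@type": "sc:Dataset", "name": "demo"}, f)

problems = precheck_croissant("croissant.json")
```

An empty returned list means only that the top-level structure looks plausible; the local validator still has the final word.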