

2023 Ethics Guidelines for Reviewers

Introduction

New for 2023: NeurIPS Code of Ethics

Please make sure all papers conform to this minimal standard.

As AI/ML research and applications have increasing real-world impact, both the likelihood of meaningful social benefit and the attendant risk of harm increase. Indeed, recent years have seen both the application of ML to socially beneficial issues like fighting climate change [e.g., 1] and an increase in problems linked to data privacy, algorithmic bias, automation risk, and potential malicious uses of AI [e.g., 2].

In light of this, ML researchers can no longer “simply assume that... research will have a net positive impact on the world” [3]. The research community should consider not only the potential benefits but also the potential negative societal impacts of ML research and adopt corresponding measures that enable positive trajectories to unfold while mitigating risk of harm.

Both authors and reviewers (including technical reviewers and ethics reviewers) should use this document to build clarity and a shared understanding of the ethics principles in the NeurIPS Code of Ethics.

The primary goal of the NeurIPS ethics review is to provide critical feedback for the authors to incorporate into the paper. In rare situations, however, NeurIPS reserves the right to reject submissions that have grossly violated the ethical principles stated in this document. Such decisions are made exclusively by the Area Chairs, informed by recommendations from the ethics reviewers; the Area Chairs are the ultimate decision-makers for all submission acceptances and rejections.

There are several aspects of ethics to consider: general ethical conduct (Section 2), potential negative societal impacts (Section 3), a general guidance framework (Section 4), and additional considerations to keep in mind (Section 5).

General Ethical Conduct

Submissions must adhere to ethical standards for responsible research practice and due diligence in the conduct of research.

Issues of academic misconduct and scientific integrity, such as plagiarism, should be directed to the area chairs and program chairs for resolution.

If the research uses human-derived data, consider whether that data might:

  1. Contain any personally identifiable information or sensitive personally identifiable information. For instance, does the dataset include features or labels that reveal individuals’ names? Did people consent to the collection of such data? Could the use of the data be degrading or embarrassing for some people? (A minimal illustrative screening sketch appears after this list.)
  2. Allow information that individuals have not consented to share to be deduced about them. For instance, a dataset for recommender systems could inadvertently disclose user information such as names, depending on the features provided.
  3. Encode, contain, or potentially exacerbate bias against people of a certain gender, race, sexuality, or who have other protected characteristics. For instance, does the dataset represent the diversity of the community where the approach is intended to be deployed?
  4. Derive from human subject experimentation, and whether that experimentation has been reviewed and approved by a relevant oversight board. For instance, studies predicting characteristics (e.g., health status) from human data (e.g., contacts with people infected by COVID-19) are expected to have been reviewed by an ethics board.
  5. Have been discredited by the creators. For instance, the DukeMTMC-ReID dataset has been taken down, and it should not be used in NeurIPS submissions.
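As a purely illustrative aid for item 1 above, the sketch below shows one lightweight way an author might screen a tabular dataset for columns that look like direct identifiers before release. The column names, regular expressions, and the flag_possible_pii helper are hypothetical assumptions made for this example; such a check can complement, but never replace, proper consent and formal review.

```python
import re

# Hypothetical indicators of direct identifiers; adapt to the dataset at hand.
SUSPECT_COLUMN_NAMES = {"name", "full_name", "email", "phone", "address", "ssn", "dob"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def flag_possible_pii(records):
    """Return {column: [reasons]} for columns that may contain PII."""
    flags = {}
    columns = records[0].keys() if records else []
    for column in columns:
        reasons = []
        if column.lower() in SUSPECT_COLUMN_NAMES:
            reasons.append("column name suggests a direct identifier")
        values = [str(r.get(column, "")) for r in records]
        if any(EMAIL_RE.search(v) for v in values):
            reasons.append("values look like email addresses")
        if any(PHONE_RE.search(v) for v in values):
            reasons.append("values look like phone numbers")
        if reasons:
            flags[column] = reasons
    return flags

# Toy usage with made-up records.
sample = [
    {"user_id": "u1", "email": "a@example.org", "rating": 5},
    {"user_id": "u2", "email": "b@example.org", "rating": 3},
]
print(flag_possible_pii(sample))  # flags the "email" column
```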


In general, there are other issues related to data that are worthy of consideration and review. These include: 

  1. Consent to use or share the data. Explain whether you have asked the data owner’s permission to use or share the data and what the outcome was. If you did not obtain consent, explain why its use might still be appropriate from an ethical standpoint. For instance, if the data was collected from a public forum, were its users asked to consent to the use of the data they produced and, if not, why?
  2. Domain specific considerations when working with high-risk groups. For example, if the research involves work with minors or vulnerable adults, have the relevant safeguards been put in place?
  3. Filtering of offensive content. For instance, when collecting a dataset, how are the authors filtering offensive content such as racist language or violent imagery?
  4. Compliance with GDPR and other data-related regulations. For instance, if the authors collect human-derived data, what is the mechanism for guaranteeing individuals’ right to be forgotten (i.e., removed from the dataset)? (A minimal sketch of such a removal mechanism appears below.)

This list is not intended to be exhaustive — it is included here as a prompt for author and reviewer reflection.
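As a minimal sketch of the kind of removal mechanism item 4 above asks about, the following hypothetical example deletes every record associated with a given data subject from a JSON-lines file. The forget_subject function, the subject_id field, and the file layout are assumptions made for illustration; a real mechanism would also have to propagate deletions to derived artifacts such as splits, caches, and trained models.

```python
import json
from pathlib import Path

def forget_subject(dataset_path, subject_id, id_field="subject_id"):
    """Remove every record belonging to subject_id from a JSON-lines dataset.

    Returns the number of records removed.
    """
    path = Path(dataset_path)
    kept, removed = [], 0
    with path.open() as f:
        for line in f:
            record = json.loads(line)
            if record.get(id_field) == subject_id:
                removed += 1       # drop this subject's record
            else:
                kept.append(line)  # keep the original line untouched
    path.write_text("".join(kept))
    return removed

# Hypothetical usage: honor a deletion request from subject "u123".
# removed = forget_subject("interactions.jsonl", "u123")
```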

Potential Negative Societal Impacts

Submissions to NeurIPS are expected to include a discussion about potential negative societal impacts of the proposed research artifact or application (this corresponds to question 1c of the NeurIPS Paper Checklist). Whenever these are identified, submissions should also include a discussion about how these risks can be mitigated. 

Grappling with ethics is a difficult problem for the field, and thinking about ethics is still relatively new to many authors. Given the controversial nature of these issues, we choose to place a strong emphasis on transparency. In certain cases, it will not be possible to draw a bright line between ethical and unethical; a paper should therefore discuss any potential issues, welcoming a broader discussion that engages the whole community.

A common difficulty with assessing ethical impact is its indirectness: most papers focus on general-purpose methodologies (e.g., optimization algorithms), whereas ethical concerns are more apparent when considering deployed applications (e.g., surveillance systems). Also, real-world impact (both positive and negative) often emerges from the cumulative progress of many papers, so it is difficult to attribute the impact to an individual paper.

The ethical consequences of a paper can stem from either the methodology or the application. On the methodology side, for example, a new adversarial attack might give unbalanced power to malicious entities; in this case, defenses and other mitigation strategies would be expected, as is standard in computer security. On the application side, the choice of application is sometimes incidental to the core contribution of the paper, and a potentially harmful application should be swapped out (as an extreme example, replacing ethnicity classification with bird classification), though the potential misuses should still be noted. In other cases, the core contribution might be inseparable from a questionable application (e.g., reconstructing a face from speech); in such cases, one should critically examine whether the scientific merits really outweigh the potential ethical harms.

A non-exhaustive list of potential negative societal impacts is included below. Consider whether the proposed methods and applications can:

  1. Directly facilitate injury to living beings. For example: could it be integrated into weapons or weapons systems?
  2. Raise safety or security concerns. For example: is there a risk that applications could cause serious accidents or open security vulnerabilities when deployed in real-world environments?
  3. Raise human rights concerns. For example: could the technology be used to discriminate, exclude, or otherwise negatively impact people, including impacts on the provision of vital services, such as healthcare and education, or limit access to opportunities like employment? Please consult the Toronto Declaration for further details.
  4. Have a detrimental effect on people’s livelihood or economic security. For example: could it undermine people’s autonomy, dignity, or privacy at work, or threaten their economic security (e.g., via automation or by disrupting an industry)? Could it be used to increase worker surveillance or impose conditions that present a risk to the health and safety of employees?
  5. Develop or extend harmful forms of surveillance. For example: could it be used to collect or analyze bulk surveillance data to predict immigration status or other protected categories or be used in any kind of criminal profiling?
  6. Severely damage the environment. For example: would the application incentivize significant environmental harms such as deforestation, fossil fuel extraction, or pollution?
  7. Deceive people in ways that cause harm. For example: could the approach be used to facilitate deceptive interactions that would cause harms such as theft, fraud, or harassment? Consider possible harms that could arise when the technology is being used as intended but also those that could arise when the technology is being (intentionally or unintentionally) misused.
  8. Be misused or modified to produce results contrary to their intended purpose. For example, ML-based approaches for drug discovery have been shown to also be applicable to designing biochemical weapons.


General Guidance Framework

A methodological framework for risk management of AI/ML/DL systems considers whether the proposed applications and methods represent:

  1. Responsible AI: exercising appropriate levels of judgment and care while remaining responsible for the development, deployment, and use of AI capabilities.
  2. Equitable AI: taking deliberate steps to minimize unintended bias in AI capabilities.
  3. Traceable AI: developing and deploying capabilities such that relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods applicable to AI capabilities, including transparent and auditable methodologies, data sources, and design procedures and documentation.
  4. Reliable AI: giving capabilities explicit, well-defined uses, and subjecting the safety, security, and effectiveness of such capabilities to testing and assurance within those defined uses across their entire life cycles.
  5. Governable AI: designing and engineering AI capabilities to fulfill their intended functions while possessing the ability to detect and avoid unintended consequences and the ability to disengage or deactivate deployed systems that demonstrate unintended behavior.
  6. Trustworthy AI: appropriately reflecting characteristics such as accuracy, explainability and interpretability, privacy, reliability, robustness, safety, and security (resilience to attacks), and ensuring that bias is mitigated.


Additional Considerations To Keep in Mind

NeurIPS is a global conference with global scope and global authors, but there are no global ethical principles, laws, or standards governing the Internet. The NeurIPS Code of Ethics is a baseline from which to promote the values of the conference.

Privacy considerations, as well as the local laws, regulations, and legal authorities governing the protection of personal information, rights, and interests, the handling of personal information, and its appropriate use, will likely differ for authors from different parts of the world.

Terms of Service may differ for data access, retention, and use. Authors may not have access to Institutional Review Boards or other oversight mechanisms for applications involving human data or human subjects research.

There are many unresolved legal concerns globally related to the impact of generative AI and how the technology interacts with the laws and regulations of different countries. What constitutes a legal violation in one country may be legal in another.

Reviews are blinded, so consider what the authors have disclosed about their application and methodologies, whether directly in the paper or in supplemental material, datasheets, or the conference-required checklist. Not everyone comes from the same ethics background or is required by their country of origin to follow the laws and procedures of a different country.

Contributors should indicate whether a formal or informal oversight process was used. Where an informal process was used, contributors should explain the steps taken to ensure the protection of human participants and specify the type of human subject oversight provided for the project.


When applicable, have the authors indicated whether they have established restrictions on use of the dataset? For datasets containing sensitive or biometric data, distribution should be more tightly controlled to protect against potential misuse. The authors should consider the use of a license. At a minimum, authors should be explicit about how they expect the data to be used and should document their intent regarding the dataset. (A minimal illustrative example of such documentation follows.)
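As one purely illustrative way to record such intent, the snippet below sketches a minimal "data card" capturing intended use, prohibited uses, and access restrictions. All field names and values are hypothetical; established templates such as datasheets for datasets cover provenance, consent, and maintenance in far more depth.

```python
# Hypothetical, minimal data card recording intended use and restrictions.
DATASET_CARD = {
    "name": "example-clinical-notes",      # illustrative dataset name
    "license": "CC BY-NC 4.0",             # whichever license the authors choose
    "intended_use": "research on de-identification methods only",
    "prohibited_uses": [
        "re-identification of individuals",
        "training deployed clinical decision systems",
    ],
    "contains_sensitive_data": True,
    "access": "gated; requires a signed data-use agreement",
    "contact": "dataset-maintainers@example.org",
}
```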

Copyright is statutory law, and fair use is a legal principle that may or may not be codified in a given jurisdiction. Copyright law is determined by individual nation-states, whose laws vary greatly along a continuum. The only international "norm" is the Berne Convention (181 signatories as of 2022), but it is not fully aligned with every nation-state's copyright laws.

If there is a strong concern over copyright, have the authors indicated or confirmed that they are in compliance with the copyright laws and regulations of their country of origin?

Additional information may be required from authors where legal compliance could not be met due to human rights violations (freedom of expression, right to work and education, bodily autonomy, etc.).

Under the principle of Governable AI, the goal is to design and engineer capabilities that fulfill their intended function while possessing the ability to detect and avoid unintended consequences and to disengage or deactivate deployed systems that demonstrate unintended behavior. Does it appear that the authors have considered whether their approach or research promotes or obfuscates deception, or produces unintended consequences that render the application untrustworthy?

When considering the potential for "misuse" of a proposed application or method, consider whether the authors have communicated that potential, directly or in supplemental material, and whether there is an ability to control or restrict the distribution of, disengage, or deactivate systems that demonstrate unintended behavior.

A large part of the goal of ethics review is education and disclosure. Ethics should guide, not limit, scientific discovery and advancement.

Final Remarks

In summary, we expect NeurIPS submissions to include discussion about potential harms, malicious use, and other potential ethical concerns arising from the use of the proposed approach or application.

We also expect authors to include a discussion about methods to mitigate such risks. Moreover, authors should adhere to best practices in their handling of data.

Whenever there are risks associated with the proposed methods, application, or data collection and usage, authors are expected to elaborate on the rationale for their decisions and on potential mitigations.

Submissions will also be evaluated in terms of the depth of such ethical reflections.

References

[1] D. Rolnick, P.L. Donti, L. H. Kaack, K. Kochanski, A. Lacoste, ... & Y. Bengio (2022). Tackling climate change with machine learning. ACM Computing Surveys (CSUR), 55(2), 1-96.

[2] J. Whittlestone, R. Nyrup, A. Alexandrova, K. Dihal, and S. Cave. (2019) Ethical and societal implications of algorithms, data, and artificial intelligence: a roadmap for research. London: Nuffield Foundation.

[3] B. Hecht, L. Wilcox, J. P. Bigham, J. Schoning, E. Hoque, J. Ernst, Y. Bisk, L. De Russis, L. Yarosh, B. Anjam, D. Contractor, and C. Wu. (2018) It’s Time to Do Something: Mitigating the Negative Impacts of Computing Through a Change to the Peer Review Process. ACM Future of Computing Blog.


The ethics review guidelines have evolved from 2021 to 2023 with contributions by: Cherie Poland, Jiahao Chen, Lester Mackey, Sasha Luccioni, William Isaac, Deborah Raji, Samy Bengio, Kate Crawford, Jeanne Fromer, Iason Gabriel, Amanda Levendowski, and Marc'Aurelio Ranzato, with support and feedback from prior NeurIPS Program Chairs Alina Beygelzimer, Yann Dauphin, Percy Liang, and Jenn Wortman Vaughan.