


Competitions
Competition
Zhang Keli · Ruichu Cai · Kun Kuang · Lujia Pan · Ye Jian · Jiale Zheng · Mengyue Yang · Marcus Kalander · Dai Quanyu · Liu Yuequn

[ Virtual ]

Abstract

In this competition, we focus on a fundamental causal challenge: participants are asked to learn causal alarm graphs, in which every node is an alarm type, from observable historical alarm data together with limited prior knowledge. The challenge originates from a real-world root cause analysis (RCA) scenario in telecommunication networks. By addressing this challenge, participants will not only help operators troubleshoot efficiently, but also advance the field of causal discovery and contribute to our understanding of complex systems.
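
As a concrete (if naive) starting point, candidate directed edges between alarm types can be scored by counting lagged co-occurrences in the event log. The sketch below is a minimal illustration on synthetic data, not the organizers' baseline; the function and its data layout are assumptions for illustration only.

    import numpy as np

    def lagged_cooccurrence_scores(events: np.ndarray, max_lag: int = 5) -> np.ndarray:
        """events: (T, A) binary matrix, events[t, a] = 1 if alarm type a fired
        in time bin t. Returns an (A, A) score matrix where score[i, j] estimates
        how often alarm j follows alarm i within max_lag bins."""
        scores = np.zeros((events.shape[1], events.shape[1]))
        fires = events.sum(axis=0) + 1e-9  # marginal counts for normalization
        for lag in range(1, max_lag + 1):
            # count co-occurrences of i at time t and j at time t + lag
            scores += events[:-lag].T @ events[lag:]
        return scores / fires[:, None]  # normalize by how often the "cause" fires

    # Toy usage: 3 alarm types over 1000 time bins of synthetic data.
    rng = np.random.default_rng(0)
    x = (rng.random((1000, 3)) < 0.05).astype(float)
    print(lagged_cooccurrence_scores(x).round(2))  # threshold to get a candidate graph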

Competition
Rakshit Trivedi · Akbir Khan · Jesse Clifton · Lewis Hammond · John Agapiou · Edgar Dueñez-Guzman · Jayd Matyas · Dylan Hadfield-Menell · Joel Leibo

[ Room 357 ]

Abstract

Multi-agent AI research promises a path to develop human-like and human-compatible intelligent technologies that complement the solipsistic view of other approaches, which mostly do not consider interactions between agents. We propose a Cooperative AI contest based on the Melting Pot framework. At its core, Melting Pot provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. There exist several benchmarks, challenges, and contests aimed at spurring research on cooperation in multi-agent learning. Melting Pot expands and generalizes these previous efforts in several ways: (1) it focuses on mixed-motive games (as opposed to purely cooperative or purely competitive games); (2) it enables testing the generalizability of agent cooperation to previously unseen co-players; (3) it consists of a suite of multiple environments rather than a single one; and (4) it includes games with larger numbers of players (> 7). These properties make it an accessible yet challenging framework for multi-agent AI research. For this contest, we invite multi-agent reinforcement learning solutions that focus on driving cooperation between interacting agents in the Melting Pot environments and generalize to new situations beyond training. A scoring mechanism based on metrics representative of cooperative intelligence will be used to …
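
The evaluation protocol can be summarized in a few lines: focal agents are scored while grouped with background co-players they never trained with. The sketch below is a conceptual illustration only; run_episode and the agent containers are hypothetical stand-ins, not the Melting Pot API.

    import random
    from statistics import mean

    def evaluate_focal(focal_agents, background_populations, scenarios,
                       run_episode, episodes_per_scenario=10):
        """run_episode(agents, scenario) -> list of per-agent returns (hypothetical)."""
        results = {}
        for scenario in scenarios:
            returns = []
            for _ in range(episodes_per_scenario):
                background = random.choice(background_populations)
                # Mixed group: focal agents never trained with these co-players.
                agents = focal_agents + background
                per_agent = run_episode(agents, scenario)
                returns.append(mean(per_agent[: len(focal_agents)]))
            results[scenario] = mean(returns)
        return results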

Competition
Aleksandra Gruca · Pilar Rípodas · Xavier Calbet · Llorenç Lliso Valverde · Federico Serva · Bertrand Le Saux · Michael Kopp · David Kreil · Sepp Hochreiter

[ Virtual ]

Abstract

The competition will advance modern algorithms in AI and machine learning through a highly topical interdisciplinary competition challenge: the prediction of hi-res rain radar movies from multi-band satellite sensors requires data fusion of complementary signal sources, multi-channel video frame prediction, as well as super-resolution techniques. To reward models that extract relevant mechanistic patterns reflecting the underlying complex weather systems, our evaluation incorporates spatio-temporal shifts: specifically, algorithms need to forecast 8h of ground-based hi-res precipitation radar from lo-res satellite spectral images in a unique cross-sensor prediction challenge. Models are evaluated within and across regions on Earth with diverse climate and different distributions of heavy precipitation events. Conversely, robustness over time is achieved by testing predictions on data one year after the training period. Now, in its third edition, Weather4cast 2023 moves to improve rain forecasts world-wide on an expansive data set and novel quantitative prediction challenges. Accurate rain predictions are becoming ever more critical for everyone, with climate change increasing the frequency of extreme precipitation events. Notably, the new models and insights will have a particular impact for the many regions on Earth where costly weather radar data are not available.
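
For orientation, rain forecasts are commonly verified with thresholded categorical scores such as the Critical Success Index, CSI = TP / (TP + FP + FN). The snippet below is an illustrative version; the exact thresholds and scoring used by Weather4cast 2023 are defined by the organizers, and the 1 mm/h value is only an example.

    import numpy as np

    def csi(pred_mm_per_h: np.ndarray, target_mm_per_h: np.ndarray,
            threshold: float = 1.0) -> float:
        pred = pred_mm_per_h >= threshold
        target = target_mm_per_h >= threshold
        tp = np.logical_and(pred, target).sum()
        fp = np.logical_and(pred, ~target).sum()
        fn = np.logical_and(~pred, target).sum()
        return tp / max(tp + fp + fn, 1)

    # Toy usage on random "rain rates"; real inputs are radar-sized movies.
    rng = np.random.default_rng(0)
    print(csi(rng.gamma(1.0, 1.0, (8, 252, 252)), rng.gamma(1.0, 1.0, (8, 252, 252))))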

Competition
Dequan Wang · Xiaosong Wang · Mengzhang Li · Qian Da · DOU QI · Shaoting Zhang · Dimitris Metaxas

[ Virtual ]

Abstract

The lack of public availability and quality annotations in medical image data has been the bottleneck for training large-scale deep learning models for many clinical downstream applications. Repeatedly hand-labeling volumetric data remains a tedious and time-consuming job for medical professionals, whereas providing a few differentiable sample cases is far more feasible and complies with the training process of medical residents. The proposed challenge aims to advance techniques for prompting large-scale pre-trained foundation models via a few data samples as a new paradigm for medical image analysis, e.g., the classification tasks proposed here as use cases. It aligns with the recent trend and success of building foundation models (e.g., Vision Transformers, GPT-X, and CLIP) for a variety of downstream applications. Three private datasets for different classification tasks, i.e., thoracic disease classification, pathological tumor tissue classification, and colonoscopy lesion classification, are composed as the training sets (a few samples) and validation sets (the rest of each dataset). Participants are encouraged to advance cross-domain knowledge transfer techniques in this setting and achieve higher performance scores in all three tasks. The final evaluation will be conducted on the same tasks using the reserved private datasets.
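
One simple way to use a frozen foundation model with only a few labeled samples per class is a nearest-centroid ("prototype") classifier over the model's embeddings. The sketch below leaves the embedding function abstract and uses random features; the competition's foundation models and datasets would supply the real ones.

    import numpy as np

    def fit_prototypes(embeddings: np.ndarray, labels: np.ndarray) -> np.ndarray:
        classes = np.unique(labels)
        protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
        return protos / np.linalg.norm(protos, axis=1, keepdims=True)

    def predict(protos: np.ndarray, embeddings: np.ndarray) -> np.ndarray:
        emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        return (emb @ protos.T).argmax(axis=1)  # cosine similarity to each class

    # Toy usage with random "embeddings"; replace with frozen-model features.
    rng = np.random.default_rng(0)
    emb, y = rng.normal(size=(30, 128)), rng.integers(0, 3, 30)
    print(predict(fit_prototypes(emb, y), emb)[:10])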

Competition
Puze Liu · Jonas Günster · Niklas Funk · Dong Chen · Haitham Bou Ammar · Davide Tateo · Ziyuan Liu · Jan Peters

[ Room 353 ]

Abstract

While machine learning methods have demonstrated impressive success in many application domains, their impact on real robotic platforms is still far from their potential. To unleash the capabilities of machine learning in the field of robotics, researchers need to cope with the specific challenges and issues of the real world. While many robotics benchmarks are available for machine learning, most simplify the complexity of classical robotics tasks, for example by neglecting the highly nonlinear dynamics of the actuators, such as stiction. We organize the Robot Air Hockey Challenge, which allows machine learning researchers to face the sim-to-real gap in a complex and dynamic environment while competing with each other. In particular, the challenge focuses on robust, reliable, and safe learning techniques suitable for real-world robotics. Through this challenge, we wish to investigate how machine learning techniques can outperform standard robotics approaches in challenging robotic scenarios while dealing with safety, limited data usage, and real-time requirements.

Competition
Joseph Suarez · Phillip Isola · David Bloomin · Kyoung Whan Choe · Hao Li · Ryan Sullivan · Nishaanth Kanna · Daniel Scott · Rose Shuman · Herbie Bradley · Louis Castricato · Chenghui Yu · Yuhao Jiang · Qimai Li · Jiaxin Chen · Xiaolong Zhu · Dipam Chakrabroty · Sharada Mohanty · Nikhil Pinnaparaju

[ Room 354 ]

Abstract

In this competition, participants train agents to complete a variety of tasks in Neural MMO 2.0 including foraging, combat, tool acquisition and usage, and item trading. Neural MMO is a simulated environment featuring 128 players, procedurally generated maps, and emergent complexity from interactions among agents. The competition features three tracks: two compute-limited academic tracks focused on multi-agent reinforcement learning and curriculum generation and one unrestricted track. This is the fourth challenge on Neural MMO, and the previous competitions have all yielded state-of-the-art performance on earlier versions of this environment as well as more general improvements to learning methods. The NeurIPS workshop will include presentations by the developers and by researchers in reinforcement learning and open-endedness.
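
For orientation, interacting with the environment follows the familiar multi-agent step loop. The sketch below assumes the open-source nmmo package and a PettingZoo-style interface as in earlier releases; treat the exact signatures as assumptions and defer to the competition starter kit.

    import nmmo

    env = nmmo.Env()
    obs = env.reset()
    for _ in range(10):
        # Random actions per agent; a competition entry substitutes a learned policy.
        actions = {agent_id: env.action_space(agent_id).sample() for agent_id in obs}
        obs, rewards, dones, infos = env.step(actions)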

Competition
Eleni Triantafillou · Fabian Pedregosa · Meghdad Kurmanji · Kairan ZHAO · Gintare Karolina Dziugaite · Peter Triantafillou · Ioannis Mitliagkas · Vincent Dumoulin · Lisheng Sun · Peter Kairouz · Julio C Jacques Junior · Jun Wan · Sergio Escalera · Isabelle Guyon

[ Room 355 ]

Abstract

We are proposing, to our knowledge, the first competition on machine unlearning. Unlearning is a rapidly growing area of research that has emerged in response to one of the most significant challenges in deep learning: allowing users to exercise their right to be forgotten. This is particularly challenging for deep models, which tend to memorize information from their training data, thus compromising privacy. The development of unlearning, a relatively new area of research, has been hindered by the lack of a standardized evaluation protocol, and our challenge is designed to fill this need. By incentivizing the development of better unlearning algorithms, informing the community of their relative strengths and weaknesses, and unifying evaluation criteria, we expect our competition to have a significant impact. We propose a realistic scenario for unlearning face images.
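
One common sanity check behind unlearning evaluations: after unlearning, the model's per-example losses on the forget set should be distributed like those of a model retrained without that data. The snippet below is an illustrative quantile-gap comparison on synthetic losses, not the competition's official metric.

    import numpy as np

    def forgetting_gap(losses_unlearned: np.ndarray, losses_retrained: np.ndarray) -> float:
        """Compare forget-set loss distributions of the unlearned model and a
        retrained-from-scratch reference; smaller gaps suggest better unlearning."""
        qs = np.linspace(0.05, 0.95, 19)
        return float(np.abs(np.quantile(losses_unlearned, qs)
                            - np.quantile(losses_retrained, qs)).mean())

    rng = np.random.default_rng(0)
    print(forgetting_gap(rng.gamma(2.0, 1.0, 500), rng.gamma(2.0, 1.0, 500)))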

Competition
Dimosthenis Karatzas · Rubèn Tito · Lei Kang · Mohamed Ali Souibgui · Khanh Nguyen · Raouf Kerkouche · Kangsoo Jung · Marlon Tobaben · Joonas Jälkö · Vincent Poulain d'Andecy · Aurélie JOSEPH · Ernest Valveny · Josep Llados · Antti Honkela · Mario Fritz

[ Room 356 ]

Abstract

In an era of increasing digitalization and data-driven decision-making, the intersection of document intelligence and privacy has become a critical concern. The Privacy-Preserving Federated Learning Document Visual Question Answering Workshop aims to bring together experts, researchers, and practitioners to explore innovative solutions and discuss the latest advancements in this crucial field.

Join us for insightful invited talks by leading figures in the field. These talks will provide valuable perspectives on the current state of privacy-preserving document intelligence and its future directions. Get an in-depth look at the Privacy-Preserving Document Visual Question Answering Competition that we are currently holding, with a detailed overview of the competition, the dataset, and the competition results. Moreover, the top winners of the competition will have the opportunity to give short talks about their winning methods and strategies. Gain firsthand insights into the innovative approaches that led to their success.

Workshop URL: https://sites.google.com/view/pfldocvqa-neurips-23/home
Associated Competition URL: https://benchmarks.elsa-ai.eu/?ch=2

Competition
Zoltan Nagy · Kingsley Nweye · Sharada Mohanty · Ruchi Choudhary · Max Langtry · Gregor Henze · Jan Drgona · Sourav Dey · Alfonso Capozzoli · Mohamed Ouf

[ Virtual ]

Abstract

Reinforcement learning has gained popularity as a model-free and adaptive controller for the built environment in demand-response applications. However, a lack of standardization across previous research has made it difficult to compare different RL algorithms with each other. It also remains unclear how much effort is required to solve each specific problem in the building domain and how well a trained RL agent will scale up to new environments. The CityLearn Challenge 2023 provides an avenue to address these problems by leveraging CityLearn, an OpenAI Gym environment for the implementation of RL agents for demand response. The challenge utilizes a novel dataset based on the US End-Use Load Profiles database. Participants are to develop energy management agents for battery charge and discharge control in each building, with the goal of minimizing electricity demand from the grid, electricity bills, and greenhouse gas emissions. We provide a baseline rule-based control (RBC) agent for the evaluation of the RL agents' performance and rank participants according to their solution's ability to outperform the baseline.
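
For orientation, a control loop in CityLearn looks roughly as follows. The import path matches recent open-source CityLearn releases, but the schema name is a placeholder and the exact step/evaluate signatures should be checked against the challenge starter kit.

    from citylearn.citylearn import CityLearnEnv

    env = CityLearnEnv(schema="citylearn_challenge_2022_phase_1")  # placeholder schema
    obs = env.reset()
    done = False
    while not done:
        # Zero action = no battery charge/discharge; an RBC or RL agent goes here.
        actions = [[0.0] for _ in env.buildings]
        obs, reward, done, info = env.step(actions)
    print(env.evaluate())  # KPIs such as cost and emissions relative to the baseline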

Competition
Mark Saroufim · Weiwei Yang · Christian Puhrsch · Luca Antiga · Greg Bowyer · Driss Guessous · Artidoro Pagnoni · Supriya Rao · Joseph Isaacson · Vicki Boykis · Geeta Chauhan · aaron gonzales · Davide Eynard

[ Room 356 ]

Abstract

Large Language Models (LLMs) have been pivotal in the recent Cambrian explosion of generative AI applications. However, existing efforts to democratize access to fine-tuning and querying LLMs have been largely limited by the growing hardware costs required to adapt and serve these models. Enabling low-cost and efficient LLM fine-tuning and inference can have a significant impact on industrial and scientific applications. Here, we present a single-GPU fine-tuning and inference competition. Our goal is to accelerate the development of practical software methods to reduce the costs associated with utilizing LLMs. Furthermore, by advocating for goal-oriented and infrastructure-focused evaluation frameworks that stress reproducibility, we aim to democratize access to these methods and enhance their accessibility to the wider public.
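
A typical low-cost recipe combines 4-bit weight loading with LoRA adapters so that only a small set of parameters is trained on one GPU. The sketch below uses the Hugging Face transformers/peft/bitsandbytes stack; the model id and hyperparameters are illustrative, and the competition does not mandate this approach.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-350m",  # placeholder model id
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),
    )
    lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # only the small adapter weights are trained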

Competition
Eleonora Giunchiglia · Mihaela C. Stoian · Salman Khan · Reza Javanmard alitappeh · Izzeddin A M Teeti · Adrian Paschke · Fabio Cuzzolin · Thomas Lukasiewicz

[ Room 353 ]

Abstract

In recent years, there has been an increasing interest in exploiting readily available background knowledge in order to obtain neural models (i) able to learn from less data, and/or (ii) guaranteed to be compliant with the background knowledge, which corresponds to requirements on the model. In this challenge, we focus on the autonomous driving domain, and we provide our participants with the recently proposed ROAD-R dataset, which consists of 22 long videos annotated with road events together with a set of requirements expressing well-known facts about the world (e.g., “a traffic light cannot be red and green at the same time”). The participants will face two challenging tasks. In the first, they will have to develop the best-performing model with only a subset of the annotated data, which in turn will encourage them to exploit the requirements to facilitate training on the unlabelled portion of the dataset. In the second, we ask them to create systems whose predictions are compliant with the requirements. This is the first competition addressing the open questions: (i) If limited annotated data is available, is background knowledge useful to obtain good performance? If so, how can it be injected into deep learning models? And, …
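
Requirements of this kind can be encoded as sets of mutually exclusive labels and checked against a model's predictions. The snippet below is a minimal illustration; the label names and constraint format are ours, not the ROAD-R specification.

    def violates(pred_labels: set[str], mutually_exclusive: list[set[str]]) -> bool:
        # A prediction violates a requirement if it asserts two or more labels
        # from the same mutually exclusive group.
        return any(len(group & pred_labels) > 1 for group in mutually_exclusive)

    requirements = [{"traffic-light-red", "traffic-light-green"}]
    print(violates({"traffic-light-red", "traffic-light-green", "car"}, requirements))  # True
    print(violates({"traffic-light-red", "car"}, requirements))                         # False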

Competition
Polina Turishcheva · Paul Fahey · Rachel Froebe · Mohammad Bashiri · Konstantin Willeke · Fabian Sinz · Andreas Tolias · Alexander Ecker

[ Room 357 ]

Abstract

Understanding how biological visual systems process information is challenging due to the complex nonlinear relationship between neuronal responses and high-dimensional visual input. Artificial neural networks have already improved our understanding of this system by allowing computational neuroscientists to create predictive models and bridge biological and machine vision. During the Sensorium 2022 competition, we introduced benchmarks for vision models with static input (i.e. images). However, animals operate and excel in dynamic environments, making it crucial to study and understand how the brain functions under these conditions. Moreover, many biological theories, such as predictive coding, suggest that previous input is crucial for current input processing. Currently, there is no standardized benchmark to identify state-of-the-art dynamic models of the mouse visual system. To address this gap, we propose the Sensorium 2023 Benchmark Competition with dynamic input. This competition includes the collection of a new large-scale dataset from the primary visual cortex of five mice, containing responses from over 38,000 neurons to over 2 hours of dynamic stimuli per neuron. Participants in the main benchmark track will compete to identify the best predictive models of neuronal responses for dynamic input (i.e. video). We will also host a bonus track in which submission performance …
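
A per-neuron correlation between predicted and recorded activity, averaged over neurons, is the kind of score typically used for such benchmarks. The snippet below is an illustrative version; the official Sensorium 2023 metrics are defined by the organizers.

    import numpy as np

    def mean_per_neuron_correlation(pred: np.ndarray, resp: np.ndarray) -> float:
        """pred, resp: (time, neurons) arrays of predicted and observed activity."""
        p = pred - pred.mean(axis=0)
        r = resp - resp.mean(axis=0)
        denom = np.sqrt((p ** 2).sum(axis=0) * (r ** 2).sum(axis=0)) + 1e-9
        return float(((p * r).sum(axis=0) / denom).mean())

    rng = np.random.default_rng(0)
    print(mean_per_neuron_correlation(rng.normal(size=(300, 50)),
                                      rng.normal(size=(300, 50))))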

Competition
Stone Tao · Qimai Li · Yuhao Jiang · JIAXIN CHEN · Xiaolong Zhu · Bovard Doerschuk-Tiberi · Isabelle Pan · Addison Howard

[ Room 355 ]

Abstract

The proposed challenge is a large-scale multi-agent environment with novel complex dynamics, featuring long-horizon planning, perfect information, and more. The challenge uniquely presents an opportunity to investigate problems at a large scale in two forms: large-scale RL training via GPU-optimized environments powered by Jax, as well as large populations of controllable units in the environments. The Lux AI Challenge Season 2 NeurIPS Edition presents a benchmark to test the scaling capabilities of solutions such as RL on environment settings of increasing scale and complexity. Participants can easily get started using any number of strong rule-based, RL, and/or imitation learning (IL) baselines. They are also given access to more than a billion frames of "play" data from the previous iteration of the competition on the small-scale version of the environment previously hosted on Kaggle. Participants can submit their agents to compete against other submitted agents on an online leaderboard ranked by a TrueSkill ranking system.
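
For reference, the open-source trueskill package illustrates how such a leaderboard updates skill estimates from match outcomes; the competition's exact ranking configuration may differ.

    import trueskill

    alice, bob = trueskill.Rating(), trueskill.Rating()
    # Alice's agent beats Bob's agent in one evaluation match.
    alice, bob = trueskill.rate_1vs1(alice, bob)
    print(alice.mu, bob.mu)  # winner's skill estimate rises, loser's falls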

Competition
Sriram Yenamandra · Arun Ramachandran · Mukul Khanna · Karmesh Yadav · Devendra Singh Chaplot · Gunjan Chhablani · Alexander Clegg · Theophile Gervet · Vidhi Jain · Ruslan Partsey · Ram Ramrakhya · Andrew Szot · Austin Wang · Tsung-Yen Yang · Aaron Edsinger · Charles Kemp · Binit Shah · Zsolt Kira · Dhruv Batra · Roozbeh Mottaghi · Yonatan Bisk · Chris Paxton

[ Room 354 ]

Abstract

Deploying robots in real human environments requires a full hardware and software stack that includes everything from perception to manipulation, in simulation and on accessible physical hardware. The lack of a single unified resource providing these capabilities means that the academic literature often focuses on creating agents in simulation or on one-off hardware, preventing comprehensive benchmarking and reproducibility. We present the first Open-Vocabulary Mobile Manipulation challenge with diverse assets and environments in simulation and a different, held-out set of physical objects in a novel real-world environment. We provide an entire robotics software stack that is modular, fully open-source, and centered on a popular low-cost hardware platform for easy replication and extension by the research community. Machine learning has benefited greatly from the standardization of high-quality engineering; our work aims to lower the cost of entry to robotics. We have assembled a team that spans Georgia Tech, Carnegie Mellon, Meta and Hello Robot to enable physical robot evaluations at NeurIPS. A simulator is ready for deployment and physical kitchens have been constructed for use as a real-world test set. We present development environments here and a peek at our full held-out test apartment under construction in Fremont, California. Crucially, …

Competition
Harsha Vardhan Simhadri · Martin Aumüller · Dmitry Baranchuk · Matthijs Douze · Edo Liberty · Amir Ingber · Frank Liu · George Williams

[ Room 356 ]

Abstract

We propose a competition to encourage the development of indexing data structures and search algorithms for the Approximate Nearest Neighbor (ANN) or Vector search problem in real-world scenarios. Rather than evaluating the classical uniform indexing of dense vectors, this competition proposes to focus on difficult variants of the task. Optimizing these variants is increasingly relevant as vector search becomes commonplace and the "simple" case is sufficiently well addressed. Specifically, we propose the sparse, filtered, out-of-distribution and streaming variants of ANNS. These variants require adapted search algorithms and strategies with different tradeoffs. This competition aims at being accessible to participants with modest compute resources by limiting the scale of the datasets, normalizing on limited evaluation hardware, and accepting open-source submissions to only a subset of the datasets. This competition will build on the evaluation framework https://github.com/harsha-simhadri/big-ann-benchmarks that we set up for the billion-scale ANNS challenge https://big-ann-benchmarks.com of NeurIPS 2021.
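
Across all variants, submissions are ultimately judged on how well approximate results match exact ones under resource limits. The snippet below sketches the standard recall@k computation against brute-force ground truth; it mirrors the spirit of the benchmark harness rather than its exact code.

    import numpy as np

    def recall_at_k(approx_ids: np.ndarray, true_ids: np.ndarray, k: int = 10) -> float:
        """approx_ids, true_ids: (num_queries, k) arrays of neighbor indices."""
        hits = [len(set(a[:k]) & set(t[:k])) for a, t in zip(approx_ids, true_ids)]
        return float(np.mean(hits)) / k

    rng = np.random.default_rng(0)
    base, queries = rng.normal(size=(1000, 32)), rng.normal(size=(10, 32))
    # Exact neighbors via brute force; an ANN index would replace this step.
    d = ((queries[:, None, :] - base[None]) ** 2).sum(-1)
    true_ids = np.argsort(d, axis=1)[:, :10]
    print(recall_at_k(true_ids, true_ids))  # 1.0 by construction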

Competition
Vittorio Caggiano · Guillaume Durandau · Seungmoon Song · Cameron Berg · Pierre Schumacher · Chun Kwang Tan · Massimo Sartori · Vikash Kumar

[ Room 354 ]

Abstract

Humans effortlessly grasp objects of diverse shapes and properties and execute agile locomotion without overwhelming their cognitive capacities. This ability was acquired through millions of years of evolution, which honed the symbiotic relationship between the central and peripheral nervous systems and the musculoskeletal structure. Consequently, it is not surprising that uncovering the intricacies of these complex, evolved systems underlying human movement remains a formidable challenge. Advancements in neuromechanical simulations and data-driven methods offer promising avenues to overcome these obstacles. To this end, we propose to organize MyoChallenge 2023, where we will provide a highly detailed neuromechanical simulation environment and invite experts to develop any type of controller, including state-of-the-art reinforcement learning. Building on the success of NeurIPS 2022: MyoChallenge, which focused on manipulating single objects with a highly articulated musculoskeletal hand, this year's competition will feature two tracks: the manipulation track and the locomotion track. The manipulation track will utilize a substantially extended musculoskeletal model of the hand with added elbow and shoulder, MyoArm, which has 27 DOFs controlled by 63 muscles, and aims to realize generalizable manipulation for unseen objects. The new locomotion track will feature the newly developed MyoLeg, which represents the full body with articulated legs featuring …
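
For orientation, MyoSuite environments are used through the standard Gym interface. The sketch below assumes the open-source myosuite package registers its environments on import and uses a placeholder environment id; the challenge tracks define their own tasks.

    import gym
    import myosuite  # noqa: F401  (assumed to register the Myo environments)

    env = gym.make("myoHandReachFixed-v0")  # placeholder environment id
    obs = env.reset()
    for _ in range(100):
        # Random muscle activations; a controller or RL policy goes here.
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            obs = env.reset()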

Competition
Brandon Wood · Brook Wander · Jehad Abed · John Kitchin · Joseph Musielewicz · Ammar Rizvi · Zachary Ulissi

[ Hall C2 (level 1) ]

Abstract

The Open Catalyst Challenge is aimed at encouraging the community to make progress on the consequential problem of catalyst materials discovery. An important proxy for catalyst performance is the adsorption energy, i.e. how strongly the adsorbate molecule binds to the catalyst’s surface. This year’s challenge will consist of one primary task: find the adsorption energy (global minimum) given an adsorbate and a catalyst surface. Adsorption energies can be used for screening catalysts, and as a result this task will directly support the acceleration of computational discovery of novel catalysts for sustainable energy applications.
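
The underlying bookkeeping is a difference of total energies, E_ads = E(adsorbate+surface) - E(clean surface) - E(gas-phase adsorbate), with more negative values indicating stronger binding. A trivial sketch with placeholder values:

    def adsorption_energy(e_adslab: float, e_slab: float, e_adsorbate: float) -> float:
        """E_ads = E(adsorbate+surface) - E(clean surface) - E(gas-phase adsorbate).
        More negative means stronger binding."""
        return e_adslab - e_slab - e_adsorbate

    print(adsorption_energy(-102.4, -100.0, -1.9))  # -0.5 (eV, illustrative values)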

This year's results presentation will be part of the AI for Science Workshop held on December 16th.

Competition
Daniel Burkhardt · Andrew Benz · Robrecht Cannoodt · Mauricio Cortes · Scott Gigante · Christopher Lance · Richard Lieberman · Malte Luecken · Angela Pisco

[ Room 354 ]

Abstract

Single-cell sequencing technologies have revolutionized our understanding of the heterogeneity and dynamics of cells and tissues. However, single-cell data analysis faces challenges such as high dimensionality, sparsity, noise, and limited ground truth. In this third installment of the Open Problems in Single-Cell Analysis competitions at NeurIPS, we challenge competitors to develop algorithms capable of predicting single-cell perturbation response across experimental conditions and cell types. We will provide a new benchmark dataset of human peripheral blood cells under chemical perturbations, which simulate drug discovery experiments. The objective is to develop methods that can generalize to unseen perturbations and cell types to enable scientists to overcome the practical and economic limitations of single-cell perturbation studies. The goal of this competition is to leverage advances in representation learning (in particular, self-supervised, multi-view, and transfer learning) to unlock new capabilities bridging data science, machine learning, and computational biology. We hope this effort will continue to foster collaboration between the computational biology and machine learning communities to advance the development of algorithms for biomedical data.
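
A useful mental model for the task is the simplest possible baseline: predict each perturbation's mean response over the cell types where it was observed. The snippet below illustrates this with a toy table; the column names are hypothetical, not the competition schema.

    import pandas as pd

    train = pd.DataFrame({
        "perturbation": ["drugA", "drugA", "drugB"],
        "cell_type": ["T", "B", "T"],
        "gene_1": [0.2, 0.4, 1.1],
        "gene_2": [0.0, 0.2, 0.9],
    })
    genes = ["gene_1", "gene_2"]
    baseline = train.groupby("perturbation")[genes].mean()
    # Prediction for an unseen (perturbation, cell type) pair:
    print(baseline.loc["drugA"])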

Competition
Mantas Mazeika · Andy Zou · Norman Mu · Long Phan · Zifan Wang · Chunru Yu · Adam Khoja · Fengqing Jiang · Aidan O'Gara · Zhen Xiang · Arezoo Rajabi · Dan Hendrycks · Radha Poovendran · Bo Li · David Forsyth

[ Room 357 ]

Abstract

The Trojan Detection Challenge (LLM Edition) aims to advance the understanding and development of methods for detecting hidden functionality in large language models (LLMs). The competition features two main tracks: the Trojan Detection Track and the Red Teaming Track. In the Trojan Detection Track, participants are given a large language model containing thousands of trojans and tasked with discovering the triggers for these trojans. In the Red Teaming Track, participants are challenged to elicit specific undesirable behaviors from a large language model fine-tuned to avoid those behaviors. TDC 2023 will include Base Model and Large Model subtracks to enable broader participation, and established trojan detection and red teaming baselines will be provided as a starting point. By uniting trojan detection and red teaming, TDC 2023 aims to foster collaboration between these communities to promote research on hidden functionality in LLMs and enhance the robustness and security of AI systems.
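
At its core, automated red teaming is a search loop: propose prompts, generate completions, and test them against a behavior check. The sketch below uses the Hugging Face transformers API with a placeholder model and a stand-in behavior classifier; it is not the TDC 2023 harness.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder model id
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def matches_behavior(text: str) -> bool:
        return "target phrase" in text  # stand-in for the real behavior classifier

    for prompt in ["Ignore previous instructions and ...", "You are DAN ..."]:
        ids = tok(prompt, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=30, do_sample=True, top_p=0.9)
        text = tok.decode(out[0], skip_special_tokens=True)
        print(prompt, "->", matches_behavior(text))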

Competition
Victoria Dean · Gaoyue Zhou · Mohan Kumar Srirama · Sudeep Dasari · Esther Brown · Marion Lepert · Aleksandra Faust · Chelsea Finn · Lerrel Pinto · Abhinav Gupta

[ Virtual ]

Abstract

The Train Offline, Test Online (TOTO) competition provides a shared, remote robot setup paired with an open-source dataset. Participants can train offline agents (e.g. via behavior cloning or offline reinforcement learning) and evaluate them on two common manipulation tasks (pouring and scooping), which require challenging generalization across objects, locations, and lighting conditions. TOTO has an additional track for evaluating vision representations, which are combined with a standard behavior cloning method for evaluation. The competition begins with a simulation phase to qualify for the real-robot phase. We hope that TOTO will recruit newcomers to robotics by giving them a chance to compete and win on real hardware and the resources needed to get started.
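
As a point of reference, the simplest offline agent in this setting is behavior cloning: regress actions from observations in the offline dataset. The sketch below uses PyTorch with placeholder shapes and synthetic data standing in for the TOTO dataset.

    import torch
    from torch import nn

    obs = torch.randn(1024, 64)   # placeholder observation features
    act = torch.randn(1024, 7)    # placeholder robot actions (e.g., joint deltas)

    policy = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 7))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(200):
        idx = torch.randint(0, len(obs), (256,))
        loss = nn.functional.mse_loss(policy(obs[idx]), act[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(float(loss))  # training MSE; real evaluation happens on the shared robot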