[ West Meeting Room 210 ]
Abstract
We are launching the first AI track for the popular Meta Hacker Cup programming competition, designed to assess the capabilities of Generative AI in performing autonomous code generation tasks. We aim to test the limits of AI in complex coding challenges and measure the performance gap between AI systems and human programmers. We will provide access to all Hacker Cup problems since 2011 alongside their respective solutions in a multimodal (image and text) format, and utilize the existing Hacker Cup infrastructure for competitor evaluation. Featuring both "open evaluation, open model" and "open evaluation, closed model" tracks, this competition invites diverse participation from research institutions of varied interests and resource constraints, including academic labs, AI startups, large technology companies, and AI enthusiasts. Our goal is to develop and democratize meaningful advancements in code automation with the very first open evaluation process for competitive AI programmers.
[ West Meeting Room 209 ]
Abstract
Limb loss represents a traumatic and destabilizing event in human life, significantly impacting an individual's quality of life and independence. Advancements in bionic prosthetic limbs offer a remarkable opportunity to regain mobility and functionality. Human users of bionic limbs (Bionic Humans) are able to learn to use those prosthetic extensions to compensate for their lost limb and reclaim aspects of their former motor abilities. The movement generalization and environment adaptability skills displayed by humans using bionic extensions are a testament to motor intelligence, a capability yet unmatched by current artificial intelligence agents. To this end, we propose to organize MyoChallenge 2024: Physiological Dexterity and Agility in Bionic Humans, where we will provide a highly detailed neuromechanical and robotic simulation environment and invite experts worldwide to develop any type of controller for both the biological (muscle) and mechanical (bionic) components, including state-of-the-art reinforcement learning, to solve a series of dexterous motor tasks involving human-to-bionic-limb interaction. Building on the success of the MyoChallenge at the NeurIPS 2022 and 2023 editions, this year's challenge will push the boundaries on how symbiotic human-robotic interaction needs to be coordinated to produce agile and dexterous behaviours. This year MyoChallenge will have two tracks: manipulation and locomotion. The manipulation track …
[ West Meeting Room 215, 216 ]
Abstract
We propose a challenge organised in conjunction with the Fair Universe project, a collaborative effort funded by the US Department of Energy and involving the Lawrence Berkeley National Laboratory, Université Paris-Saclay, University of Washington, and ChaLearn. This initiative aims to forge an open AI ecosystem for scientific discovery. The challenge will focus on measuring the physics properties of elementary particles with imperfect simulators due to differences in modelling systematic errors. Additionally, the challenge will leverage a large-compute-scale AI platform for sharing datasets, training models, and hosting machine learning competitions. Our challenge will bring together the physics and machine learning communities to advance our understanding and methodologies in handling systematic (otherwise known as epistemic) uncertainties within AI techniques.
[ West Meeting Room 208 ]
Abstract
The Ariel Data Challenge 2024 tackles one of astronomy's hardest data analysis problems - extracting faint exoplanetary signals from noisy space telescope observations like those of the upcoming Ariel Mission. A major obstacle is systematic noise, such as "jitter noise" arising from spacecraft vibrations, which corrupts spectroscopic data used to study exoplanet atmospheres. This complex spatio-temporal noise challenges conventional parametric denoising techniques. In this challenge, the jitter time series is simulated based on Ariel's payload design and other noise effects are taken from in-flight data from JWST, in order to provide a realistic representation of the effect. To recover minute signals from the planet's atmosphere, participants must push the boundaries of current approaches to denoise this multimodal data across image, time, and spectral domains. This requires novel solutions for non-Gaussian noise, data drifts, uncertainty quantification, and limited ground truth. Success will directly improve the Ariel pipeline design and enable new frontiers in characterising exoplanet atmospheres - a key science priority in the coming decades for understanding planetary formation, evolution, and habitability.
[ West Meeting Room 215, 216 ]
Abstract
Speech enhancement (SE) is the task of improving the quality of the desired speech while suppressing other interference signals. Tremendous progress has been achieved in the past decade in deep learning-based SE approaches. However, existing SE studies are often limited in one or more of the following aspects: coverage of SE sub-tasks, diversity and amount of data (especially real-world evaluation data), and diversity of evaluation metrics. As the first step to fill this gap, we establish a novel SE challenge, called URGENT, to promote research towards universal SE. It concentrates on the universality, robustness, and generalizability of SE approaches. In the challenge, we extend the conventionally narrow SE definition to cover different sub-tasks, thus allowing the exploration of the limits of current SE models. We start with four SE sub-tasks: denoising, dereverberation, bandwidth extension, and declipping. Note that handling the above sub-tasks within a single SE model has been challenging and underexplored in the SE literature due to the distinct data formats in different tasks. As a result, most existing SE approaches are only designed for a specific sub-task. To address this issue, we propose a technically novel framework to unify all these sub-tasks in a single model, which is compatible with most existing SE approaches. Several state-of-the-art …
[ West Meeting Room 209 ]
Abstract
The proposed competition revolves around testing the limits of agents (e.g., rule-based or Meta RL agents) when it comes to adapting to a game with changing dynamics. We propose a unique 1v1 competition format where both teams face off in a sequence of 5 games. The game mechanics, along with partial observability, are designed to ensure that optimal gameplay requires agents to efficiently explore and discover the game dynamics. They ensure that the strongest agents may play "suboptimally" in game 1 to explore, but then win easily in games 2 to 5 by leveraging information gained through game 1 and adapting. This competition provides a GPU-parallelized game environment via JAX to enable fast training/evaluation on a single GPU, lowering barriers of entry to research that typically requires industry-level scale. Participants can submit their agents to compete against other submitted agents on an online leaderboard hosted by Kaggle and ranked by a TrueSkill ranking system. The results of the competition will provide a dataset of top open-sourced rule-based agents as well as many game episodes that can support unique analyses (e.g., quantifying emergence/surprise) that past competitions usually cannot provide, thanks to the number of competitors the Lux AI Challenges often attract.
[ West Meeting Room 210 ]
Abstract
[ West Meeting Room 208 ]
Abstract
The integration of machine learning (ML) techniques for addressing intricate physics problems is increasingly recognized as a promising avenue for expediting simulations. However, assessing ML-derived physical models poses a significant challenge for their adoption within industrial contexts. This competition is designed to promote the development of innovative ML approaches for tackling physical challenges, leveraging our recently introduced unified evaluation framework known as Learning Industrial Physical Simulations (LIPS). Building upon the preliminary edition held from November 2023 to March 2024, this iteration centers on a task fundamental to a well-established physical application: airfoil design simulation, utilizing our proposed AirfRANS dataset. The competition evaluates solutions based on various criteria encompassing ML accuracy, computational efficiency, Out-Of-Distribution performance, and adherence to physical principles. Notably, this competition represents a pioneering effort in exploring ML-driven surrogate methods aimed at optimizing the trade-off between computational efficiency and accuracy in physical simulations. Hosted on the Codabench platform, the competition offers online training and evaluation for all participating solutions.
[ West Meeting Room 210 ]
Abstract
Training high-performing large language models (LLMs) from scratch is a notoriously expensive and difficult task, costing hundreds of millions of dollars in compute alone. These pretrained LLMs, however, can cheaply and easily be adapted to new tasks via fine-tuning, leading to a proliferation of models that suit specific use cases. Recent work has shown that specialized fine-tuned models can be rapidly merged to combine capabilities and generalize to new skills. This raises the question: given a new suite of desired skills and design parameters, is it necessary to fine-tune or train yet another LLM from scratch, or can similar existing models be re-purposed for a new task with the right selection or merging procedure? The LLM Merging challenge aims to spur the development and evaluation of methods for merging and reusing existing models to form stronger new models without needing additional training. Specifically, the competition focuses on merging existing publicly-released expert models from Hugging Face, using only minimal compute and additional parameters. The goal will be to develop merged models that outperform existing models and existing merging baselines. Submissions will be judged based on the average accuracy on a set of held-out multiple-choice evaluation tasks and their efficiency. To make …
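As a minimal illustration of what merging without additional training can mean, the sketch below averages the parameters of two models that share an architecture. Uniform parameter averaging is only one simple baseline, not the challenge's prescribed method; the function name `merge_by_averaging`, the toy parameter names, and the use of plain Python lists in place of real tensors are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def merge_by_averaging(
    models: Sequence[StateDict],
    weights: Optional[Sequence[float]] = None,
) -> StateDict:
    """Merge same-architecture models by (weighted) averaging of each parameter."""
    if not models:
        raise ValueError("need at least one model to merge")
    if weights is None:
        weights = [1.0] * len(models)  # uniform average by default
    if len(weights) != len(models):
        raise ValueError("one weight per model is required")

    names = models[0].keys()
    if any(m.keys() != names for m in models):
        raise ValueError("all models must have identical parameter names")

    total = sum(weights)
    merged: StateDict = {}
    for name in names:
        tensors = [m[name] for m in models]
        size = len(tensors[0])
        if any(len(t) != size for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted mean across models; no gradient steps involved.
        merged[name] = [
            sum(w * t[i] for w, t in zip(weights, tensors)) / total
            for i in range(size)
        ]
    return merged


# Toy "expert" checkpoints (made-up values, not real model weights).
expert_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
expert_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

merged = merge_by_averaging([expert_a, expert_b])
print(merged)
```

The merge costs one pass over the parameters and adds no new ones, which is the kind of compute and parameter budget the abstract describes. More elaborate merging procedures would replace the element-wise mean with a different combination rule.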
[ West Meeting Room 215, 216 ]
Abstract
Building on the success of the Melting Pot contest at NeurIPS 2023, which challenged participants to develop multi-agent reinforcement learning agents capable of cooperation in groups, we are excited to propose a new contest centered on cooperation between language model (LM) agents in intricate, text-mediated environments. Our goal is to advance research on the cooperative intelligence of such LM agents. Of particular interest are the agents capable of using natural language to effectively cooperate with each other in complex environments, even in the face of challenges such as competing interests, differing values, and potential miscommunication. To this end, we will leverage the recently released Concordia framework, an open-source library for defining open-ended environments where LM agents like those of Park et al. (2023) can interact with one another by generating free-form natural text describing what they intend to do or say. Concordia provides a suite of mixed-motive social dilemma scenarios where cooperation is valuable but hard to achieve. The proposed contest will challenge the participants to develop LM agents that exhibit cooperative intelligence in a variety of Concordia scenarios designed to assess multiple distinct skills of cooperation, including promise-keeping, negotiation, reciprocity, reputation, partner choice, compromise, and sanctioning. Participants will be …
[ Virtual Only ]
Abstract
The competition will advance modern algorithms in AI and machine learning through a highly topical interdisciplinary competition challenge: The prediction of hi-res rain radar movies from multi-band satellite sensors requires data fusion of complementary signal sources, multi-channel video frame prediction, as well as super-resolution techniques. To reward models that extract relevant mechanistic patterns reflecting the underlying complex weather systems, our evaluation incorporates spatio-temporal shifts: Specifically, algorithms need to forecast several hours of ground-based hi-res precipitation radar from lo-res satellite spectral images in a unique cross-sensor prediction challenge. Models are evaluated within and across regions on Earth with diverse climate and different distributions of heavy precipitation events. Conversely, robustness over time is achieved by testing predictions on data one year after the training period. Now in its third year, Weather4cast 2024 aims to improve rain forecasts world-wide on an expansive data set with over an order of magnitude more hi-res rain radar data, allowing a move towards Foundation Models through multi-modality, multi-scale, multi-task challenges. Accurate rain predictions are becoming ever more critical for everyone, with climate change increasing the frequency of extreme precipitation events. Notably, the new models and insights will have a particular impact for the many regions on Earth where costly weather …
[ Virtual Only ]
Abstract
The NeurIPS 2024 LLM Privacy Challenge is designed to address the critical issue of privacy in the use of Large Language Models (LLMs), which have become fundamental in a wide array of artificial intelligence applications. This competition acknowledges the potential privacy risks posed by the extensive datasets used to train these models, including the inadvertent leakage of sensitive information. To mitigate these risks, the challenge is structured around two main tracks: the Red Team, focusing on identifying and exploiting privacy vulnerabilities, and the Blue Team, dedicated to developing defenses against such vulnerabilities. Participants will have the option to work with LLMs fine-tuned on synthetic private data or LLMs interacting with private system/user prompts, thus offering a versatile approach to tackling privacy concerns. The competition will provide participants with access to a toolkit designed to facilitate the development of privacy-enhancing methods, alongside baselines for comparison. Submissions will be evaluated based on attack accuracy, efficiency, and the effectiveness of defensive strategies, with prizes awarded to the most innovative and impactful contributions. By fostering a collaborative environment for exploring privacy-preserving techniques, the NeurIPS 2024 LLM Privacy Challenge aims to catalyze advancements in the secure and ethical deployment of LLMs, ensuring their continued utility …
[ West Meeting Room 215, 216 ]
Abstract
Small molecule drugs are often discovered using a brute force physical search, wherein scientists test for interactions between candidate drugs and their protein targets in a laboratory setting. As druglike chemical space is large (10^60), more efficient methods to search through this space are desirable. To enable the discovery and application of such methods, we generated the Big Encoded Library for Chemical Assessment (BELKA), roughly 3.6B physical binding measurements between 133M small molecules and 3 protein targets using DNA-encoded chemical library technology. We hope this dataset encourages the community to explore methods to represent small molecule chemistry and predict likely binders using chemical and protein target structure.
[ West Meeting Room 209 ]
Abstract
The Edge-Device Large Language Model Competition seeks to explore the capabilities and potential of large language models (LLMs) deployed directly on edge devices. The incredible capacity of LLMs makes it tantalizing to apply them to practical edge devices, enabling wide applications of LLMs in various disciplines. However, the massive size of LLMs poses significant challenges for edge devices, where computing resources and memory are strictly limited. For instance, deploying a small-scale 10B LLM could require up to 20GB of main memory (DRAM) even after adopting INT8 quantization, which exceeds the memory of most commodity smartphones. In addition, the high energy consumption of LLMs will quickly drain a smartphone's battery. To facilitate applications of LLMs in a wide range of practical scenarios, we propose this timely competition to encourage practitioners in both academia and industry to come up with effective solutions for this pressing need. By challenging participants to develop efficient and optimized models that can run on resource-constrained edge devices, the competition aims to address critical economic and environmental issues related to LLMs, foster interdisciplinary research collaborations, and enhance the privacy and security of AI systems.
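To make the memory constraint concrete, the following back-of-the-envelope sketch computes the storage needed for the weights alone of a 10B-parameter model at a few precisions. It is an illustration under stated assumptions: the helper name `weight_memory_gb` is hypothetical, 1 GB is taken as 10^9 bytes, and activations, KV cache, and runtime overhead are deliberately ignored. The abstract does not break down its own figure, which presumably includes more than the weights.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for model weights only, in GB (1 GB = 1e9 bytes).

    Excludes activations, KV cache, and runtime overhead, so real
    deployments need more than this lower bound.
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


NUM_PARAMS = 10e9  # the "10B" model size mentioned in the abstract

for bits in (16, 8, 4):
    gb = weight_memory_gb(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gb:.1f} GB")
```

Under these assumptions the weights occupy 20 GB at 16 bits, 10 GB at 8 bits, and 5 GB at 4 bits. Even the quantized figures are large next to the DRAM of a typical commodity smartphone, which is the gap the competition targets.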
[ West Meeting Room 210 ]
Abstract
Ensuring safety emerges as a pivotal objective in developing large language models (LLMs) and LLM-powered agents. The Competition for LLM and Agent Safety (CLAS) aims to advance the understanding of the vulnerabilities in LLMs and LLM-powered agents and to encourage methods for improving their safety. The competition features three main tracks linked through the methodology of prompt injection, with tasks designed to amplify societal impact by involving practical adversarial objectives for different domains. In the Jailbreaking Attack track, participants are challenged to elicit harmful outputs in guardrail LLMs via prompt injection. In the Backdoor Trigger Recovery for Models track, participants are given a CodeGen LLM embedded with hundreds of domain-specific backdoors. They are asked to reverse-engineer the trigger for each given target. In the Backdoor Trigger Recovery for Agents track, trigger reverse engineering will be focused on eliciting specific backdoor targets based on malicious agent actions. As the first competition addressing the safety of both LLMs and LLM agents, CLAS 2024 aims to foster collaboration between various communities promoting research and tools for enhancing the safety of LLMs and real-world AI systems.
[ West Meeting Room 208 ]
Abstract
"Erasing the Invisible" is a pioneering competition designed to rigorously stress-test image watermarks, aiming to enhance their robustness significantly. Its standout feature is the introduction of dual tracks for black-box and beige-box attacks, providing a nuanced approach to validate the reliability and robustness of watermarks under varied conditions of visibility and knowledge. The competition spans from July 18 to October 31, inviting individuals and teams to register and participate in a dynamic challenge. Throughout the competition, employing a dataset of 10k images accessed through the Hugging Face API, competitors will receive updated evaluation results on a rolling basis and submit their refined techniques for the final evaluation, which will be conducted on an extensive dataset of 50k images. The evaluation process of this competition not only emphasizes the effectiveness of watermark removal but also highlights the critical importance of maintaining image quality, with results reflected on a continuously updated leaderboard. "Erasing the Invisible" promises to elevate watermarking technology to new heights of resilience, setting a precedent for future research and application in digital content security and safeguarding against unauthorized use and misinformation in the digital age.