Machine learning, especially large language models (LLMs), has shown promise in proving formal theorems using proof assistants such as Coq, Isabelle, and Lean. Theorem proving is an important challenge for machine learning: Formal proofs are computer programs whose correctness can be verified. Therefore, theorem proving is a form of code generation with rigorous evaluation and no room for the model to hallucinate, opening up a new avenue for addressing LLMs’ flaws in factuality. Despite its potential, learning-based theorem proving has significant entry barriers, primarily due to the steep learning curve for proof assistants. This tutorial aims to bridge this gap and make theorem proving accessible to researchers
with a general machine learning background. To that end, our presentation will contextualize theorem proving from a machine learning perspective and demonstrate how to develop LLMs for theorem proving, using newly available open-source tools that provide interfaces to proof assistants without requiring in-depth knowledge of their internals. Furthermore, we will cover advanced topics and open problems in learning-based theorem proving, including its synergies with natural language processing and software verification. Throughout the presentation, we will highlight several conceptual themes recurring in theorem proving that are also critical for machine learning, such …
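To make the "proofs are machine-checkable programs" point concrete, here is a toy Lean 4 theorem (an illustrative example, not material from the tutorial itself): the proof term is checked by the proof assistant, so an incorrect or hallucinated proof is simply rejected.

```lean
-- A toy Lean 4 theorem: the proof term below is a program that the
-- proof assistant verifies, leaving no room for a hallucinated "proof".
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```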
The tutorial aims to familiarize the ML community with major existing AI governance frameworks, ongoing AI policy proposals worldwide, and the concrete tools the research community has developed to adhere to standards and regulations applicable to ML systems in socially high-stakes domains. As a concrete governance challenge, we will focus on issues of bias and unfairness and overview pipeline-centric approaches to operationalizing algorithmic harm prevention. As we will discuss, this approach is particularly relevant to challenges around leveraging the disparate impact doctrine for algorithmic harm prevention and recent FTC advance notices of proposed rulemaking (ANPRMs). The concluding expert panel is an opportunity for the ML community to hear diverse perspectives on the key AI governance challenges in the near future and how the ML research community can prepare for and support efforts to address those challenges.
AI aspires to imitate human intelligence in designing efficient decision-making systems, but are we really training machines the way humans learn and make decisions every day? Studies have shown humans are inherently more comfortable making decisions on a relative scale or choosing alternatives from a set, which often helps us converge to an optimal decision faster. In recent times, as we employ more and more AI tools for executing everyday tasks, it is becoming necessary to align machine behavior with human-like decisions. Another critical challenge in training user-friendly systems lies in the huge amount of human feedback required, which is often costly and hard to obtain. The solution lies in learning to train our machines through human preferences! Our tutorial aims to address the critical need for educating researchers on different types of preference models by exploring real-world problems and showcasing how training systems through preference feedback can provide cutting-edge solutions. We will equip attendees with a comprehensive understanding of diverse preference models and inference techniques. Another goal of the tutorial is to encourage collaboration among various communities that have significant connections to preference-based learning, including bandits, multiagent games, econometrics, social choice theory, RL, optimization, robotics, and more. …
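As one concrete illustration of a pairwise preference model (the standard Bradley-Terry model, given here for orientation rather than as a claim about this tutorial's exact syllabus): each alternative $i$ is assigned a latent utility $\theta_i$, and

$$\Pr(i \succ j) = \frac{\exp(\theta_i)}{\exp(\theta_i) + \exp(\theta_j)}.$$

Estimating the utilities $\theta$ from observed pairwise comparisons is one of the basic inference problems in preference-based learning, and the same model underlies reward modeling from human feedback.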
Large, overparameterized models such as neural networks are now the workhorses of modern machine learning. These models are often trained to near-zero error on noisy datasets and simultaneously generalize well to unseen data, in contrast to the textbook intuition regarding the perils of overfitting. At the same time, near-perfect data fitting can have severe issues in the context of robustness, privacy, and fairness. Classical theoretical frameworks provide little guidance for navigating these questions due to overparameterization. It is thus crucial to develop new intuitions regarding overfitting and generalization that reflect these empirical observations. In this tutorial, we discuss recent work in the learning theory literature that provides theoretical insights into these phenomena.
As more and more AI systems are deployed in the real world, it becomes imperative to study these systems with real humans to avoid unexpected negative
consequences during deployment. Yet, this can be challenging for researchers with more experience designing algorithms and less experience running human participant experiments, or deploying systems in the real world. In this tutorial, we will discuss the state of the human-AI collaboration field, emphasizing (i) incorporating humans into AI systems, including multi-agent, machine learning, and reinforcement learning systems, (ii) investigating when to rely on human vs. AI strengths, and (iii) designing human-AI studies to evaluate algorithms with real humans.
Data-Centric AI has recently been raised as an important paradigm shift in machine learning and AI, placing the previously undervalued "data work" at the center of AI development. This tutorial aims to illuminate the fundamentals of Data-Centric AI and articulate its transformative potential. We will explore the motivation behind the data-centric approach, highlighting its power to improve model performance and engender more trustworthy, fair, and unbiased AI systems, and discussing benchmarking from a data-centric perspective. Our examination extends to standardized documentation frameworks, exposing how they form the backbone of this new paradigm. The tutorial will cover state-of-the-art methodologies that underscore these areas, which we will contextualize around the high-stakes setting of healthcare. A focus of this tutorial is providing participants with an interactive and hands-on experience. To this end, we provide coding/software tools and resources, thereby enabling practical engagement. The panel discussion, with experts spanning diverse industries, will provide a dynamic platform for discourse, enabling a nuanced understanding of the implications and limitations of Data-Centric AI across different contexts. Ultimately, our goal is that participants gain a practical foundation in data-centric AI, such …
Tasks enabled by data contribution estimation (DCE) aid model improvement through data improvement. While benchmark DCE evaluation tasks show application across many ML domains, DCE has limited visibility in other research domains that stand to benefit from its use cases. We propose a tutorial on data contribution for machine learning to address this. This tutorial will provide an overview of DCE for machine learning and natural language processing. Following this tutorial, attendees will have gained an understanding of 1) broadly, what questions data contribution estimation aims to answer; 2) the theory and methods that are widely in use within the DCE community and that can be applied to a broad range of domains; 3) DCE from the perspectives of large language models and privacy.
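To give a flavor of the questions DCE answers, here is a minimal leave-one-out valuation sketch (one of the simplest contribution estimators, offered for illustration; the tutorial itself may cover richer estimators such as Data Shapley): each training point is scored by how much validation utility drops when it is removed.

```python
# Minimal leave-one-out (LOO) data-valuation sketch on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

def utility(idx):
    """Validation accuracy of a model trained on the training subset `idx`."""
    clf = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
    return clf.score(X_val, y_val)

full = np.arange(len(X_tr))
base = utility(full)
# LOO contribution of point i: drop in utility when i is removed.
loo_values = np.array([base - utility(np.delete(full, i)) for i in full])
print("Most valuable training points:", np.argsort(-loo_values)[:5])
```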
Peer review is a cornerstone of academic practice across all of scientific research. Any problems in peer review negatively affect the well-being and career trajectories of researchers (particularly early-career individuals), the allocation of billions in grant funding, scientific advancement, and the public perception of science. Despite its significance, peer review is often labeled as "biased," "corrupt," "broken," and "unscientific." In this tutorial, we will present a scientific perspective on systemic challenges and solutions in peer review. We will unpack several challenges inherent in peer review, drawing from insightful experiments conducted across various fields of research. We will then discuss solutions on two fronts. First, we will provide an in-depth discussion of solutions from the literature based on computational tools developed in machine learning and computer science that are currently deployed at NeurIPS and other venues. Second, we will focus on the design and implications of policies, with an emphasis on making evidence-based policies. Finally, we will present important open problems which the community will find engaging. The tools and skills in our machine learning community are well suited to address these open problems, which, if solved, will have a considerable impact …
Diffusion models have emerged as a powerful class of generative models and demonstrated astonishing results, in particular in image synthesis. However, training high-resolution diffusion models in pixel space can be highly expensive. Overcoming these limitations, Latent Diffusion Models (LDMs) first map high-resolution data into a compressed, typically lower-dimensional latent space using an autoencoder, and then train a diffusion model in that latent space more
efficiently. Thereby, LDMs enable high-quality image synthesis while avoiding excessive compute demands. The flexible LDM paradigm has become very popular in the literature and has also been successfully extended to video synthesis, 3D generation, and more. Most prominently, the state-of-the-art text-to-image model Stable Diffusion, which has attracted worldwide attention, leverages the LDM framework. In this tutorial, we aim to provide an introduction to LDMs. While the literature on diffusion models has become broad, the LDM paradigm stands out as a particularly powerful approach due to its flexibility and excellent trade-off between performance and compute demands. We propose a tutorial on LDMs that will benefit researchers interested in efficient and flexible, yet expressive generative modeling frameworks. Moreover, a panel discussion with well-known experts will provide diverse perspectives on this dynamic field and …
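To make the two-stage recipe concrete, here is a schematic, self-contained PyTorch sketch with toy stand-in modules (illustrative only, not the Stable Diffusion implementation; real LDMs use a KL- or VQ-regularized autoencoder, a U-Net denoiser with conditioning, and a proper noise schedule):

```python
# Toy latent-diffusion training step: encode to a smaller latent, add noise
# there, and train the denoiser in latent space instead of pixel space.
import torch
import torch.nn as nn

class ToyAutoencoder(nn.Module):
    """Stand-in for a pretrained autoencoder that compresses images into latents."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 4, kernel_size=8, stride=8)            # 3x256x256 -> 4x32x32
        self.dec = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)
    def encode(self, x): return self.enc(x)
    def decode(self, z): return self.dec(z)

class ToyDenoiser(nn.Module):
    """Stand-in for the U-Net that predicts the noise added to a latent."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(4, 4, kernel_size=3, padding=1)
    def forward(self, z_t, t): return self.net(z_t)

ae, denoiser = ToyAutoencoder().eval(), ToyDenoiser()
x = torch.randn(2, 3, 256, 256)                      # a batch of "images"
with torch.no_grad():
    z = ae.encode(x)                                 # diffusion operates on 4x32x32 latents
t = torch.rand(z.shape[0], 1, 1, 1)                  # per-example noise level in [0, 1]
noise = torch.randn_like(z)
z_t = (1 - t) * z + t * noise                        # simple linear forward-noising schedule
loss = nn.functional.mse_loss(denoiser(z_t, t), noise)  # denoiser is trained in latent space
loss.backward()
print("toy training loss:", loss.item())
```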
The success of the Transformer model has pushed the limits of deep learning to operate on the scale of trillions of parameters. This proliferation of large model sizes has outpaced the advances in hardware, resulting in an urgent need to distribute enormous models across multiple GPUs. Despite this trend, best practices for choosing an optimal strategy are still lacking due to the breadth of knowledge required across both deep learning and parallel computing. This drives researchers to ask: How can we improve the training and inference efficiency of large models to reduce costs? Can we accommodate larger models with limited resources? What efforts can we make to enable more AI community members to access big models easily? In this tutorial, we investigate efforts to solve the above problems. A diverse set of parallelism strategies is an important tool for improving the efficiency of large-model training and inference. Heterogeneous memory management can enhance the model-accommodation capacity of processors (e.g., GPUs). Further, deep learning systems for large AI models will significantly reduce the specialized background knowledge required from users, allowing AI users to quickly get started with larger models. We believe …
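As a toy illustration of one such parallelism strategy (intra-layer / tensor parallelism, simulated on a single device rather than across real GPUs, and not any specific system's implementation), a linear layer's weight matrix can be split column-wise so that each worker computes a shard of the output:

```python
# Column-wise sharding of a linear layer: each worker computes a partial
# output, and the shards are concatenated (an all-gather in a real system).
import torch

batch, d_in, d_out, n_workers = 8, 512, 1024, 4
x = torch.randn(batch, d_in)
W = torch.randn(d_in, d_out)

shards = torch.chunk(W, n_workers, dim=1)               # each worker holds 1/n of the columns
partial_outputs = [x @ w_shard for w_shard in shards]   # computed independently per worker
y_parallel = torch.cat(partial_outputs, dim=1)

assert torch.allclose(y_parallel, x @ W, atol=1e-5)     # matches the single-device result
```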
Large matrices arise in many ML applications, including as representations of datasets, graphs, model weights, first- and second-order derivatives, etc. Randomized Numerical Linear Algebra (RandNLA) is an area which uses randomness to develop improved algorithms for ubiquitous matrix problems. The area has reached a certain level of maturity, and current efforts of incorporating RandNLA algorithms into core numerical libraries, as well as recent advances in ML, Statistics, and Random Matrix Theory, have led to new theoretical and practical challenges. This tutorial will provide a self-contained overview of RandNLA, in light of these important developments.
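For readers new to the area, a canonical RandNLA primitive is randomized low-rank approximation; a minimal NumPy sketch (for orientation only, not tutorial material) looks like this:

```python
# Randomized low-rank approximation via a Gaussian sketch
# (the Halko-Martinsson-Tropp randomized range finder).
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 2000, 300, 20
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))   # an approximately rank-k matrix
A += 1e-3 * rng.standard_normal((n, d))                         # plus a little noise

Omega = rng.standard_normal((d, k + 10))     # random test matrix, k + oversampling columns
Q, _ = np.linalg.qr(A @ Omega)               # orthonormal basis capturing the range of A
B = Q.T @ A                                  # small (k+10) x d matrix
U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ U_small                              # approximate top singular vectors of A

rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(f"relative approximation error: {rel_err:.2e}")
```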
Large language models (LMs) have achieved remarkable success in many language tasks. Recent works have also shown that knowledge of the world can emerge from large LMs [15], enabling large LMs to assist decision-making for embodied tasks [1, 7, 10, 14, 8]. However, the world knowledge exhibited by current large LMs is often not robust and cannot be grounded in physical environments without additional models. This hinders large LMs’ abilities to perform complex reasoning and planning tasks reliably. For example, in creating action plans to move blocks to a target state, GPT-3 achieves a success rate of only 1%, compared to 78% for humans [16].
On the other hand, humans perform deliberate reasoning and planning based on a mental model of the world (i.e., a world model, WM) that enables us to simulate actions and their effects on the world’s state [2, 5, 3]. WMs encoding knowledge of the physical world can drastically improve the data efficiency and robustness of intelligent agents. However, WMs have typically been studied in reinforcement learning and robotics, which are conceptually distinct from problems studied in language modeling. This gap indicates enormous new opportunities for connecting WMs and LMs, to enhance …
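To illustrate what "simulating actions and their effects" buys an agent (a toy stand-in, not any specific method from the works cited above), here is a minimal random-shooting planner that evaluates candidate action sequences inside a world model rather than in the real environment:

```python
# Toy planning with a world model: score candidate action sequences by rolling
# them out in a (here hand-coded) dynamics model, then execute the best first action.
import numpy as np

def world_model(state, action):
    """Stand-in learned dynamics: move a point on a line; reward is closeness to the goal."""
    next_state = state + action
    reward = -abs(next_state - 10.0)     # goal state is 10
    return next_state, reward

def plan(state, horizon=5, n_candidates=256, rng=np.random.default_rng(0)):
    best_return, best_first_action = -np.inf, 0.0
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=horizon)
        s, total = state, 0.0
        for a in actions:                 # simulate the sequence inside the world model
            s, r = world_model(s, a)
            total += r
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action

print("first planned action from state 0:", plan(0.0))
```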
Data heterogeneity is a key determinant of the performance of ML systems. Standard algorithms that optimize for average-case performance do not consider the presence of
diversity within data. As a result, variations in data sources, data generation mechanisms, and sub-populations lead to unreliable decision-making, poor generalization,
unfairness, and false scientific discoveries. Carefully modeling data heterogeneity is a necessary step in building reliable data-driven systems. Its rigorous study is a nascent field of research spanning several disciplines, including statistics, causal inference, machine learning, economics, and operations research. In this tutorial, we develop a unified view of the disparate intellectual threads developed by different communities. We aim to foster interdisciplinary research by providing a unified view based on a shared language. Drawing upon several separate literatures, we establish a taxonomy of heterogeneity and present quantitative measures and learning algorithms that consider heterogeneous data. To spur empirical progress, we conclude by discussing validation protocols and benchmarking practices.
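As one small, concrete example of a heterogeneity-aware measure (a standard worst-group evaluation on synthetic groups, offered for illustration rather than as this tutorial's specific protocol):

```python
# Report worst-group accuracy alongside the average, so a poorly served
# sub-population is not hidden by the aggregate metric.
import numpy as np

rng = np.random.default_rng(0)
groups = rng.choice(["source_A", "source_B", "source_C"], size=1000, p=[0.7, 0.2, 0.1])
# Pretend per-example correctness from some model; the minority source does worse.
correct = np.where(groups == "source_C", rng.random(1000) < 0.55, rng.random(1000) < 0.92)

print("average accuracy:", correct.mean())
for g in np.unique(groups):
    print(f"accuracy on {g}:", correct[groups == g].mean())
print("worst-group accuracy:", min(correct[groups == g].mean() for g in np.unique(groups)))
```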
The rise of large language models (LLMs) offers a new approach for quickly building AI applications. While LLMs such as ChatGPT, Bard, and Bing Chat are widely understood as consumer tools, the best practices for developers to effectively use these models through API calls remain poorly understood. This tutorial will share with the NeurIPS audience best practices for building AI applications using LLMs. This course will include, but also go significantly beyond, “prompt engineering.” We will share best practices for integrating LLMs into more complex software systems, evaluating and continually improving their performance, and enhancing their safety. We will discuss best practices for using LLMs in common operations such as summarizing, making inferences, transforming text, and expanding text, as well as in-context learning, fine-tuning, and the utilization of both open-source and proprietary cloud-hosted LLMs. LLMs are transforming the development process of AI applications. For example, a sentiment classifier that used to take weeks to build, via a process of collecting and labeling training examples, tuning a supervised model, and then finally deploying the model to make inferences, can now be built in hours by prompting an LLM API. Through this tutorial, we hope to connect research and practice, and also …
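For instance, a minimal version of that prompted sentiment classifier might look like the following sketch (assuming the openai Python package with an OPENAI_API_KEY in the environment; the model name is chosen for illustration and the exact client interface varies across library versions):

```python
# Sentiment classification by prompting an LLM API instead of training a supervised model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_sentiment(text: str) -> str:
    prompt = (
        "Classify the sentiment of the following product review as exactly one word: "
        "'positive', 'negative', or 'neutral'.\n\nReview: " + text
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

print(classify_sentiment("The lamp arrived quickly and looks great on my desk."))
```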