SPONSOR EXPO | Nov 28th


Welcome to the NeurIPS Sponsor Expo!

Expo Schedule

November 28th

TALKS & PANELS  

End-to-end cloud-based Document Intelligence Architecture using the open-source Feathr Feature Store, the SynapseML Spark library, and Hugging Face Extractive Question Answering

Mon 28 Nov 7:30 a.m. - 8:30 a.m. PST @
Expo Talk Panel
Theater B

Cloud-based data platforms, applied AI services, and open-source NLP models can be used together to extract information from business documents such as leases in file formats such as PDF and TIF. We highlight two Microsoft open-source projects, the Azure Feature Store (Feathr) and the SynapseML Spark library, and we employ a state-of-the-art Natural Language Processing model to perform Extractive Question Answering, allowing us to extract information simply by supplying a question such as “Who is the Landlord?” Via this illustrative example, attendees will gain an understanding of a powerful and broadly applicable set of tools for high-scale processing of unstructured data and online serving of ML-powered data products.
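The question-driven extraction described above ultimately reduces to a span-selection step: a QA model scores every token as a possible answer start and answer end, and the best-scoring span is returned. A minimal, self-contained sketch of that selection step follows, with made-up toy scores rather than the talk's actual Hugging Face stack:

```python
# Minimal sketch of extractive QA span selection: given per-token start and
# end scores from a QA model, pick the span (s, e) with s <= e that
# maximizes start[s] + end[e]. All scores below are invented for illustration.

def best_span(start_scores, end_scores, max_len=15):
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

tokens = ["The", "landlord", "is", "Acme", "Properties", "LLC", "."]
start = [0.1, 0.0, 0.2, 3.1, 0.4, 0.3, 0.0]   # toy start logits
end   = [0.0, 0.1, 0.0, 0.2, 0.9, 2.8, 0.1]   # toy end logits
s, e = best_span(start, end)
print(" ".join(tokens[s:e + 1]))  # → Acme Properties LLC
```

In a real system the start and end scores would come from a fine-tuned transformer, but the span-selection logic is essentially this loop.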

Join Virtual Talk & Panel Visit Microsoft Booth

TIPDAT: ML based Optimization Explainer

Mon 28 Nov 7:30 a.m. - 8:30 a.m. PST @
Expo Talk Panel
Theater A

Amazon’s retail inventory is the outcome of numerous supply-chain systems that optimize stochastic input variables to fulfill customer demand and maximize profit, subject to a variety of internal (labor, storage) and external constraints. Historically, we have relied on week-long, subjective manual deep-dives over disconnected metrics to inspect supply-chain health, identify defects, and improve supply-chain outcomes. These methods have proved not to be scalable and are often disconnected from actual supply-chain behavior. We therefore developed an automated solution that connects all of these supply-chain systems and inputs to inventory. While the opportunity is clear, building an inventory attribution system that works effectively at the scale of Amazon’s supply chain is non-trivial. To solve this, we developed a two-stage algorithm, which we discuss here. Stage 1 trains a large-scale ML model over a billion observations to approximate the complex stochastic-programming algorithm that Amazon uses to make buying decisions. We also developed a novel attribution algorithm that leverages the concept of Shapley values from game theory. Its attributions not only satisfy the efficiency, symmetry, linearity, and null-player properties; they also attribute jointly to variables when variables are highly dependent and independent attribution is not desirable. This algorithm is used in production within Amazon to attribute the impact of input changes in the buying system on retail inventory changes, and it drives inventory-related actions in everyday operations.
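The Shapley-value idea behind the attribution algorithm can be illustrated on a toy example. The following is a generic brute-force Shapley computation over an invented value function, not Amazon's production algorithm; note how the resulting attributions satisfy the efficiency property mentioned above:

```python
# Toy Shapley-value attribution: average each player's marginal contribution
# to the coalition value v(S) over all orderings of the players.
from itertools import permutations

def shapley(players, v):
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            phi[p] += v(frozenset(coalition)) - before
    return {p: phi[p] / len(orders) for p in players}

# Invented value function over three illustrative inputs; "demand" and
# "labor" interact, so their joint presence is worth extra.
def v(S):
    base = {"demand": 3.0, "labor": 1.0, "storage": 0.5}
    val = sum(base[p] for p in S)
    if {"demand", "labor"} <= S:
        val += 2.0
    return val

phi = shapley(["demand", "labor", "storage"], v)
# Efficiency: attributions sum exactly to the grand-coalition value.
assert abs(sum(phi.values()) - v(frozenset(["demand", "labor", "storage"]))) < 1e-9
print(phi)
```

The brute-force enumeration above is exponential in the number of players; a production system at Amazon scale would need approximations, which is part of what makes the attribution problem non-trivial.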

Join Virtual Talk & Panel Visit Amazon Science Booth

Machine Learning and Optimization for Automated Trading at HRT

Mon 28 Nov 8:30 a.m. - 9:30 a.m. PST @
Expo Talk Panel
Theater B

Hudson River Trading (HRT) is a quantitative automated trading company that trades hundreds of millions of shares each day, broken up into over a million trades and spread across thousands of symbols. It trades on over 200 markets worldwide and accounts for around 10% of US equities volume. To provide price discovery and market-making services for public markets, HRT employs state-of-the-art techniques from machine learning and optimization to understand and react to market data.

In this talk we will provide an overview of the unique challenges in this domain and the breadth of techniques employed at HRT. A fundamental challenge is the massive, heterogeneous, unevenly spaced, noisy, and bursty nature of financial datasets. Researchers at HRT use tools like multi-task learning, sequence modeling, and large language models to build some of the most predictive models in the world for these datasets. Given strong predictions about the future prices of financial products, HRT employs a variety of optimization techniques, spanning from Bayesian optimization to quasi-Newton methods to portfolio optimization, to make trading decisions. Come to learn more about opportunities to make an impact in this fast-paced and competitive industry.

Join Virtual Talk & Panel Visit Hudson River Trading Booth

Challenges & Opportunities for Ethical AI in Practice

Mon 28 Nov 8:30 a.m. - 9:30 a.m. PST @
Expo Talk Panel
Theater A

In recent years, there has been a growing awareness of the need to consider broader societal impacts when developing and deploying AI models. Research areas like algorithmic fairness, explainability, safety, robustness, and trustworthiness have contributed significantly to our understanding of possible approaches for developing more responsible and ethical AI. Despite these research advances, however, there remain significant challenges to operationalizing such approaches in practice. This talk will discuss technical, legal, and operational challenges that practitioners face when attempting to address issues of bias and lack of transparency in their models. These include tensions between multiple ethical desiderata like fairness and privacy, difficulties of large-scale ethical data collection, and challenges of balancing scalability and bespoke evaluation when designing compliance systems. This talk will also share some of Sony’s approaches for addressing these challenges.

Join Virtual Talk & Panel Visit Sony Booth

Using AI for Reduced-Order Modeling

Mon 28 Nov 9:30 a.m. - 10:30 a.m. PST @
Expo Talk Panel
Theater A

Engineers often start modeling components of their system using first principles. The real value of a first-principles model is that results typically have a clear, explainable physical meaning. In addition, behaviors can often be parameterized. However, large-scale, high-fidelity nonlinear models can take hours or even days to simulate, and system analysis and design may require thousands or hundreds of thousands of simulations to obtain meaningful results, posing a significant computational challenge for many engineering teams. Moreover, linearizing complex models can result in high-order models that do not capture the dynamics of interest in your application. In such situations, you can use reduced-order models to significantly speed up simulation and analysis of higher-order, large-scale systems.

In this talk, you will learn how to speed up the simulation of a complex model (a vehicle engine) by replacing a high-fidelity model with an AI-based reduced-order model (ROM). These models may be trained in the AI framework of your choice, including PyTorch, TensorFlow, or MATLAB. By performing a thorough Design of Experiments (DoE), you can obtain input-output data from the original high-fidelity first-principles model and construct an AI-based ROM that accurately represents the underlying system. You will see how different approaches may be explored, such as stacked LSTMs, Neural ODEs, or nonlinear ARX models. If your goal is to test the design and performance of the other components in your system, you may want to run the components you are designing on the target hardware and run the AI model on a real-time computer instead of the original high-fidelity model. Once developed, the AI model is modular and reusable: your colleagues, whether local or in other locations, can also use it in their simulations and component tests, potentially accelerating parallel design and development of the overall system.
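The ROM workflow described above — sample a slow high-fidelity model on a design of experiments, then fit a fast surrogate — can be sketched with stand-in pieces: here a cheap analytic function plays the role of the hours-long simulation, and a simple polynomial fit plays the role of the AI-based ROM:

```python
# ROM sketch: sample a "high-fidelity" model on a DoE grid, fit a cheap
# surrogate, and check its error on held-out inputs. The nonlinear function
# below is a stand-in for an expensive first-principles simulation.
import numpy as np

def high_fidelity(x):
    return np.sin(2 * x) + 0.5 * x**2

rng = np.random.default_rng(0)
x_train = np.linspace(-2, 2, 40)              # design of experiments
y_train = high_fidelity(x_train)

coeffs = np.polyfit(x_train, y_train, deg=7)  # cheap polynomial "ROM"
rom = np.poly1d(coeffs)

x_test = rng.uniform(-2, 2, 200)              # held-out evaluation points
err = np.max(np.abs(rom(x_test) - high_fidelity(x_test)))
print(f"max ROM error on held-out inputs: {err:.4f}")
```

A real engine ROM would replace the polynomial with an LSTM, Neural ODE, or nonlinear ARX model trained on simulation data, but the sample-fit-validate loop is the same.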

Join Virtual Talk & Panel Visit MathWorks Booth

Uncertainty quantification for fair and transparent AI-assisted decision-making

Mon 28 Nov 9:30 a.m. - 10:30 a.m. PST @
Expo Talk Panel
Theater B

When decision-makers know that an AI model is unsure about its predictions, they can reject its forecasts and steer clear of expensive blunders. In general, one expects a prediction model’s performance to improve at the expense of reduced coverage when a reject option is made available (i.e., by predicting on fewer samples). However, such an improvement might not extend to all subpopulations of the data and might even have negative effects on some of them. In this talk, we will cover techniques to make selective classification [1] and regression [2] effective for everyone, as well as current developments in trustworthy uncertainty quantification. The use of generative models to inform decision-makers about the areas of high and low confidence in AI will also be covered in this session. We will examine a few sample cases in depth using an open-source toolkit called UQ360 (https://github.com/IBM/UQ360) to demonstrate how uncertainty relates to other principles of trustworthy AI.
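The reject-option trade-off described above can be illustrated with a minimal sketch (synthetic confidences and labels, not UQ360's API): abstaining below a confidence threshold reduces coverage but can raise accuracy on the samples the model does answer.

```python
# Selective classification sketch: the model abstains when its confidence
# falls below a threshold, trading coverage for accuracy on covered samples.

def selective_accuracy(confidences, correct, threshold):
    kept = [c for conf, c in zip(confidences, correct) if conf >= threshold]
    coverage = len(kept) / len(correct)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return coverage, accuracy

# Synthetic predictions: high-confidence ones tend to be right (1 = correct).
confidences = [0.95, 0.92, 0.90, 0.85, 0.70, 0.65, 0.60, 0.55]
correct     = [1,    1,    1,    1,    1,    0,    1,    0   ]

for t in (0.0, 0.8):
    cov, acc = selective_accuracy(confidences, correct, t)
    print(f"threshold={t:.1f}  coverage={cov:.2f}  accuracy={acc:.2f}")
```

As the talk cautions, this aggregate improvement need not hold for every subpopulation: a threshold tuned on the whole dataset can quietly abstain more (or gain less accuracy) on some groups than others.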

Join Virtual Talk & Panel Visit IBM Booth

Understanding the Landscape of the latest Large Models - Virtual

Mon 28 Nov 11 a.m. - noon PST
Expo Talk Panel

There seems to be a new large ML model grabbing headlines every week. Whether it's OpenAI's big releases like GPT-3, DALL-E 2, or Whisper, or one of the many open-source projects generating state-of-the-art models, like Stable Diffusion, OpenFold, or Craiyon, these models have found their way into the mainstream. We will map the landscape for you and share how these teams use W&B to accelerate their work.


Join Virtual Talk & Panel Visit Weights and Biases, Inc. Booth

Sparse annotation strategies at scale

Mon 28 Nov noon - 1 p.m. PST @
Expo Talk Panel
Theater A

Despite an ample supply of data annotation service providers, their services cannot readily be applied in many industries and domains. Reasons are manifold, including privacy concerns, a requirement for deep domain expertise, or the literal absence of data samples for annotation. To address these challenges, a sprawling research field has developed a plethora of sparse annotation strategies, including active learning, unsupervised and semi-supervised learning, and approaches that rely on synthetic data. In this talk, we will discuss two projects that relied on these approaches. First, we will discuss training an adaptive 3D object detection model on synthetic data for detecting rarely confiscated prohibited items in 3D CT X-ray images of passenger luggage. Second, we will discuss an application of active transfer learning to training a face recognition model for identifying individual chimpanzees in wildlife footage. In the context of these projects, we will provide a brief overview of related research and other applications of these approaches. We will also highlight the specific technical and procedural challenges we faced and offer actionable insights and best practices.

Join Virtual Talk & Panel Visit Microsoft Booth

Towards learning agents for solving complex real-world tasks

Mon 28 Nov noon - 1 p.m. PST @
Expo Talk Panel
Theater B

In recent years, there has been tremendous progress in deep learning across many fields of AI, such as visual perception, speech recognition, language understanding, and robotics. However, many of these methods require large amounts of supervision and do not generalize well to unseen and complex real-world tasks. By overcoming these challenges, we aim to develop a more general-purpose artificial intelligence agent that can perform many useful tasks for humans with high sample efficiency and strong generalization to previously unseen tasks. We will present ongoing research at LG AI Research on tackling some of these challenges. First, we present methods for teaching AI agents to perform complex tasks by sequentially combining simpler low-level subtasks. Specifically, our method can learn dependencies between subtasks through experience, which allows the agent to efficiently plan and execute complex tasks in unseen contexts and scenarios. We further advance this framework by investigating how to extend existing knowledge to new objects and entities, how to efficiently adapt from previous related tasks to new tasks, and how to incorporate large language models and multimodal learning to improve generalization and learning efficiency. These methods are demonstrated on learning agents in 3D simulated environments, game playing, and web navigation. Second, we will describe our EXAONE project (EXpert Ai for everyONE), which aims to integrate large-scale language models and multimodal generative models to develop expert-level AI for various vertical applications with high learning efficiency. More concretely, we will present ongoing projects powered by EXAONE for developing real-world applications, such as an AI customer center, creative collaboration on the arts, and deep document understanding.

Join Virtual Talk & Panel Visit LG AI Research Booth

Generative Understanding of 3D Scenes

Mon 28 Nov 1 p.m. - 2 p.m. PST @
Expo Talk Panel
Theater A

Generative modeling is at the core of recent advances in machine learning (e.g., GPT-3, Stable Diffusion, AlphaFold). From language to images, progress in generative modeling has grown exponentially in recent years. In this talk, we will present Apple's advances toward learning generative models of the 3D world. Specifically, we will provide deep technical insights into two recent Apple papers on 3D generative modeling of scenes, as well as an overview of how structured representations of the 3D world make their way into Apple products.

Join Virtual Talk & Panel Visit Apple Booth

Adapt and Optimize ML Models for Hardware-Aware AI

Mon 28 Nov 1 p.m. - 2 p.m. PST @
Expo Talk Panel
Theater B

AI is becoming ubiquitous, powering everything from self-driving cars to nearly every application on mobile devices. However, many ML models are not well-suited for edge devices because they were trained in the cloud and do not fit those devices well. Such a mismatch greatly hinders the potential of AI and has been a universal barrier for companies deploying powerful ML models to resource-constrained devices at the edge.

In this talk, OmniML will discuss the challenges of adapting, optimizing, and deploying machine learning models on resource-constrained devices at the edge. We will go over industry use cases that OmniML has encountered working with customers in electric vehicle manufacturing, advanced driver assistance systems (ADAS), robotics, IoT smart cameras, and other verticals. We will explore some of the problems customers encounter when trying to fit advanced computer vision onto these edge devices and how OmniML uses its Omnimizer MLOps platform to help customers solve their current pain points of adapting ML models to fit their business needs.

Join Virtual Talk & Panel Visit OmniML Inc. Booth

Human-in-the-Loop Is Here to Stay

Mon 28 Nov 2 p.m. - 3 p.m. PST @
Expo Talk Panel
Theater B

Modern ML systems rely on pre-trained and fine-tuned models that achieve state-of-the-art results without the use of specialized training datasets.

As these datasets are expensive and often difficult to obtain, building on such general models enables quicker prototyping and reasonable prediction quality.

However, when such a model is deployed in production, it starts affecting real users, making it prone to data drift and lack of specificity.

In this talk, Fedor Zhdanov, Head of ML Projects at Toloka, a global tech company that supports data-related processes across the entire ML lifecycle, will discuss how HITL techniques can address these two deficiencies of ML.

Fedor will start his talk by introducing adaptive ML models, a new HITL product that hosts a model as an endpoint with crowdsourced curation. This makes it possible to catch data drift and retrain the model in a way that is automatically tailored to each customer's needs by gathering real human feedback from Toloka’s global crowd.

Then, he will present recent academic results from the Toloka team focused on subjective and noisy labeling. First, Fedor will overview the problem of crowdsourced audio transcription and share lessons learned from the audio-transcription shared task at VLDB 2021 and the CrowdSpeech benchmark for noisy sequence aggregation. Second, he will present the problem of learning from subjective data using the example of the IMDB-WIKI-SbS benchmark featured at the Data-Centric AI workshop at NeurIPS 2021. Finally, he will showcase data clustering with crowdsourcing, reinforcement learning without reward engineering using crowdsourcing, and human evaluation of Stable Diffusion text-to-image models.

Join Virtual Talk & Panel Visit Toloka AI Booth

Integrating modern machine learning and single-cell technologies into drug target discovery: lessons from the frontline

Mon 28 Nov 2 p.m. - 3 p.m. PST @
Expo Talk Panel
Theater A

Much has been said about the various applications of machine learning to drug discovery and their potential benefits. Yet despite ten years of hype, the true value remains elusive. In this talk, Lindsay Edwards (former VP of ML at both GSK and AZ, and now CTO at Relation Therapeutics) discusses what may have gone wrong and what the path forward may look like. This will include specific examples of applications of machine learning to both biology and chemistry, and some recent results from work at Relation.

Join Virtual Talk & Panel Visit Relation Therapeutics Booth

Workshops  

PyTorch: New advances for large-scale training and performance optimizations

Mon 28 Nov 7:30 a.m. - 10:30 a.m. PST @ Room 291
Expo Workshop
Room 291

[New Zoom Link](https://fb.zoom.us/j/94549557101?pwd=NXBKcmluVUM1SU5LZ0N2S2xuL2c2dz09)


Large language models and Generative AI have been key drivers for new innovations in large-scale training and performance optimizations. In this workshop, we will dive deeper into new features and solutions in PyTorch that enable training and performance optimizations @ scale.

The following topics will be covered by the PyTorch team in this workshop. The sessions are split across two days: the Nov 28 session covers the PyTorch Distributed and Profiling topics, and the Dec 7 session covers the PyTorch Compiler based solutions.

## Part 1: Nov 28 (Hybrid, in-person and remote), 9:30a-12:30p CST (UTC-6), Room # 291
-------------------------------------------------------------------------------------------------------

1. FSDP Production Readiness, Speakers: Rohan Varma, Andrew Gu
We will dive deep into recent advances in FSDP which have enabled better throughput, memory savings, and extensibility. These improvements have unblocked using FSDP for models of different modalities and varying sizes (model and data). We will share best practices for applying these features to specific use cases such as XLMR, FLAVA, ViT, DHEN, and GPT-3-style models.

2. Automated Pipeline Parallelism for PyTorch, Speaker: Ke Wen
PiPPy is a library that provides automated pipeline parallelism for PyTorch models. PiPPy consists of a compiler stack capable of automatically splitting a model into stages without requiring intrusive code changes to the model. It also provides a distributed runtime that helps users distribute the split stages to multiple devices and multiple hosts and orchestrates micro-batch execution in an overlapped fashion. We are going to demonstrate the use of PiPPy for Hugging Face models in the cloud.

3. PyTorch Profiler, Speaker: Taylor Robie
Dive into recent enhancements to the PyTorch profiler capabilities, Python function tracing, data flow capture, and memory profiling, and how they enable previously impossible performance analysis.

4. Profiling Distributed Training Workloads, Speaker: Anupam Bhatnagar
We will present Holistic Trace Analysis (HTA), a tool to identify computation, communication and memory bottlenecks in distributed training. HTA identifies these bottlenecks by analyzing the traces collected using the PyTorch Profiler.

5. TorchBench, Speaker: Xu Zhao
In this talk we present PyTorch Benchmark(TorchBench), a benchmarking suite to provide quick and stable performance signals to hold the line of performance in PyTorch development. TorchBench identifies performance regressions and provides CI services for PyTorch developers to test their PRs. It can also be used to profile specific models and identify optimization opportunities.


## Part 2: Dec 7 (Virtual), 9:30a - 11:30a PST (UTC-8) / 11:30a - 1:30p CST (UTC-6)
------------------------------------------------------------------------------------------------

Focus on the new PyTorch Compiler features (https://pytorch.org/get-started/pytorch-2.0/)

6. A deep dive into TorchDynamo, Speaker: Animesh Jain
This talk presents a deep dive into TorchDynamo. TorchDynamo is a Python-level JIT compiler designed to make unmodified PyTorch programs faster. It rewrites Python bytecode in order to extract sequences of PyTorch operations into a graph which is then just-in-time compiled with a customizable backend. It is designed to mix Python execution with compiled backends to get the best of both worlds: usability and performance.

7. A deep dive into TorchInductor, Speakers: Bin Bao, Natalia Gimelshein
This talk presents a deep dive into the design principles of TorchInductor, the PyTorch compiler backend; the lowering stack it uses to transform PyTorch programs; and the optimization techniques and code-generation technologies it employs.

8. How backends integrate with the PyTorch compiler stack, Speaker: Sherlock Huang
This talk dives deep into the backend integration points in the PyTorch compiler stack. It will explain the three types of IR used across the stack: Torch IR produced by Dynamo, ATen IR produced by AOTAutograd, and the loop-level IR used in Inductor. It will introduce the infrastructure and utilities available for backend integration, including an IR-agnostic pattern matcher and a graph partitioner.

Join Virtual Workshop Visit Meta Platforms, Inc. Booth

Graph Neural Networks in Tensorflow: A Practical Guide

Mon 28 Nov 7:30 a.m. - 10:25 a.m. PST @ Room 292
Expo Workshop
Room 292

The tutorial website is here!



Google Meet Link: https://meet.google.com/yko-cpuk-czg



Slides: https://drive.google.com/file/d/1ECcnRgJqjmj7hlegYuPdscLCUw7YJc7G/view?usp=sharing




Motivation

Graphs are general data structures that can represent information from a variety of domains (social, biomedical, online transactions, and many more). Graph Neural Networks (GNNs) are an exciting way to use graph-structured data inside neural network models, and they have recently exploded in popularity. However, implementing GNNs and running them on large (and complex) datasets still raises a number of challenges for machine learning platforms.

Goals

The main goal of this tutorial is to help practitioners and researchers implement GNNs in a TensorFlow setting. The tutorial will be mostly hands-on: it will walk the audience through running existing GNNs on heterogeneous graph data and offer a tour of how to implement new GNN models. The hands-on portion of the tutorial will be based on TF-GNN, a new framework that we open-sourced.

Learning objectives
1. Conceptual understanding of Graph Neural Networks (GNNs).
2. Hands-on: How to train and evaluate GNNs in TensorFlow, using TF-GNN.
3. Understanding of message passing building blocks for crafting advanced GNN architectures.
4. Hands-on: How to implement custom models inside TF-GNN.
5. Know how to run TF-GNN models at scale, using cloud environments.
6. Hands-on: Run TF-GNN at scale on large graphs.
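The message-passing building block named in the objectives can be sketched in a few lines of plain NumPy. This is a toy graph with fixed weights, not TF-GNN's API (which wraps the same pattern with trainable, typed graph components):

```python
# One round of mean-aggregation message passing on a toy 4-node cycle graph.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # undirected edges
h = np.eye(4)                              # initial one-hot node features
W = np.full((4, 4), 0.25)                  # stand-in (non-trainable) weights

# Aggregate: each node averages its neighbors' features.
agg = np.zeros_like(h)
deg = np.zeros(4)
for s, d in edges:
    agg[d] += h[s]; deg[d] += 1
    agg[s] += h[d]; deg[s] += 1
agg /= deg[:, None]

# Update: combine self and aggregated features, apply a ReLU nonlinearity.
h_next = np.maximum(0, (h + agg) @ W)
print(h_next.shape)
```

Stacking several such rounds lets information propagate across multi-hop neighborhoods, which is the core idea behind the advanced architectures covered in the tutorial.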

Structure

This tutorial consists of 3 lectures, paired with 3 Python notebooks, which cover different aspects of working with TF-GNN.

- 9:30 AM. Basics of TF-GNN
- 10:30 AM. Modeling with TF-GNN
- 11:30 AM. Scaling GNNs w/ TF-GNN

Material

All presentation slides

The notebooks (and recordings eventually) can be found on the tutorial website

Additional Resources

If you're interested in learning more about GNNs or TF-GNN, we recommend the following resources:

- Our paper, TF-GNN: Graph Neural Networks in TensorFlow, details the API design and background of the library.
- The in-depth notebook OGBN-MAG end-to-end with TF-GNN offers a deep dive on building heterogeneous graph models using TF-GNN.

Join Virtual Workshop Visit Google LLC Booth

Fine-tuning stable diffusion models: massive creativity without massive bills

Mon 28 Nov 7:30 a.m. - 10:30 a.m. PST @ Room 293
Expo Workshop
Room 293

You'll hear our team's story of building and launching text-to-pokemon, a fine-tuned Stable Diffusion model that has generated millions of Pokémon-like images. And to top it off, it only cost $10 to train.
Join the hands-on active training and we'll walk you through, step by step, how to fine-tune Stable Diffusion to generate unique, target-specific results like text-to-pokemon and DreamBooth. To get the most out of this training, we suggest you come with a basic understanding of, and familiarity with, training or fine-tuning models using PyTorch or TensorFlow. We'll then show you how to get cost-efficient cloud GPU resources and walk you through the steps of fine-tuning the models.
Active Training Contributors:
Justin Pinkney [Contributing] - Senior Machine Learning Researcher - Lambda - Creator of the text-to-pokemon fine-tuned Stable Diffusion model.
Chuan Li [Attending] - Chief Scientific Officer - Lambda - Creator of CNNMRF and co-author of HoloGAN.
Stephen Balaban [Attending] - CEO - Lambda - Co-founder of Lambda.

Join Virtual Workshop Visit Lambda, Inc. Booth

Get ready, your Jupyter Notebook goes to production!

Mon 28 Nov 7:30 a.m. - 10:30 a.m. PST @ Room 290
Expo Workshop
Room 290

Why do researchers need to deploy their machine learning (ML) models? What is the difference between the models prototyped by researchers and the ones running in production? According to rough estimates, around 87% of data science projects never make it into production (1). One reason is the difference between the skill set needed for designing a new model and the skill set for deploying that model in production for end users. The former includes the ability to come up with new ideas and prototype them quickly, while the latter focuses on stability, scalability, and, importantly, integration with existing processes. Thus, training and deploying machine learning models becomes a major challenge for many companies, big or small. In this workshop, we will focus on key challenges that most researchers have to overcome on the way to production. We will start with a panel discussion about different perspectives on how research findings should be used in production. Then, working together in groups, we will discuss the common steps of an ML pipeline: collecting the data, exploratory data analysis, feature engineering, model selection, model deployment, and model serving. We intend to form these groups in a mixed way, connecting academic researchers with industrial software engineers, so that participants can share their experience optimizing each step of the ML pipeline and exchange best practices. We will conclude with a discussion on utilizing research findings to make better products and applications.
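As a toy version of the pipeline steps discussed above, here is a hypothetical scikit-learn sketch (synthetic data, stand-in model choice) that keeps feature engineering and the model together in one deployable unit:

```python
# Minimal prototype-to-production sketch: bundle preprocessing and the model
# into a single Pipeline, fit it, and evaluate on held-out data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# "Collecting the data" stand-in: a synthetic classification dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),       # feature engineering step
    ("model", LogisticRegression()),   # model selection outcome
])
pipe.fit(X_train, y_train)
print(f"held-out accuracy: {pipe.score(X_test, y_test):.2f}")
```

Packaging the steps in a single Pipeline object is one small way to narrow the prototype-to-production gap: the artifact that was validated is the same one that gets serialized and served.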

KEY TAKE-AWAYS
- What components are needed to bring research models into production
- How to set up reusable and easy-to-upgrade pipelines
- How to optimize each step of the ML pipeline

Join Virtual Workshop Visit Toloka AI Booth

Impactful graph neural networks via DGL: A Tale of Research and Productization

Mon 28 Nov noon - 3 p.m. PST @ Room 291
Expo Workshop
Room 291

Graph neural networks (GNNs) learn from complex graph data and have been remarkably successful in various applications across industries. Furthering the impact of GNNs entails solving research challenges in modeling and scalability as well as productionization. Impactful GNN research requires constant innovation to handle rich, time-evolving, and heterogeneous graph data, as well as trillion-edge-scale graphs. We develop GNN models and distributed training techniques to handle such challenges and integrate them into the Deep Graph Library (DGL), a scalable and widely adopted library for developing GNN models. Building GNN products requires domain expertise and significant effort. At AWS, we aim to lower the bar for productionizing graph machine learning (GML). Neptune ML facilitates this goal and helps customers obtain real-time GNN predictions with graph databases using graph query languages. At Amazon and AWS, we develop frameworks based on DGL to solve internal and external GML problems and realize the impact of GNNs.

Join Virtual Workshop Visit Amazon Science Booth

AutoGluon: Empowering (MultiModal) AutoML for the next 10 Million users

Mon 28 Nov noon - 3 p.m. PST @ Room 293
Expo Workshop
Room 293

Automated machine learning (AutoML) offers the promise of translating raw data into accurate predictions without significant human effort, expertise, or manual experimentation. In this workshop, we introduce AutoGluon, a state-of-the-art and easy-to-use toolkit that empowers multimodal AutoML. Unlike most AutoML systems, which focus on tabular tasks containing categorical and numerical features, we consider supervised learning tasks on various types of data, including tabular features, text, images, and time series, as well as their combinations. We will introduce the real-world problems that AutoGluon can help you solve within three lines of code and the fundamental techniques adopted in the toolkit. Rather than diving deep into the mechanisms underlying each individual ML model, we emphasize how you can take advantage of a diverse collection of models to build an automated ML pipeline. Our workshop will also cover the techniques behind automatically building and training deep learning models, which are powerful yet cumbersome to manage manually.

Check the workshop website: https://autogluon.github.io/neurips2022-autogluon-workshop/

Join Virtual Workshop Visit Amazon Science Booth

Intro to TensorFlow and JAX

Mon 28 Nov noon - 3 p.m. PST @ Room 290
Expo Workshop
Room 290

This workshop is a good fit for you if you’re a student or researcher relatively new to TensorFlow or JAX (or you're curious about what's new!), and you’d like to learn more about Google’s open-source tools.

The workshop will be divided into two sections. Each section will include a presentation about the library followed by examples you can try to illustrate key points.

We'll start with TensorFlow, and cover new features in TensorFlow 2.10 and 2.11. We'll discuss plans for future iterations of the library, and then we'll explore your options as a researcher. You'll learn about TensorFlow's core APIs from the ground up, and then explore Keras with a progressive disclosure of complexity, from simple sequential APIs, to model subclassing and custom training loops for full control.

Next, you'll learn how to get started with JAX. JAX is a high-performance library for machine learning research, with a familiar NumPy API. In this section of the workshop, you’ll learn about JAX as accelerated NumPy, and explore features like Just in Time Compilation, Automatic Vectorization, and Parallelism.
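The JAX features named above — accelerated NumPy, just-in-time compilation, and automatic vectorization — fit in a short sketch (toy function and illustrative values):

```python
# JAX as accelerated NumPy: the same Python function can be differentiated,
# JIT-compiled, and automatically vectorized over a batch.
import jax
import jax.numpy as jnp

def predict(w, x):
    # Toy model: squared dot product of weights and inputs.
    return jnp.dot(w, x) ** 2

grad_fn = jax.jit(jax.grad(predict))            # compile the gradient wrt w
batched = jax.vmap(predict, in_axes=(None, 0))  # vectorize over a batch of x

w = jnp.array([1.0, 2.0])
xs = jnp.array([[1.0, 0.0], [0.0, 1.0]])
print(batched(w, xs))       # per-example outputs
print(grad_fn(w, xs[0]))    # gradient of predict wrt w at x = [1, 0]
```

Because `jax.grad`, `jax.jit`, and `jax.vmap` are composable function transformations, the same pattern scales from this toy to full training loops.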

The presenters will be available to chat with you 1:1 after to learn more about your work and how Google’s tools can help.

Please bring a laptop. There is nothing to install in advance. Thank you!

Join Virtual Workshop Visit Google LLC Booth

PyTorch: New advances for large-scale training and performance optimizations

Wed 7 Dec 9:30 a.m. - 11:30 a.m. PST @ Virtual
Expo Workshop

New Zoom Link

Large language models and Generative AI have been key drivers for new innovations in large-scale training and performance optimizations. In this workshop, we will dive deeper into new features and solutions in PyTorch that enable training and performance optimizations @ scale.

The PyTorch team will cover the following topics in this workshop. The sessions are divided over two days: the Nov 28th session will cover the PyTorch Distributed and profiling topics, and the Dec 5th session will cover the PyTorch compiler-based solutions.

## Part 1: Nov 28 (Hybrid, in-person and remote), 9:30a-12:30p CST (UTC-6), Room # 291
-------------------------------------------------------------------------------------------------------

1. FSDP Production Readiness, Speakers: Rohan Varma, Andrew Gu
We will dive deep into recent advances in FSDP which have enabled better throughput, memory savings, and extensibility. These improvements have unblocked using FSDP for models of different modalities and varying sizes (model and data). We will share best practices for applying these features to specific use cases such as XLM-R, FLAVA, ViT, DHEN, and GPT-3-style models.

2. Automated Pipeline Parallelism for PyTorch, Speaker: Ke Wen
PiPPy is a library that provides automated pipeline parallelism for PyTorch models. PiPPy consists of a compiler stack capable of automatically splitting a model into stages without requiring intrusive code changes to the model. It also provides a distributed runtime that helps users distribute the split stages to multiple devices and multiple hosts and orchestrates micro-batch execution in an overlapped fashion. We are going to demonstrate the use of PiPPy for Hugging Face models in the cloud.
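
The overlapped micro-batch execution can be sketched with a toy schedule (this illustrates the GPipe-style scheduling idea only; it is not PiPPy's API):

```python
# With S pipeline stages and M micro-batches, stage s can process
# micro-batch m at "tick" s + m, so stages work concurrently on
# different micro-batches instead of idling while earlier stages run.

def pipeline_schedule(num_stages, num_microbatches):
    """Map each tick to the (stage, microbatch) pairs active at that tick."""
    schedule = {}
    for m in range(num_microbatches):
        for s in range(num_stages):
            schedule.setdefault(s + m, []).append((s, m))
    return schedule

sched = pipeline_schedule(num_stages=3, num_microbatches=4)
print(sched[2])        # [(2, 0), (1, 1), (0, 2)] -- three stages busy at once
print(max(sched) + 1)  # 6 ticks total, vs. 3 * 4 = 12 run strictly one at a time
```

With overlap, the forward pass takes S + M − 1 ticks instead of S × M, which is why micro-batching keeps all pipeline stages utilized.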

3. PyTorch Profiler, Speaker: Taylor Robie
Dive into recent enhancements to the PyTorch profiler: Python function tracing, data flow capture, and memory profiling, and how they enable previously impossible performance analysis.

4. Profiling Distributed Training Workloads, Speaker: Anupam Bhatnagar
We will present Holistic Trace Analysis (HTA), a tool to identify computation, communication and memory bottlenecks in distributed training. HTA identifies these bottlenecks by analyzing the traces collected using the PyTorch Profiler.
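
The kind of breakdown such a trace-analysis tool computes can be sketched in miniature: group profiler events by category and report where the time goes (the event fields below are illustrative, not the actual PyTorch Profiler trace schema):

```python
# Toy bottleneck analysis over a list of profiler events.

def time_breakdown(events):
    """Total duration per category plus the dominant (bottleneck) category."""
    totals = {}
    for ev in events:
        totals[ev["cat"]] = totals.get(ev["cat"], 0.0) + ev["dur_ms"]
    bottleneck = max(totals, key=totals.get)
    return totals, bottleneck

events = [
    {"name": "aten::mm",        "cat": "compute",       "dur_ms": 12.0},
    {"name": "nccl:all_reduce", "cat": "communication", "dur_ms": 30.0},
    {"name": "aten::copy_",     "cat": "memory",        "dur_ms": 5.0},
    {"name": "aten::relu",      "cat": "compute",       "dur_ms": 3.0},
]
totals, bottleneck = time_breakdown(events)
print(bottleneck)   # communication -- this toy job is communication-bound
```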

5. TorchBench, Speaker: Xu Zhao
In this talk we present PyTorch Benchmark (TorchBench), a benchmarking suite to provide quick and stable performance signals to hold the line of performance in PyTorch development. TorchBench identifies performance regressions and provides CI services for PyTorch developers to test their PRs. It can also be used to profile specific models and identify optimization opportunities.


## Part 2: Dec 5 (Virtual), 9:30a - 11:30a PST (UTC-8) / 11:30a - 1:30p CST (UTC-6)
------------------------------------------------------------------------------------------------

6. A deep dive into TorchDynamo, Speaker: Animesh Jain
This talk presents a deep dive into TorchDynamo. TorchDynamo is a Python-level JIT compiler designed to make unmodified PyTorch programs faster. It rewrites Python bytecode in order to extract sequences of PyTorch operations into a graph which is then just-in-time compiled with a customizable backend. It is designed to mix Python execution with compiled backends to get the best of both worlds: usability and performance.

7. A deep dive into TorchInductor, Speakers: Bin Bao, Natalia Gimelshein
This talk presents a deep dive into the design principles of TorchInductor, the PyTorch compiler backend; the lowering stack it uses to transform PyTorch programs; and the optimization techniques and codegen technologies it employs.

8. How backends integrate with the PyTorch compiler stack, Speaker: Sherlock Huang
This talk dives deep into the backend integration points in the PyTorch compiler stack. It will explain the three types of IR used across the stack: Torch IR produced by Dynamo, ATen IR produced by AOTAutograd, and the loop-level IR used in Inductor. It will introduce the infrastructure and utilities available for backend integration, including an IR-agnostic pattern matcher and a graph partitioner.

Join Virtual Workshop Visit Meta Platforms, Inc. Booth

DGL: Impactful graph neural networks: A Tale of Research and Productionization

Room 291
Expo Workshop
Room 291

Graph neural networks (GNNs) learn from complex graph data and have been remarkably successful in various applications and across industries. Furthering the impact of GNNs entails solving challenges related to modeling, scalability research, and productionization. Impactful GNN research requires constant innovation to handle rich, time-evolving, and heterogeneous graph data as well as trillion-edge-scale graphs. We develop GNN models and distributed training techniques to handle such challenges and integrate them into the Deep Graph Library (DGL), a scalable and widely adopted library for developing GNN models. Building GNN products requires domain expertise and significant effort. At AWS we aim to lower the barrier to productionizing graph machine learning (GML). Neptune ML furthers this goal, helping customers obtain real-time GNN predictions from graph databases using graph query languages. At Amazon and AWS we develop frameworks based on DGL to solve internal and external GML problems and realize the impact of GNNs.

Bio: Vassilis N. Ioannidis is an Applied Scientist in AWS AI Research and Education (AIRE). He received his Ph.D. in 2020 for his dissertation, "Robust Deep Learning on Graphs," from the University of Minnesota (UMN), Twin Cities, Minneapolis, MN, USA, and received the Doctoral Dissertation Fellowship in recognition of his graph representation learning research. Vassilis has published more than 40 conference and journal papers. From June to December 2019 he worked at Mitsubishi Electric Research Labs on graph representation learning. Since February 2020 he has been working at AWS on the Deep Graph Library team, where he develops GNN solutions and performs GNN research. He worked on developing Neptune ML, a machine learning service over graph databases deployed in AWS using GNNs. He also works on large-scale training of superpositions of language models and GNNs for Amazon projects in information retrieval, recommendation, and abuse detection.

Join Virtual Workshop Visit Booth

Demonstrations  

Conditional Compute for On-device Video Understanding

Mon 28 Nov 8 a.m. - 10 a.m. PST @ La Nouvelle Orleans Ballroom C (level 2)
Expo Demonstration

In this demo, we present a conditional early-exiting framework for efficient on-device video understanding. The proposed method is based on our recently published work, FrameExit [1], which automatically learns to process fewer frames for simpler videos and more frames for complex ones. Our model sequentially observes sampled frames from a video up to the current time step and uses a gating module to automatically determine the earliest exiting point in processing where an inference is sufficiently reliable. To enable the execution of the model on-device, we use state-of-the-art quantization techniques from the open-source AI Model Efficiency Toolkit and a novel compiler stack that supports models with dynamic inference graphs. Our model outperforms competing methods on the HVU benchmark and on average enables a 4X reduction in compute and latency at comparable accuracy.

[1] Ghodrati, Amir, Babak Ehteshami Bejnordi, and Amirhossein Habibian. "FrameExit: Conditional early exiting for efficient video recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
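
The early-exit control flow can be sketched as follows (a toy gate based on a fixed confidence threshold; FrameExit learns its gating modules, and the aggregation here is a made-up stand-in):

```python
# Observe frames one at a time, update a running prediction, and stop as
# soon as the gate deems the prediction reliable enough.

def early_exit_inference(frame_confidences, threshold=0.9):
    """Return (prediction_confidence, frames_used)."""
    conf = 0.0
    for i, frame_conf in enumerate(frame_confidences, start=1):
        conf = max(conf, frame_conf)   # toy aggregation over frames seen so far
        if conf >= threshold:          # gate: reliable enough, exit early
            return conf, i
    return conf, len(frame_confidences)

# A "simple" video clears the threshold after 2 of its 8 sampled frames:
print(early_exit_inference([0.6, 0.95, 0.97, 0.99, 0.99, 0.99, 0.99, 0.99]))
# (0.95, 2)
```

Simple videos exit after a few frames while hard ones consume the full budget, which is where the average compute savings come from.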

Join Virtual Demonstration Visit Qualcomm Booth

Build Better Models Faster with W&B

Mon 28 Nov 8 a.m. - 10 a.m. PST @ La Nouvelle Orleans Ballroom C (level 2)
Expo Demonstration

As the global machine learning community grows and teams become more distributed, collaboration becomes more challenging. Our ML engineer at Weights and Biases (W&B) will show you how to use W&B, an experiment and data tracking platform, to centralize all of the information related to a given ML project while making it easy to share and communicate findings with other colleagues. No matter where you’re storing your data or executing computation, W&B will let you quickly track the entire process, from raw data all the way through your final model. Let’s make our hard work organized and reproducible!

Join Virtual Demonstration Visit Weights and Biases, Inc. Booth

Efficient super-resolution using 4-bit integer quantization for real-time mobile applications

Mon 28 Nov 8 a.m. - 10 a.m. PST @ La Nouvelle Orleans Ballroom C (level 2)
Expo Demonstration

The ever-improving display capabilities of TVs, smartphones, or VR headsets foster the need for efficient upscaling solutions. While DL-based solutions usually obtain impressive results in terms of visual quality, they are often slow and not suited for real-time applications on mobile platforms. In this demonstration, we showcase Q-SRNet, our efficient single-image super-resolution architecture, which provides better accuracy-to-latency tradeoffs than existing neural architectures. We apply our architecture to gaming and show that our solution outperforms existing non-ML based approaches.

To further optimize on-device performance, we leverage the AI Model Efficiency Toolkit (AIMET)’s latest advances in low-bit quantization and obtain excellent accuracy with 4-bit quantization (W4A8). Q-SRNet produces 4k images at 4x upscaling and 200+ FPS on a Qualcomm® Reference Design phone powered by Snapdragon® 8 Gen 2 Mobile Platform.
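
What "4-bit weight quantization" means at the representation level can be illustrated with a toy symmetric quantizer (AIMET's actual algorithms are far more sophisticated; this only shows the int4 mapping):

```python
# Map floats to 4-bit signed integers in [-8, 7] with a per-tensor scale,
# then dequantize back to floats.

def quantize_int4(weights):
    scale = max(abs(w) for w in weights) / 7.0  # map the max magnitude to 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.32, -0.7, 0.11, 0.7]
q, scale = quantize_int4(w)
print(q)   # [3, -7, 1, 7] -- each weight now fits in 4 bits
```

The storage and bandwidth win is what makes 200+ FPS on-device feasible; the hard part, which AIMET's low-bit techniques address, is keeping accuracy at such coarse resolution.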

Join Virtual Demonstration Visit Qualcomm Booth

Human Modeling and Strategic Reasoning in the Game of Diplomacy

Mon 28 Nov 8 a.m. - 10 a.m. PST @ La Nouvelle Orleans Ballroom C (level 2)
Expo Demonstration

Games have long been a proving ground for new AI advancements — from Deep Blue’s victory over chess grandmaster Garry Kasparov, to AlphaGo’s mastery of Go, to Pluribus out-bluffing the best humans in poker. But truly useful, versatile agents need to be able to cooperate with humans rather than simply compete against them. In this session, we will demo our latest research on the game of Diplomacy, a major benchmark for cooperative AI that requires modeling human players and understanding human norms and expectations. We will dive into the technology, show actual game play, and key members from our research team will be on hand to answer questions about the research and discuss the future of where this technology may go.

Join Virtual Demonstration Visit Meta Platforms, Inc. Booth

Full-Stack 3D Scene Understanding on an Extended Reality Headset

Mon 28 Nov 8 a.m. - 10 a.m. PST @ La Nouvelle Orleans Ballroom C (level 2)
Expo Demonstration

3D understanding of a scene is of fundamental importance to Extended Reality (XR), which includes both Augmented Reality (AR) and Virtual Reality (VR). In addition, being able to efficiently run all the needed 3D algorithms on a resource-constrained XR device is critical for delivering a satisfactory XR experience.

In this demo, we showcase a full-stack approach, where we efficiently deploy 3D understanding on an XR headset by optimizing across the AI algorithms, software, and Snapdragon® hardware. In particular, we first adopt a self-supervised learning strategy to train a monocular depth estimation network on unlabeled video sequences previously captured on the headset and utilize information from the 6 degrees-of-freedom (6DoF) camera tracking algorithm to provide scale-correct training and inference. Then, the trained depth network is quantized and deployed onto the device using the Qualcomm® Neural Processing SDK. Given our accurate, low-latency depth estimation and 6DoF pose estimation, we perform 3D reconstruction of the scene as well as plane estimation in real time on the XR headset.
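
One common way to make monocular depth "scale-correct" is median scaling: monocular networks predict depth only up to an unknown scale, so predictions are aligned to sparse metric depths, e.g. from a 6DoF tracking system. This recipe is a standard trick in the self-supervised depth literature, not necessarily the exact method used in this demo:

```python
import statistics

def median_scale(pred_depths, sparse_metric_depths):
    """Per-point ratio of metric to predicted depth; the median is robust to outliers."""
    ratios = [m / p for p, m in zip(pred_depths, sparse_metric_depths)]
    return statistics.median(ratios)

pred = [1.0, 2.0, 4.0]       # up-to-scale network predictions
metric = [0.5, 1.0, 2.2]     # metric depths at sparsely tracked points
s = median_scale(pred, metric)
print(s)                     # 0.5
print([s * d for d in pred]) # metric-scale depth map: [0.5, 1.0, 2.0]
```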

Join Virtual Demonstration Visit Qualcomm Booth

Real-time Navigation of Chemical Space with Cloud-Based Inference from MoLFormer

Mon 28 Nov 8 a.m. - 10 a.m. PST @ La Nouvelle Orleans Ballroom C (level 2)
Expo Demonstration

We present MoLFormer, a large chemical language model: an efficient transformer encoder over chemical SMILES that uses rotary positional embeddings. The model employs a linear attention mechanism, coupled with highly distributed self-supervised training, on SMILES sequences of 1.1 billion molecules from the PubChem and ZINC datasets. Experiments show the generality of this molecular representation through its performance on several molecular property classification and regression tasks. Further analyses, specifically through the lens of attention, demonstrate the emergence of spatial relationships between atoms within MoLFormer trained on chemical SMILES. We further present a cloud-based real-time platform that allows users to virtually navigate the chemical space and screen molecules of interest. The platform leverages molecular embeddings inferred from MoLFormer and retrieves the nearest neighbors and their metadata for an input chemical. Based on the functionality of this platform and the results obtained, we believe such a platform adds value in automating chemistry and assisting drug discovery and material design tasks.
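
The retrieval step behind such a platform can be sketched as nearest-neighbor search over embeddings by cosine similarity (the vectors and molecule names below are made-up toy data, not MoLFormer outputs):

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest_neighbors(query, index, k=2):
    """Return the k index entries most similar to the query embedding."""
    scored = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]

index = {
    "aspirin":   [0.9, 0.1],
    "ibuprofen": [0.8, 0.2],
    "caffeine":  [0.1, 0.9],
}
print(nearest_neighbors([1.0, 0.0], index))  # ['aspirin', 'ibuprofen']
```

Production systems replace the exhaustive sort with an approximate nearest-neighbor index so lookups stay real-time over billions of molecules.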

Join Virtual Demonstration Visit IBM Booth

Software-Delivered AI: Using Sparse-Quantization for Fastest Inference on Deep Neural Networks

Mon 28 Nov 8 a.m. - 10 a.m. PST @ La Nouvelle Orleans Ballroom C (level 2)
Expo Demonstration

Today’s state of deep neural network inference can be summed up in two words: complex and inefficient. The quest for accuracy has led to overparameterized deep neural networks that require heavy compute resources to solve tasks at hand, and as such we are rapidly approaching unsustainable computational, economic, and environmental costs to gain incrementally smaller improvements in model performance.

Enter sparsity, a research technique that makes neural networks smaller and faster. There is no lack of research on achieving high levels of network sparsity, but putting that research into practice remains a challenge. As a result, data scientists and machine learning engineers are often forced to make tradeoffs between model performance, accuracy, and inference costs. After years of research at MIT, the team at Neural Magic concluded that throwing teraflops at dense models is not sustainable. So they’ve taken the best of known research on model compression (unstructured pruning and quantization, in particular) and efficient sparse execution to build a software solution that delivers efficient deep neural network inference on everyday CPUs, without the need for specialized hardware.

Join Neural Magic ML experts to learn how they successfully created and applied SOTA research on model compression and efficient sparse execution to build open-source software that compresses and optimizes deep learning models for efficient inference, ultimately deploying it with DeepSparse, a free-to-use engine that delivers GPU speeds on commodity CPUs. The community will walk away with an overview of (1) SOTA research and model compression techniques, including ways to apply them to your models using open-source software, (2) a demo of the first-ever sparsity-aware inference engine that translates high sparsity levels into a significant speedup, and (3) next steps on using the Neural Magic open-source and free ML tools to make your inference efficient and less complex.
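
The mechanics of unstructured pruning can be shown in a few lines: zero out the smallest-magnitude weights until a target sparsity is reached. Production tooling typically prunes gradually during training with recovery; this one-shot version only illustrates the idea:

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-|w| entries so that `sparsity` fraction become zero."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.5, -0.02, 0.3, 0.01, -0.8, 0.04, 0.1, -0.05]
pruned = magnitude_prune(w, sparsity=0.5)
print(pruned)   # [0.5, 0.0, 0.3, 0.0, -0.8, 0.0, 0.1, 0.0]
```

The speedup then depends on an execution engine that actually skips the zeros, which is exactly the gap a sparsity-aware inference engine is built to close.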

Join Virtual Demonstration Visit Neural Magic Booth

Practical Deployment of Secure Federated Learning: Challenges, Opportunities and Solutions

Mon 28 Nov 8 a.m. - 10 a.m. PST @ La Nouvelle Orleans Ballroom C (level 2)
Expo Demonstration

Federated Learning (FL) is an emerging machine learning approach that allows multiple entities (a.k.a. parties or clients) to collaboratively train a machine learning model under the coordination of an aggregator (a.k.a. server) without directly revealing their private training data. Due to privacy or regulatory constraints, many industry sectors, for example healthcare, banking, and retail, have a growing interest in employing FL to facilitate model training across multiple data centers. However, the basic version of federated learning is usually not sufficient to meet privacy needs. In particular, not directly sharing raw training data does not guarantee full privacy protection: many attacks, such as membership inference attacks, data reconstruction attacks, and property testing attacks, can exploit model parameters to learn sensitive information about the training data. These threats call for cryptographic techniques to further protect FL systems. In this demo, we will walk through the importance of protecting FL with cryptographic techniques, discuss the high-level basics of fully homomorphic encryption (FHE), and present an end-to-end implementation of FHE in our IBM FL library. Moreover, we will present a comprehensive comparison of the communication and computation costs of deploying FHE in FL. To conclude, we will summarize the existing challenges and opportunities in securing FL.
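
The privacy goal can be illustrated in miniature: the aggregator should learn only the *sum* of the clients' model updates, never an individual update. Real deployments achieve this with homomorphic encryption; the toy below instead uses pairwise additive masks that cancel in the sum, purely to demonstrate the property, and offers no actual security:

```python
import random

def mask_updates(updates, modulus=1 << 16):
    """Add a random pairwise mask to each pair of clients; masks cancel in the sum."""
    n = len(updates)
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            for d in range(len(updates[0])):
                r = random.randrange(modulus)
                masked[i][d] = (masked[i][d] + r) % modulus  # client i adds r
                masked[j][d] = (masked[j][d] - r) % modulus  # client j subtracts r
    return masked

updates = [[3, 1], [2, 5], [4, 0]]   # one integer update vector per client
masked = mask_updates(updates)
# Each masked vector looks uniformly random, yet the aggregate is exact:
aggregate = [sum(col) % (1 << 16) for col in zip(*masked)]
print(aggregate)   # [9, 6] -- equals the plain sum of the three updates
```

With FHE the same property holds against a stronger adversary: clients encrypt their updates, the server sums ciphertexts, and only the decrypted result reveals the aggregate.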

Join Virtual Demonstration Visit IBM Booth