
ML For Systems
Benoit Steiner · Jonathan Raiman · Martin Maas · Azade Nazi · Mimee Xu · Anna Goldie

Mon Dec 13 09:00 AM -- 05:55 PM (PST) @ None
Event URL: http://mlforsystems.org

ML for Systems is an emerging research area that has shown promising results in the past few years. Recent work has shown that ML can be used to replace heuristics, solve complex optimization problems, and improve modeling and forecasting when applied in the context of computer systems.

As an emerging area, ML for Systems is still in the process of defining its common problems, frameworks, and approaches, which requires venues that bring together researchers and practitioners from both the systems and machine learning communities. Past iterations of the workshop focused on providing such a venue and broke new ground on a broad range of emerging directions in ML for Systems. We want to carry this momentum forward by encouraging the community to explore areas that have previously received less attention. Specifically, the workshop commits to highlighting work that optimizes for security and privacy, rather than only for metrics like speed and memory, and work that uses ML to optimize for energy usage and carbon impact. Additionally, this year we will encourage the development of shared methodology, tools, and frameworks.

For the first time since the inception of the workshop, we will organize a competition. This competition will showcase important systems problems and challenge the ML community to test their methods and algorithms on these problems. Our competition tasks are designed to have a low barrier to entry that attracts newcomers as well as systems veterans.

This setup will allow attendees to meet with top researchers and domain experts, old and new, bridging cutting-edge ML research with practical systems design. We hope that providing a prestigious venue for researchers from both fields to meet and interact will result in both fundamental ML research and real-world impact on computer systems design and implementation.

Mon 9:00 a.m. - 9:25 a.m.
Opening Remarks (Introduction)
Jonathan Raiman · Anna Goldie · Benoit Steiner · Azade Nazi · Martin Maas · Mimee Xu
Mon 9:30 a.m. - 10:05 a.m.


Tim Kraska
Mon 10:10 a.m. - 10:40 a.m.


Anima Anandkumar
Mon 10:45 a.m. - 11:20 a.m.


Michael Carbin
Mon 12:05 p.m. - 1:00 p.m.
Lunch Break (Break)
Mon 3:50 p.m. - 4:25 p.m.


Luis Ceze
Mon 4:30 p.m. - 5:00 p.m.

Search-based techniques have proven effective in solving complex optimization problems that arise in domain-specific compilers for machine learning (ML). Unfortunately, deploying such techniques in production compilers is impeded by several limitations. In this talk, I will present an autotuner for production ML compilers that can tune both graph-level and subgraph-level optimizations at multiple compilation stages. The autotuner applies a flexible search methodology that defines a search formulation for joint optimizations by accurately modeling the interactions between different compiler passes. The autotuner tunes tensor layouts, operator fusion decisions, tile sizes, and code generation parameters in XLA, a production ML compiler, using various search strategies. We demonstrate how to incorporate machine learning techniques such as a learned cost model and various learning-based search strategies to reduce autotuning time. Our learned cost model has high accuracy and outperforms a heavily-optimized analytical performance model. In an evaluation across 150 ML training and inference models on Tensor Processing Units (TPUs), the autotuner offers a runtime speedup of up to 2.4x, and 5% on average, over the heavily-optimized XLA compiler. The autotuner has been deployed to automatically tune the most heavily-used production models in Google’s fleet every day.

Phitchaya Phothilimtha
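
The role a learned cost model can play in reducing autotuning time can be illustrated with a minimal sketch. This is not the XLA autotuner: the function names, the toy cost model, and the toy measurement function are all hypothetical stand-ins. The idea shown is only that a cheap model ranks candidates so that the expensive measurement runs on a short list rather than on every candidate.

```python
def autotune(candidates, cost_model, measure, top_k):
    """Rank candidates with a cheap cost model, then measure only the top_k."""
    ranked = sorted(candidates, key=cost_model)
    shortlist = ranked[:top_k]
    # Only the shortlisted candidates pay for a real (expensive) measurement.
    measured = {c: measure(c) for c in shortlist}
    best = min(measured, key=measured.get)
    return best, len(measured)


def toy_cost_model(tile):
    # Hypothetical learned model: imperfect, believes the optimum is near 40.
    return abs(tile - 40)


def toy_measure(tile):
    # Hypothetical hardware measurement: the true optimum is near 56.
    return abs(tile - 56) + 10


best_tile, num_measurements = autotune(
    [8, 16, 32, 64, 128], toy_cost_model, toy_measure, top_k=3
)
```

Even with an imperfect cost model, the shortlist here still contains the best measured tile size while skipping two of the five measurements.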
Mon 5:00 p.m. - 5:30 p.m.

Leveraging machine learning for system optimization can relieve researchers of designing manual heuristics, a time-consuming procedure. In this talk, we mainly discuss data-driven iterative refinement that models optimization as a sequential decision process: an initial solution to the optimization problem is iteratively improved until convergence. Each refinement step is controlled by an ML model learned from previous optimization trials, or from data collected so far in this trial. We then introduce two example ML systems, Coda and N-Bref, that de-compile assembly code back to its source code. In both cases, a coarse source program is first proposed and then refined by learned models to match the assembly. These approaches show strong performance compared to existing de-compilation tools that rely upon human heuristics and domain knowledge.

Yuandong Tian
Mon 5:40 p.m. - 5:55 p.m.
Closing Remarks (Outro)
Jonathan Raiman · Mimee Xu · Martin Maas · Anna Goldie · Azade Nazi · Benoit Steiner

Network load balancers (LBs) are important components in data centers (DCs) to provide scalable services. Workload distribution algorithms are based on heuristics (ECMP, WCMP) or naive machine learning (ML) algorithms (ridge regression). Advanced ML-based approaches help achieve performance gains in different networking and system problems. However, it is challenging to apply ML algorithms to networking problems in real-life systems. It requires domain knowledge to collect features from low-latency, high-throughput, and scalable networking systems, which are dynamic and heterogeneous. This paper proposes Aquarius to bridge the gap between ML and networking systems and demonstrates its usage in the context of network LBs. This paper demonstrates its ability to conduct both offline data analysis and online model deployment in realistic systems. The results show that the ML model trained and deployed using Aquarius improves load balancing performance, yet they also reveal more challenges to be resolved to apply ML to networking systems.

Zhiyuan Yao · Thomas Heide Clausen
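
For readers unfamiliar with the heuristic baselines named in the abstract above, the following is a minimal sketch of hash-based ECMP-style and WCMP-style server selection. It is an illustration only: real devices use their own hash functions and table layouts, and the flow tuple format, the SHA-256 hash, and all function names here are assumptions.

```python
import hashlib


def flow_hash(flow):
    """Deterministic hash of a flow tuple, e.g. (src, dst, sport, dport, proto)."""
    key = "|".join(str(field) for field in flow).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def ecmp_pick(flow, servers):
    """Equal-cost choice: every server is equally likely for a random flow."""
    return servers[flow_hash(flow) % len(servers)]


def wcmp_table(weights):
    """Weighted choice: replicate each server in the table by its integer weight."""
    table = []
    for server, weight in weights.items():
        table.extend([server] * weight)
    return table


def wcmp_pick(flow, weights):
    table = wcmp_table(weights)
    return table[flow_hash(flow) % len(table)]


flow = ("10.0.0.1", "10.0.0.9", 40000, 443, "tcp")
servers = ["s1", "s2", "s3"]
weights = {"s1": 3, "s2": 1}
```

Both schemes map a given flow to the same server every time, which keeps connections sticky but ignores the server's current load. That static behavior is the gap that learned approaches aim to close.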

Instruction combiner (IC) is a critical compiler optimization pass, which replaces a sequence of instructions with an equivalent and optimized instruction sequence at basic block level. There can be thousands of instruction-combining patterns which need to be frequently updated as new coding styles/idioms/applications and novel hardware evolve over time. This makes the IC optimization pass error prone, incurring high maintenance cost. Prior work has shown that the IC pass is the buggiest pass in the LLVM (Low Level Virtual Machine) compiler and the third most buggy pass in GCC (GNU Compiler Collection). To mitigate these challenges associated with the traditional IC, we design and implement a Neural Instruction Combiner (NIC) and demonstrate its feasibility by integrating it into the standard LLVM compiler optimization pipeline. NIC leverages neural Seq2Seq model techniques for generating an optimized encoded IR sequence from the unoptimized encoded IR sequence. We show that NIC achieves an exact-match rate of 72% on optimized sequences when compared to the traditional IC, demonstrating its feasibility in a production compiler pipeline.

Sandya Mannarswamy · Dibyendu Das

In standard generative deep learning models, such as autoencoders or GANs, the size of the parameter set is proportional to the complexity of the generated data distribution. A significant challenge is to deploy resource-hungry deep learning models in devices with limited memory to prevent system upgrade costs. To combat this, we propose a novel framework called generative optimization networks (GON) that is similar to GANs, but does not use a generator, significantly reducing its memory footprint. GONs use a single discriminator network and run optimization in the input space to generate new data samples, achieving an effective compromise between training time and memory consumption. GONs are most suited for data generation problems in limited memory settings. Here we illustrate their use for the problem of anomaly detection in memory-constrained edge devices arising from attacks or intrusion events. Specifically, we use a GON to calculate a reconstruction-based anomaly score for input time-series windows. Experiments on a Raspberry-Pi testbed with two existing and a new suite of datasets show that our framework gives up to 32% higher detection F1 scores and 58% lower memory consumption, with only 5% higher training overheads compared to the state-of-the-art.

Shreshth Tuli · Shikhar Tuli · Giuliano Casale · Nicholas Jennings
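
The core idea in the abstract above, generating samples by optimizing in the input space of a single discriminator, can be sketched as follows. This is a toy under stated assumptions, not the authors' implementation: the discriminator is a hand-written function rather than a trained network, `CENTER` is a hypothetical location of normal data, and starting the optimization from the input window when computing the anomaly score is an assumption made for illustration.

```python
import math

CENTER = [1.0, -1.0]  # hypothetical location of "normal" data


def discriminator(x):
    """Toy discriminator: score in (0, 1), highest near CENTER."""
    z = 1.0 - sum((xi - ci) ** 2 for xi, ci in zip(x, CENTER))
    return 1.0 / (1.0 + math.exp(-z))


def grad_log_d(x):
    """Gradient of log D(x) with respect to the input x."""
    d = discriminator(x)
    # d/dz log(sigmoid(z)) = 1 - sigmoid(z); dz/dx = -2 * (x - CENTER)
    return [(1.0 - d) * (-2.0) * (xi - ci) for xi, ci in zip(x, CENTER)]


def generate(start, steps=50, lr=0.1):
    """Gradient ascent on log D in input space; no generator network is used."""
    x = list(start)
    for _ in range(steps):
        g = grad_log_d(x)
        x = [xi + lr * gi for xi, gi in zip(x, g)]
    return x


def anomaly_score(x):
    """Reconstruction-style score: distance from x to its optimized version."""
    recon = generate(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, recon)))


normal_input = [1.1, -1.0]
odd_input = [6.0, -1.0]
```

Because only the discriminator's parameters are stored, memory use is traded for extra optimization steps at generation time, which matches the compromise the abstract describes.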

Although machine learning (ML) has been successful in automating various software engineering needs, software testing still remains a highly challenging topic. In this paper, we aim to improve the generative testing of software by directly augmenting the random number generator (RNG) with a deep reinforcement learning (RL) agent using an efficient, automatically extractable state representation of the software under test. Using the Cosmos SDK as the testbed, we show that the proposed DeepRNG framework provides a statistically significant improvement to the testing of the highly complex software library with over 350,000 lines of code. The source code of the DeepRNG framework is publicly available online.

Chuan-Yung Tsai · Graham Taylor

Machine Learning has been successfully applied in systems applications such as memory prefetching and caching, where learned models have been shown to outperform heuristics. However, the lack of understanding of the inner workings of these models -- interpretability -- remains a major obstacle for adoption in real-world deployments. Understanding a model's behavior can help system administrators and developers gain confidence in the model, understand risks, and debug unexpected behavior in production. Interpretability for models used in computer systems poses a particular challenge: Unlike ML models trained on images or text, the input domain (e.g., memory access patterns, program counters) is not immediately interpretable. A major challenge is therefore to explain the model in terms of concepts that are approachable to a human practitioner. By analyzing a state-of-the-art caching model, we provide evidence that the model has learned concepts beyond simple statistics that can be leveraged for explanations. Our work provides a first step towards understanding ML models in systems and highlights both promises and challenges of this emerging research area.

Leon Sixt · Evan Liu · Marie Pellat · James Wexler · Milad Hashemi · Been Kim · Martin Maas

The rapid rise in demand for training large neural network architectures has brought into focus the need for partitioning strategies, for example by using data, model, or pipeline parallelism. Implementing these methods is increasingly supported through program primitives, but identifying efficient partitioning strategies requires expensive experimentation and expertise. We present the prototype of an automated partitioner that seamlessly integrates into existing compilers and existing user workflows. Our partitioner enables SPMD-style parallelism that encompasses data parallelism and parameter/activation sharding. Through a combination of inductive tactics and search in a platform-independent partitioning IR, automap can recover expert partitioning strategies such as Megatron sharding for transformer layers.

Michael Schaarschmidt · Adam Paszke

Resource-disaggregated data centres (RDDC) propose a resource-centric, high-utilisation architecture for data centres (DC), avoiding resource fragmentation and enabling arbitrarily sized resource pools to be allocated to tasks, rather than server-sized ones. RDDCs typically impose greater demand on the network, requiring more infrastructure and increasing cost and power, so new resource allocation algorithms that co-manage both server and network resources are essential to ensure that allocation is not bottlenecked by the network, and that requests can be served successfully with minimal networking resources. We apply reinforcement learning (RL) to this problem for the first time and show that an RL policy based on graph neural networks can learn resource allocation policies end-to-end that outperform previous hand-engineered heuristics by up to 22.0%, 42.6% and 22.6% for acceptance ratio, CPU and memory utilisation respectively, maintain performance when scaled up to RDDC topologies with 10^2x more nodes than those seen during training, and can achieve comparable performance to the best baselines while using 5.3x fewer network resources.

Zacharaya Shabka · Georgios Zervas

Network load balancers (LBs) are one of the key components in data centers (DCs). They distribute workloads across multiple servers and help offer scalable services. However, operating in dynamic network environments with limited observations, modern LBs rely on heuristic algorithms and require manual configurations for fairness optimization. As reinforcement learning (RL) helps achieve performance gains in dynamic systems, this paper proposes a distributed asynchronous RL mechanism to improve LBs’ workload distribution fairness with limited observations. The performance of the proposed mechanism is evaluated and compared with state-of-the-art LB algorithms in a simulator, under configurations of progressively increasing difficulty. Preliminary results show promise in RL-based LB algorithms and shed light on more challenges for future research, including reward function design and model scalability.

Zhiyuan Yao · Zihan Ding · Thomas Heide Clausen

Interest in applying Reinforcement Learning (RL) techniques to compiler optimizations is increasing rapidly, but compiler research has a high entry barrier. Unlike in other domains, compiler and RL researchers do not have access to the infrastructure and datasets that enable fast iteration and development of ideas, and getting started requires a significant engineering investment.

We present CompilerGym, a community infrastructure for exposing compiler optimizations as RL environments, and initial results in applying RL to these environments. Our findings suggest that two key challenges in RL for compilers are representation learning and transfer learning between program domains.

Chris Cummins · Bram Wasti · Brandon Cui · Olivier Teytaud · Benoit Steiner · Yuandong Tian · Hugh Leather

Mixed-precision quantization is a powerful tool to enable memory and compute savings of neural network workloads by deploying different sets of bit-width precisions on separate compute operations. Recent research has shown significant progress in applying mixed-precision quantization techniques to reduce the memory footprint of various workloads, while also preserving task performance. Prior work, however, has often ignored additional objectives, such as bit-operations, that are important for deployment of workloads on hardware. Here we present a flexible and scalable framework for automated mixed-precision quantization that optimizes multiple objectives. Our framework relies on Neuroevolution-Enhanced Multi-Objective Optimization (NEMO), a novel search method, to find Pareto optimal mixed-precision configurations for memory and bit-operations objectives. Within NEMO, a population is divided into structurally distinct sub-populations (species) which jointly form the Pareto frontier of solutions for the multi-objective problem. At each generation, species are re-sized in proportion to the goodness of their contribution to the Pareto frontier. This allows NEMO to leverage established search techniques and neuroevolution methods to continually improve the goodness of the Pareto frontier. In our experiments we apply a graph-based representation to describe the underlying workload, enabling us to deploy graph neural networks trained by NEMO to find Pareto optimal configurations for various workloads trained on ImageNet. Compared to the state-of-the-art, we achieve competitive results on memory compression and superior results for compute compression for MobileNet-V2, ResNet50 and ResNeXt-101-32x8d, one of the largest ImageNet models amounting to a search space of ~10**146. A deeper analysis of the results obtained by NEMO also shows that both the graph representation and the species-based approach are critical in finding effective configurations for all workloads.

Santiago Miret · Vui Seng Chua · Mattias Marder · Mariano Phielipp · Nilesh Jain · Somdeb Majumdar
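
The notion of Pareto optimal configurations used in the abstract above can be made concrete with a small sketch. It shows only the generic non-dominated filter for two minimized objectives, not NEMO's species-based search; the candidate (memory, bit-operations) values are made up for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one.

    Both objectives are minimized (here: memory and bit-operations).
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (memory, bit-operations) costs of five mixed-precision configurations.
configs = [(4.0, 9.0), (5.0, 5.0), (6.0, 6.0), (8.0, 3.0), (9.0, 4.0)]
front = pareto_front(configs)
```

In this toy data, (6.0, 6.0) is dominated by (5.0, 5.0) and (9.0, 4.0) by (8.0, 3.0), so three configurations remain as the trade-off curve between the two objectives.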

Representing DNNs with low-precision numbers is a promising approach that enables the efficient acceleration of large-scale deep neural networks (DNNs). However, previous methods typically keep a copy of weights in high precision for weight updates during training. Directly training over low-precision weights still remains an unsolved problem because of the complex interactions between low-precision number systems and the underlying learning algorithms. To address this problem, we develop a low-precision training framework, termed LNS-Madam, in which we jointly design a logarithmic number system (LNS) and a multiplicative weight update training method (Madam). LNS-Madam yields low quantization error during weight update, leading to a stable convergence even if the precision is limited. By replacing SGD or Adam with the Madam optimizer, training under LNS requires less weight precision during the updates while preserving the state-of-the-art prediction accuracy.

Jiawei Zhao · Steve Dai · Rangha Venkatesan · Brian Zimmer · Mustafa Ali · Ming-Yu Liu · Brucek Khailany · Anima Anandkumar
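
Why a multiplicative update pairs naturally with a logarithmic number system can be seen in a short sketch: multiplying a weight by a power of two is just an addition on its stored exponent. The code below is a simplified illustration under assumptions, not the LNS-Madam algorithm itself. It uses base 2, a fixed number of fractional exponent bits (`frac_bits`, a hypothetical parameter), and omits any gradient normalization the actual optimizer may apply.

```python
import math


def lns_encode(w, frac_bits=3):
    """Encode a nonzero weight as (sign, log2 magnitude quantized to 2**-frac_bits)."""
    sign = 1 if w > 0 else -1
    scale = 2 ** frac_bits
    exp_q = round(math.log2(abs(w)) * scale) / scale
    return sign, exp_q


def lns_decode(sign, exp_q):
    return sign * 2.0 ** exp_q


def multiplicative_update(sign, exp_q, grad, lr=0.125, frac_bits=3):
    """Apply w <- w * 2**(-lr * sign(w) * grad) directly on the stored exponent.

    The sign never changes; only the quantized exponent moves.
    """
    scale = 2 ** frac_bits
    new_exp = exp_q - lr * sign * grad
    return sign, round(new_exp * scale) / scale


pos = multiplicative_update(*lns_encode(0.5), grad=2.0)
neg = multiplicative_update(*lns_encode(-0.5), grad=2.0)
```

With a positive gradient, both weights move in the descent direction: the positive weight shrinks toward zero and the negative weight grows in magnitude, while each update stays on the low-precision exponent grid.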

To attain higher efficiency, the industry has gradually shifted towards application-specific hardware accelerators. While such a paradigm shift is already starting to show promising results, designers need to spend considerable manual effort and perform a large number of time-consuming simulations to find accelerators that can accelerate multiple target applications while obeying design constraints. Moreover, such a simulation-driven approach must be re-run from scratch every time the set of target applications or design constraints changes. An alternative paradigm is to use a data-driven, offline approach that utilizes logged simulation data to architect hardware accelerators, without needing any form of simulation. Such an approach not only alleviates the need to run time-consuming simulations, but also enables data reuse and applies even when the set of target applications changes. In this paper, we develop such a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME, that enjoys all of these properties. Our approach learns a conservative, robust estimate of the desired cost function, utilizes infeasible points, and optimizes the design against this estimate without any additional simulator queries during optimization. PRIME architects accelerators, tailored towards both single and multiple applications, improving performance over state-of-the-art simulation-driven methods by about 1.54x and 1.20x, while considerably reducing the required total simulation time by 93% and 99%, respectively. In addition, PRIME also architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.

Aviral Kumar · Milad Hashemi · Kevin Swersky · Amir Yazdanbakhsh · Sergey Levine

The advancement of deep learning has led to the development of neural decoders for low-latency communications. However, neural decoders can be very complex, which can lead to increased computation and latency. We consider iterative pruning approaches (such as the lottery ticket hypothesis algorithm) to prune weights in neural decoders. Decoders with fewer weights can have lower latency and lower complexity while retaining the accuracy of the original model. This will make neural decoders more suitable for mobile and other edge devices with limited computational power. We also propose semi-soft decision decoding for neural decoders, which can be used to improve the bit error rate performance of the pruned network.

Vikrant Malik · Rohan Ghosh · Mehul Motani
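
The iterative pruning loop mentioned in the abstract above can be sketched in a few lines. This follows the general lottery-ticket recipe (train, prune the smallest-magnitude surviving weights, rewind the survivors to their initial values, repeat) and is not the authors' decoder code. The `train` callable and `toy_train` are hypothetical stand-ins for a real training routine, and the weights are a flat list for simplicity.

```python
def prune_round(weights, mask, fraction):
    """Prune the given fraction of surviving weights with the smallest magnitude."""
    alive = [i for i, keep in enumerate(mask) if keep]
    k = max(1, int(fraction * len(alive)))
    to_prune = sorted(alive, key=lambda i: abs(weights[i]))[:k]
    new_mask = list(mask)
    for i in to_prune:
        new_mask[i] = False
    return new_mask


def iterative_prune(init_weights, train, rounds, fraction):
    """Repeat: rewind survivors to their initial values, train, then prune."""
    mask = [True] * len(init_weights)
    for _ in range(rounds):
        start = [w if keep else 0.0 for w, keep in zip(init_weights, mask)]
        trained = train(start, mask)
        mask = prune_round(trained, mask, fraction)
    return mask


def toy_train(weights, mask):
    # Hypothetical stand-in for training: doubles every surviving weight.
    return [2.0 * w if keep else 0.0 for w, keep in zip(weights, mask)]


init = [0.9, -0.1, 0.5, -0.7, 0.05, 0.3, -0.2, 0.8, 0.4, -0.6]
final_mask = iterative_prune(init, toy_train, rounds=3, fraction=0.2)
```

Pruning a fixed fraction of the remaining weights each round removes progressively fewer weights per round, which is why the procedure is usually run for many rounds to reach high sparsity.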

Author Information

Benoit Steiner (Facebook AI Research)
Jonathan Raiman (NVIDIA)
Martin Maas (Google)
Azade Nazi (Google Brain)
Mimee Xu (NYU)

I apply security and privacy to machine learning.

Anna Goldie (Google Brain / Stanford)
