Workshop
Machine Learning for Systems
Xinlei XU · Dan Zhang · Mangpo Phothilimthana · Beidi Chen · Yawen Wang · Divya Mahajan
Room 211 - 213
Machine Learning (ML) for Systems describes the application of machine learning techniques to problems related to computer systems. By leveraging supervised learning and reinforcement learning (RL) approaches, machine learning can replace longstanding heuristics that currently drive many of these systems. The area spans a wide range of topics, from multi-objective tasks such as designing new data structures, integrated circuits, or design verification to implementing control algorithms for applications such as compilers, databases, memory management, or ML frameworks. While the systems community increasingly recognizes the importance of ML in solving a variety of systems problems, ML for Systems remains an emerging area without widely established best practices, methods, and strategies for applying state-of-the-art machine learning techniques. The goal of this workshop is to provide an interdisciplinary venue for ML and Systems experts to push this boundary and start new directions within the ML for Systems area.
Schedule
Sat 7:00 a.m. - 7:05 a.m.
|
Opening Remarks
(
Talk
)
>
|
🔗 |
Sat 7:05 a.m. - 7:55 a.m.
|
Bill Dally - Invited Talk
(
Talk
)
>
Dr. Bill Dally brings extensive expertise in circuit design, high performance computing, and machine learning. Dally is the SVP of Research at NVIDIA, which he joined as chief scientist in 2009. Previously, Dally led research teams as a professor at Stanford for 12 years and MIT for 11 years. He co-founded two companies. He is a member of the National Academy of Engineering, a Fellow of the American Academy of Arts & Sciences, a Fellow of the IEEE and the ACM, and has received the ACM Eckert-Mauchly Award, the IEEE Seymour Cray Award, and the ACM Maurice Wilkes Award. He has published over 250 papers, holds over 120 issued patents, and is an author of four textbooks. |
🔗 |
Sat 8:00 a.m. - 8:55 a.m.
|
Chris Lattner - Invited Talk
(
Keynote Talk
)
>
Programming Languages Challenges in Large Scale Machine Learning. Chris Lattner is the co-founder and CEO of Modular AI. He co-founded the LLVM compiler infrastructure, the Clang compiler, the Swift programming language, the MLIR compiler infrastructure, and the CIRCT project (applying MLIR to hardware design), and has contributed to many other commercial and open source projects at Apple, Tesla, Google, and SiFive. Previously, Chris led the Engineering and Product teams at SiFive, the Google TensorFlow team, and the Tesla Autopilot team, and worked for Apple managing the Developer Tools department. |
🔗 |
Sat 8:55 a.m. - 9:10 a.m.
|
Coffee Break!
(
Break
)
>
|
🔗 |
Sat 9:10 a.m. - 9:20 a.m.
|
Yunkai Zhang - VMR2L: Virtual Machines Rescheduling Using Reinforcement Learning in Data Centers
(
AM Spotlight Talk
)
>
Modern industry-scale data centers receive thousands of virtual machine (VM) requests per minute. Due to the continual creation and release of VMs, many small resource fragments are scattered across physical machines (PMs). To handle these fragments, data centers periodically reschedule some VMs to alternative PMs. Despite the increasing importance of VM rescheduling as data centers grow in size, the problem remains understudied. We first show that, unlike most combinatorial optimization tasks, the inference time of VM rescheduling algorithms significantly influences their performance, causing many existing methods to scale poorly. Therefore, we develop a reinforcement learning system for VM rescheduling, VMR2L, which incorporates a set of customized techniques, such as a two-stage framework that accommodates diverse constraints and workload conditions as well as an effective feature extraction module. Our experiments on an industry-scale data center show that VMR2L can achieve a performance comparable to the optimal solution, but with a running time of seconds. |
Yunkai Zhang 🔗 |
Sat 9:20 a.m. - 9:30 a.m.
|
Guanhua Wang - ZeRO++: Extremely Efficient Collective Communication for Large Model Training
(
Talk
)
>
link
While the Zero Redundancy Optimizer (ZeRO) excels in training large-scale models, it struggles to achieve good throughput in environments with limited bandwidth or small batches where communication becomes a major bottleneck. Inspired by the principles of fine-grained quantization in machine learning algorithms, we designed ZeRO++, an optimizer robust to quantization effects that allows for significant communication volume reduction using low-precision quantization techniques. ZeRO++ comprises three communication volume reduction techniques (low-precision all-gather, data remapping, and low-precision gradient averaging) that reduce the communication volume by up to 4x and enable up to 2.16x better throughput at 384-GPU scale. Our results also show that ZeRO++ can speed up RLHF by 3.3x compared to vanilla ZeRO. To verify the convergence of ZeRO++, we test models of up to 13B parameters for pretraining with 8/6-bit all-gather and up to 30B parameters for finetuning with 4-bit or 2-bit all-gather, and demonstrate accuracy on par with the original ZeRO (aka standard training). As a byproduct, the model trained with ZeRO++ is naturally weight-quantized and can be used directly for inference without post-training quantization or quantization-aware training. |
Guanhua Wang 🔗 |
Sat 9:40 a.m. - 10:10 a.m.
|
Poster Session
(
Poster
)
>
Free form time after 12:10 |
🔗 |
Sat 10:10 a.m. - 11:30 a.m.
|
Lunch!
(
Lunch
)
>
|
🔗 |
Sat 11:30 a.m. - 12:10 p.m.
|
Atlas Wang - Invited Talk
(
Talk
)
>
Efficient Generative Inference with Heavy Hitters and Beyond. Professor Zhangyang "Atlas" Wang is an associate professor at UT Austin, where he holds the Temple Foundation Endowed Faculty Fellowship #7, in the Chandra Family Department of Electrical and Computer Engineering. He is also a faculty member of UT Computer Science (GSC) and the Oden Institute CSEM program. Wang directs AI Research and Technology at Picsart part time. Wang is a recipient of numerous awards, including NSF CAREER, IEEE AI's 10 To Watch, and Google Research Scholar awards. |
Zhangyang "Atlas" Wang 🔗 |
Sat 12:15 p.m. - 12:30 p.m.
|
Coffee Break!
(
Break
)
>
Authors hang posters |
🔗 |
Sat 12:30 p.m. - 12:40 p.m.
|
Haoran Qiu - On the Promise and Challenges of Foundation Models for Learning-based Cloud Systems Management
(
PM Spotlight Talk
)
>
Foundation models (FMs) are machine learning models that are trained broadly on large-scale data and can be adapted to a set of downstream tasks via fine-tuning, few-shot learning, or even zero-shot learning. Despite the successes of FMs in the language and vision domains, we have yet to see an attempt to develop FMs for cloud systems management (also known as cloud intelligence/AIOps). In this work, we explore the opportunities of developing FMs for cloud systems management. We propose an initial FM design (i.e., the FLASH framework) based on meta-learning and demonstrate its usage in the tasks of resource configuration search and workload autoscaling. Preliminary results show that FLASH achieves 52.3-90.5% less performance degradation with no adaptation and provides 5.5x faster adaptation. We conclude this paper by discussing the unique risks and challenges of developing FMs for cloud systems management. |
Haoran Qiu 🔗 |
Sat 12:40 p.m. - 12:50 p.m.
|
Saswat Padhi - Predicting User Experience on Laptops from Hardware Specifications
(
PM Spotlight Talk
)
>
Estimating the overall user experience (UX) on a device is a common challenge faced by manufacturers. Today, device makers primarily rely on microbenchmark scores, such as Geekbench, that stress test specific hardware components, such as CPU or RAM, but do not satisfactorily capture real-life consumer workloads. System designers often rely on domain-specific heuristics and extensive testing of prototypes to reach a desired UX goal, and yet there is often a mismatch between the manufacturers’ performance claims and the consumers’ experience. We present our initial results on predicting real-life experience on laptops from their hardware specifications. We target web applications that run on Chromebooks (ChromeOS laptops) for a simple and fair aggregation of experience across applications and workloads. On 54 laptops, we track 9 UX metrics on common end-user workloads: web browsing, video playback and audio / video calls. We focus on a subset of high-level metrics exposed by the Chrome browser, that are part of the Web Vitals initiative for measuring user experience on web applications. With a dataset of 100K UX data points, we train gradient boosted regression trees that predict the metric values from device specifications. Across our 9 metrics, we note a mean $R^2$ score (goodness-of-fit on our dataset) of 97.8% and a mean MAAPE (percentage error in prediction on unseen data) of 10.1%. |
Saswat Padhi 🔗 |
Sat 12:50 p.m. - 1:50 p.m.
|
PM Poster Session
(
Poster Session
)
>
|
🔗 |
Sat 2:00 p.m. - 2:15 p.m.
|
Competition Track Highlights
(
Talk
)
>
Mangpo introduces the competition track. |
Mangpo Phothilimthana 🔗 |
Sat 2:15 p.m. - 2:22 p.m.
|
10th Place Winner
(
Talk
)
>
Presentation from the 10th place winner of the competition: https://www.kaggle.com/competitions/predict-ai-model-runtime |
Habib Bukhari · Maxim Shugaev 🔗 |
Sat 2:22 p.m. - 2:29 p.m.
|
5th Place Winner
(
Talk
)
>
Presentation from the 5th place winner of the competition: https://www.kaggle.com/competitions/predict-ai-model-runtime |
Kenshin Abe 🔗 |
Sat 2:29 p.m. - 2:36 p.m.
|
3rd Place Winner
(
Talk
)
>
Presentation from the 3rd place winner of the competition: https://www.kaggle.com/competitions/predict-ai-model-runtime |
Janis Fluri 🔗 |
Sat 2:36 p.m. - 2:43 p.m.
|
2nd Place Winner
(
Talk
)
>
Presentation from the 2nd place winner of the competition: https://www.kaggle.com/competitions/predict-ai-model-runtime |
Denis Divitsky · David Robert Pugh · Dmitrii Khizbullin 🔗 |
Sat 2:43 p.m. - 2:50 p.m.
|
1st Place Winner
(
Talk
)
>
Presentation from the 1st place winner of the competition: https://www.kaggle.com/competitions/predict-ai-model-runtime |
Matheus Pedroza Ferreira · Eduardo Rocha de Andrade 🔗 |
Sat 2:50 p.m. - 3:00 p.m.
|
Competition Session Q&A
(
Q&A
)
>
Competition session Q&A |
Habib Bukhari · Mangpo Phothilimthana · Maxim Shugaev · Kenshin Abe · Janis Fluri · Denis Divitsky · David Robert Pugh · Dmitrii Khizbullin · Matheus Pedroza Ferreira · Eduardo Rocha de Andrade 🔗 |
-
|
SmartChoices: Augmenting Software with Learned Implementations
(
Poster
)
>
link
We are living in a golden age of machine learning. Powerful models perform many tasks far better than is possible using traditional software engineering approaches alone. However, developing and deploying these models in existing software systems remains challenging. In this paper, we present SmartChoices, a novel approach to incorporating machine learning into mature software stacks easily, safely, and effectively. We highlight key design decisions and present case studies applying SmartChoices within a range of large-scale industrial systems. |
Daniel Golovin · Gábor Bartók · Eric Chen · Emily Donahue · Tzu-Kuo Huang · Effrosyni Kokiopoulou · Ruoyan Qin · Nikhil Sarda · Justin Sybrandt · Vincent Tjeng 🔗 |
-
|
Choice-Based Learning in JAX
(
Poster
)
>
link
Choice-based learning is a programming paradigm for expressing learning systems in terms of choices and losses. We explore a practical implementation of choice-based learning in JAX by combining two techniques in a novel way: algebraic effects and the selection monad. We describe the design and implementation of our library, explore its usefulness for real-world applications like hyperparameter tuning and deep reinforcement learning, and compare it with existing approaches. |
Daniel Zheng · Shangyin Tan · Gordon Plotkin · Ningning Xie 🔗 |
-
|
Early notice: GenAI-based Datarace Fix for Real-World Golang Programs
(
Poster
)
>
link
Data race detection has been a subject of extensive research for decades; the practical deployment of race detectors has also become increasingly commonplace in industrial settings. However, the focus has mainly been on the detection aspect, with relatively little attention directed toward the challenging task of autonomously repairing programs with data races. This discrepancy is understandable given the inherent complexities of fixing the data race and the substantial engineering efforts required to integrate fixes into existing workflows. In this paper, we introduce a novel closed-loop application that harnesses the power of Generative AI to fix data races automatically. Our early experiments involving this application within Uranus's internal codebase have yielded promising results. The evaluation results suggest a bright future for integrating this application into Uranus's infrastructure, potentially revolutionizing how data races are handled in large-scale software development environments. |
Feiyang Jin · Zhizhou Zhang · Rajkishore Barik · Gautam Korlam · Milind Chabbi 🔗 |
-
|
Learning Bit Allocations for Z-Order Layouts in Analytic Data Systems
(
Poster
)
>
link
To improve the performance of scanning and filtering, modern analytic data systems such as Amazon Redshift and Databricks Delta Lake give users the ability to sort a table using a Z-order, which maps each row to a "Z-value" by interleaving the binary representations of the row's attributes, then sorts rows by their Z-values. These Z-order layouts essentially sort the table by multiple columns simultaneously and can achieve superior performance to single-column sort orders when the user's queries filter over multiple columns. However, the Z-orders currently used by modern systems treat all columns as equally important, which often does not result in the best performance due to the unequal impact that different columns have on query performance. In this work, we investigate the performance impact of using Z-orders that place unequal importance on columns: instead of using an equal number of bits from each column in the Z-value interleaving, we allow unequal bit allocation. We introduce a technique that automatically learns the best bit allocation for a Z-order layout on a given dataset and query workload. Z-order layouts using our learned bit allocations outperform traditional Z-order layouts by up to 1.6X in query runtime and up to 2X in rows scanned. |
Jenny Gao · Jialin Ding · SIVAPRASAD SUDHIR · Samuel Madden 🔗 |
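The bit interleaving described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `WIDTH`, `z_value`, and `sort_by_z` are hypothetical, every column is assumed to hold a non-negative integer of a fixed width, and the step that learns the allocation from a query workload is not shown. Passing the same bit count for every column gives the classic Z-order; passing unequal counts gives more sort priority to the columns that receive more bits.

```python
from typing import List, Sequence, Tuple

WIDTH = 4  # assumed bit width of every column value (illustrative only)


def z_value(row: Sequence[int], bits: Sequence[int]) -> int:
    """Interleave the most significant bits of each column into one integer.

    bits[i] is the number of leading bits taken from column i. Bits are
    consumed round-robin, so a column with a larger allocation keeps
    contributing after the others have run out.
    """
    if len(row) != len(bits):
        raise ValueError("row and bits must have the same length")
    if any(b < 0 or b > WIDTH for b in bits):
        raise ValueError("each allocation must be between 0 and WIDTH")
    z = 0
    for round_index in range(max(bits, default=0)):
        for value, budget in zip(row, bits):
            if round_index < budget:
                # Take the next most significant bit of this column.
                bit = (value >> (WIDTH - 1 - round_index)) & 1
                z = (z << 1) | bit
    return z


def sort_by_z(rows: Sequence[Tuple[int, ...]], bits: Sequence[int]) -> List[Tuple[int, ...]]:
    """Sort rows by their Z-values under the given bit allocation."""
    return sorted(rows, key=lambda r: z_value(r, bits))
```

For example, `sort_by_z(rows, (4, 4))` orders a two-column table by an equal interleaving, while `sort_by_z(rows, (4, 1))` sorts almost entirely by the first column and uses only the top bit of the second.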
-
|
Learning Distributed Protocols with Zero Knowledge
(
Poster
)
>
link
The success of AlphaGo Zero shows that a computer can learn to play a complicated board game without relying on the knowledge from human players. We observe that designing a distributed protocol is similar to playing board games to some extent: when determining the next action to take, they both want to ensure they can win even when a smart opponent tries to drive the game/protocol to the worst case. In this work, we explore whether we can apply similar techniques to learn a distributed protocol with zero knowledge. Towards this goal, we model the process in a distributed protocol as a state machine, and further rely on model checking to validate the correctness of the learned state machine. With this approach, we successfully learned a correct atomic commit protocol with three processes, and upon that, we further discuss future work. |
Yujie Hui · Drew Ripberger · Xiaoyi Lu · Yang Wang 🔗 |
-
|
ComPile: A Large IR Dataset from Production Sources
(
Poster
)
>
link
Code is increasingly becoming a core data modality of modern machine learning research impacting not only the way we write code with conversational agents like OpenAI's ChatGPT, Google's Bard, or Anthropic's Claude, the way we translate code from one language into another, but also the compiler infrastructure underlying the language. While modeling approaches may vary and representations differ, the targeted tasks often remain the same within the individual classes of models. Relying solely on the ability of modern models to extract information from unstructured code does not take advantage of 70 years of programming language and compiler development by not utilizing the structure inherent to programs in the data collection. This detracts from the performance of models working over a tokenized representation of input code and precludes the use of these models in the compiler itself. To work towards the first intermediate representation (IR) based models, we fully utilize the LLVM compiler infrastructure, shared by a number of languages, to generate a 182B token dataset of LLVM IR. We generated this dataset from programming languages built on the shared LLVM infrastructure, including Rust, Swift, Julia, and C/C++, by hooking into LLVM code generation either through the language's package manager or the compiler directly to extract the dataset of intermediate representations from production grade programs. Our dataset shows great promise for large language model training, and machine-learned compiler components. |
Aiden Grossman · Ludger Paehler · Konstantinos Parasyris · Tal Ben-Nun · Jacob Hegna · William Moses · Mircea Trofin · Johannes Doerfert 🔗 |
-
|
PLPilot: Benchmark an Automated Programming Language Design Framework Enabled by Large Language Models
(
Poster
)
>
link
The design of new programming languages traditionally requires expertise across syntax and semantics. Recently, large language models (LLMs) have provided unprecedented power in the code generation field, which has the potential to revolutionize the current programming language design stack, including automating the writing of passes and formally defining a programming language's semantics and syntax. However, there is not yet a framework that leverages LLMs to support programming language design. We propose a programming language design framework enabled by large language models, which decouples every part of the programming language design process into a form acceptable to LLMs. We then propose a set of benchmarks on LLM-based programming language tasks. We evaluate this framework on eight decoupled programming language design stages, which shows great productivity improvements over manually designed languages. |
Kaiyan Chang · kubn wang · Mengdi Wang · shengwen Liang · Yinhe Han · Huawei Li · Xiaowei Li · ying wang 🔗 |
-
|
Predicting User Experience on Laptops from Hardware Specifications
(
Poster
)
>
link
Estimating the overall user experience (UX) on a device is a common challenge faced by manufacturers. Today, device makers primarily rely on microbenchmark scores, such as Geekbench, that stress test specific hardware components, such as CPU or RAM, but do not satisfactorily capture real-life consumer workloads. System designers often rely on domain-specific heuristics and extensive testing of prototypes to reach a desired UX goal, and yet there is often a mismatch between the manufacturers’ performance claims and the consumers’ experience. We present our initial results on predicting real-life experience on laptops from their hardware specifications. We target web applications that run on Chromebooks (ChromeOS laptops) for a simple and fair aggregation of experience across applications and workloads. On 54 laptops, we track 9 UX metrics on common end-user workloads: web browsing, video playback and audio / video calls. We focus on a subset of high-level metrics exposed by the Chrome browser, that are part of the Web Vitals initiative for measuring user experience on web applications. With a dataset of 100K UX data points, we train gradient boosted regression trees that predict the metric values from device specifications. Across our 9 metrics, we note a mean $R^2$ score (goodness-of-fit on our dataset) of 97.8% and a mean MAAPE (percentage error in prediction on unseen data) of 10.1%. |
Saswat Padhi · Sunil Bhasin · Udaya Kiran Ammu · Alex Bergman · Allan Knies 🔗 |
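For readers unfamiliar with the MAAPE metric named in the abstract above, the following sketch computes the mean arctangent absolute percentage error as it is commonly defined: the mean of arctan(|(actual - predicted) / actual|). The function name `maape` and the handling of zero actual values are assumptions made for illustration, and the abstract does not say how the authors scale the value to the reported percentage, so this should not be read as their evaluation code.

```python
import math
from typing import Sequence


def maape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean arctangent absolute percentage error, in radians (0 to pi/2)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # Assumed convention: an exact prediction of zero costs nothing;
            # any other prediction has an unbounded ratio, and arctan of an
            # unbounded ratio approaches pi/2.
            total += 0.0 if p == 0 else math.pi / 2
        else:
            total += math.atan(abs((a - p) / a))
    return total / len(actual)
```

Because arctan is bounded, a single actual value near zero cannot dominate the average, which is the usual motivation for preferring this metric over the plain mean absolute percentage error.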
-
|
Redco: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs
(
Poster
)
>
link
The recent progress of AI can be largely attributed to large language models (LLMs). However, their escalating memory requirements introduce challenges for machine learning (ML) researchers and engineers. Addressing this requires developers to partition a large model to distribute it across multiple GPUs or TPUs. This necessitates considerable coding and intricate configuration efforts with existing model parallel tools, such as Megatron-LM, DeepSpeed, and Alpa. These tools require users' expertise in machine learning systems (MLSys), creating a bottleneck in LLM development, particularly for developers without an MLSys background. In this work, we present Red Coast (Redco), a lightweight and user-friendly tool crafted to automate distributed training and inference for LLMs, as well as to simplify ML pipeline development. The design of Redco emphasizes two key aspects. First, to automate model parallelism, our study identifies two straightforward rules to generate tensor parallel strategies for any given LLM. Integrating these rules into Redco facilitates effortless distributed LLM training and inference, eliminating the need for additional coding or complex configurations. We demonstrate the effectiveness by applying Redco to a set of LLM architectures, such as GPT-J, LLaMA, T5, and OPT, up to a model size of 66B, and in the multi-host setting. Second, we propose a mechanism that allows for the customization of diverse ML pipelines through the definition of merely three functions, eliminating redundant and formulaic code like multi-host related processing. This mechanism proves adaptable across a spectrum of ML algorithms, from foundational language modeling to complex algorithms like meta-learning and reinforcement learning. Consequently, Redco implementations need far fewer lines of code than their official counterparts. Redco is released under Apache License 2.0 at https://github.com/tanyuqian/redco. |
Bowen Tan · Yun Zhu · Lijuan Liu · Hongyi Wang · Yonghao Zhuang · Jindong Chen · Eric Xing · Zhiting Hu 🔗 |
-
|
ACLTuner: A Profiling-Driven Fast Tuning to Optimized Deep Learning Inference
(
Poster
)
>
link
Deep learning has expanded its footprint across diverse domains. The performance of these computations hinges on the interplay between deep learning compilers and inference libraries. While compilers adapt efficiently to new deep learning operations or models, their tuning processes are too time-consuming. In contrast, inference libraries offer quick execution but with adaptability limitations. To address these challenges, we propose ACLTuner, which optimizes execution configurations using existing inference library kernels. ACLTuner identifies and assigns the optimal kernel through targeted device profiling. Compared to ArmNN, AutoTVM, Ansor, ONNXRuntime, and TFLite, ACLTuner not only achieves up to 2.0x faster execution time across seven deep learning models, but also reduces the average tuning time by 95%. |
Yongin Kwon · Joo Hyoung Cha · Jubin Lee · Misun Yu · Jeman Park · Jemin Lee 🔗 |
-
|
Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference
(
Poster
)
>
link
Due to the recent success of diffusion models, text-to-image generation is becoming increasingly popular and has found a wide range of applications. Among them, text-to-image editing, or continuous text-to-image generation, attracts a lot of attention and can potentially improve the quality of generated images. Users commonly want to slightly edit a generated image by making minor modifications to their input textual descriptions over several rounds of diffusion inference. However, such an image editing process suffers from long-standing heuristics and low inference efficiency. This means that the extent of image editing is uncontrollable, and unnecessary editing invariably leads to extra computation. To solve this problem, we introduce Fast Image Semantically Edit (FISEdit), a cache-enabled sparse diffusion model inference method for efficient text-to-image editing. Extensive empirical results show that FISEdit can be $3.4\times$ and $4.4\times$ faster than existing methods on NVIDIA TITAN RTX and A100 GPUs respectively, and even generates more satisfactory images. |
Zihao Yu · Haoyang Li · Fangcheng Fu · Xupeng Miao · Bin CUI 🔗 |
-
|
On a Foundation Model for Operating Systems
(
Poster
)
>
link
This paper lays down the research agenda for a domain-specific foundation model for operating systems (OSes). Our case for a foundation model revolves around the observations that several OS components (such as CPU, memory, and network subsystems) are interrelated and that OS traces offer the ideal dataset for a foundation model to grasp the intricacies of diverse OS components and their behavior in varying environments and workloads. We discuss a wide range of possibilities that then arise, from employing foundation models as policy agents to utilizing them as generators and predictors to assist traditional OS control algorithms. Our hope is that this paper spurs further research into OS foundation models and creating the next generation of operating systems for the evolving computing landscape. |
Divyanshu Saxena · Nihal Sharma · Donghyun Kim · Rohit Dwivedula · Jiayi Chen · Chenxi Yang · Sriram Ravula · Zichao Hu · Aditya Akella · Joydeep Biswas · Swarat Chaudhuri · Isil Dillig · Alex Dimakis · Daehyeok Kim · Christopher J. Rossbach
|
-
|
Learning Collaborative Information Dissemination with Graph-based Multi-Agent Reinforcement Learning
(
Poster
)
>
link
In modern communication systems, efficient and reliable information dissemination is crucial for supporting critical operations across domains like disaster response, autonomous vehicles, and sensor networks. This paper introduces a Multi-Agent Reinforcement Learning (MARL) approach as a significant step forward in achieving more decentralized, efficient, and collaborative solutions. We propose a Partially Observable Stochastic Game (POSG) formulation for information dissemination empowering each agent to decide on message forwarding independently, based on their one-hop neighborhood and the degree of connectivity of each neighbor. This constitutes a significant paradigm shift from traditional heuristics based on Multi-Point Relay (MPR) selection. Our approach harnesses Graph Convolutional Reinforcement Learning, employing Graph Attention Networks (GAT) with dynamic attention to capture essential network features. We propose two approaches, L-DGN and HL-DGN, which differ in the information that is exchanged among agents. We evaluate the performance of our decentralized approaches, by comparing them with a widely-used MPR heuristic, and we show that our trained policies are able to efficiently cover the network while bypassing the MPR set selection process. Our approach promises a first step toward supporting the resilience of real-world broadcast communication infrastructures via learned, collaborative information dissemination. |
Raffaele Galliera · K. Brent Venable · Matteo Bassani · Niranjan Suri 🔗 |
-
|
Mitigating Tail Catastrophe in Steered Database Query Optimization with Risk-Averse Contextual Bandits
(
Poster
)
>
link
Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary but risk-aversion is sensitive to the entire distribution of rewards; nonetheless we exhibit the first risk-averse contextual bandit algorithm with an online regret guarantee. We apply the technique to a self-tuning software scenario in a production exascale data processing system, where worst-case outcomes should be avoided. |
Mónika Farsang · Paul Mineiro · Wangda Zhang 🔗 |
-
|
DeePref: Deep Reinforcement Learning For Video Prefetching In Content Delivery Networks
(
Poster
)
>
link
Content Delivery Networks carry the majority of Internet traffic, and the increasing demand for video content, a major share of IP traffic across the Internet, highlights the importance of caching and prefetching optimization algorithms. Prefetching aims to make data available in the cache before the requester places its request to reduce access time and improve Quality of Experience on the user side. Prefetching is well investigated in operating systems, compiler instructions, in-memory cache, local storage systems, high-speed networks, and cloud systems. Traditional prefetching techniques are well adapted to a particular access pattern, but fail to adapt to sudden variations or randomization in workloads. This paper explores the use of reinforcement learning to tackle the changes in user access patterns and automatically adapt over time. To this end, we propose DeePref, a Deep Reinforcement Learning agent for online video content prefetching in Content Delivery Networks. DeePref is a prefetcher implemented on edge networks and is agnostic to hardware design, operating systems, and applications. Our results show that DeePref DRQN, using a real-world dataset, achieves a 17% increase in prefetching accuracy and a 28% increase in prefetching coverage on average compared to baseline approaches that use video content popularity as a building block to statically or dynamically make prefetching decisions. We also study possible transfer learning of statistical models from one edge network into another, where unseen user requests from an unknown distribution are observed. In terms of transfer learning, the increases in prefetching accuracy and prefetching coverage are 30% and 10%, respectively. Our source code will be available on GitHub. |
Nawras Alkassab · Tania Lorido Botran · Chin-Tser Huang 🔗 |
-
|
ZeRO++: Extremely Efficient Collective Communication for Large Model Training
(
Poster
)
>
link
While the Zero Redundancy Optimizer (ZeRO) excels in training large-scale models, it struggles to achieve good throughput in environments with limited bandwidth or small batches where communication becomes a major bottleneck. Inspired by the principles of fine-grained quantization in machine learning algorithms, we designed ZeRO++, an optimizer robust to quantization effects that allows for significant communication volume reduction using low-precision quantization techniques. ZeRO++ comprises three communication volume reduction techniques (low-precision all-gather, data remapping, and low-precision gradient averaging) that reduce the communication volume by up to 4x and enable up to 2.16x better throughput at 384-GPU scale. Our results also show that ZeRO++ can speed up RLHF by 3.3x compared to vanilla ZeRO. To verify the convergence of ZeRO++, we test models of up to 13B parameters for pretraining with 8/6-bit all-gather and up to 30B parameters for finetuning with 4-bit or 2-bit all-gather, and demonstrate accuracy on par with the original ZeRO (aka standard training). As a byproduct, the model trained with ZeRO++ is naturally weight-quantized and can be used directly for inference without post-training quantization or quantization-aware training. |
Guanhua Wang · Heyang Qin · Sam Jacobs · Xiaoxia Wu · Connor Holmes · Zhewei Yao · Samyam Rajbhandari · Olatunji Ruwase · Feng Yan · Lei Yang · Yuxiong He
|
-
|
MASE: An Efficient Representation for Software-Defined ML Hardware System Exploration
(
Poster
)
>
link
Machine learning (ML) accelerators have been studied and used extensively to compute ML models with high performance and low power. However, designing such accelerators normally takes a long time and requires significant effort. Unfortunately, the pace of development of ML software models is much faster than the accelerator design cycle, leading to frequent and drastic modifications in the model architecture, thus rendering many accelerators obsolete. Existing design tools and frameworks can provide quick accelerator prototyping, but only for a limited range of models that can fit into a single hardware device, such as an FPGA. Furthermore, with the emergence of large language models, such as GPT-3, there is an increased need for hardware prototyping of these large models within a many-accelerator system to ensure the hardware can scale with the ever-growing model sizes. The design space is often huge, involving both software and hardware optimization. To address this, we propose a novel representation named MASE IR (Machine-learning Accelerator System Exploration Intermediate Representation) that describes data types, software algorithms, and hardware design constraints. MASE IR opens up opportunities for exploring software and hardware co-optimization at scale. As an application of MASE IR, we implemented a PyTorch-based framework named MASE that automatically optimizes and maps an ML model onto an efficient hardware accelerator system. We believe MASE IR will open new research opportunities for ML system design. |
Cheng Zhang · Jianyi Cheng · Zhewen Yu · Yiren Zhao 🔗 |
-
|
VMR2L: Virtual Machines Rescheduling Using Reinforcement Learning in Data Centers
(
Poster
)
>
link
Modern industry-scale data centers receive thousands of virtual machine (VM) requests per minute. Due to the continual creation and release of VMs, many small resource fragments are scattered across physical machines (PMs). To handle these fragments, data centers periodically reschedule some VMs to alternative PMs. Despite the increasing importance of VM rescheduling as data centers grow in size, the problem remains understudied. We first show that, unlike most combinatorial optimization tasks, the inference time of VM rescheduling algorithms significantly influences their performance, causing many existing methods to scale poorly. Therefore, we develop a reinforcement learning system for VM rescheduling, VMR2L, which incorporates a set of customized techniques, such as a two-stage framework that accommodates diverse constraints and workload conditions as well as an effective feature extraction module. Our experiments on an industry-scale data center show that VMR2L can achieve a performance comparable to the optimal solution, but with a running time of seconds. |
Xianzhong Ding · Yunkai Zhang · Binbin Chen · Donghao Ying · Tieying Zhang · Jianjun Chen · Lei Zhang · Alberto Cerpa · Wan Du 🔗 |
-
|
Enhancing ML model accuracy for Digital VLSI circuits using diffusion models: A study on synthetic data generation
(
Poster
)
>
link
Generative AI has seen remarkable growth over the past few years, with diffusion models being state-of-the-art for image generation. This study investigates the use of diffusion models to generate synthetic data for electronic circuits, with the aim of enhancing the accuracy of subsequent machine learning models in tasks such as performance assessment, design, and testing, where training data is usually very limited. We utilize simulations in the HSPICE design environment with 22nm CMOS technology nodes to obtain representative real training data for our proposed diffusion model. Our results demonstrate the close resemblance of the synthetic data generated by the diffusion model to real data. We validate the quality of the generated data and demonstrate that data augmentation is effective in predictive analysis of VLSI design for digital circuits. |
Prasha Srivastava · Pawan Kumar · Zia Abbas 🔗 |
-
|
CloudEval-YAML: A Realistic and Scalable Benchmark for Cloud Configuration Generation
(
Poster
)
>
link
Amid the thriving ecosystem of cloud computing and the proliferation of Large Language Model (LLM)-based code generation tools, there is a lack of benchmarking for code generation in cloud-native applications. In response to this need, we present CloudEval-YAML, a practical benchmark for cloud configuration generation. CloudEval-YAML tackles the diversity challenge by focusing on YAML, the de facto standard of numerous cloud-native tools. We develop the CloudEval-YAML benchmark with practicality in mind: the dataset consists of hand-written problems with unit tests targeting practical scenarios. To improve practicality during evaluation, we build a scalable evaluation platform for CloudEval-YAML that achieves a 20x speedup over a single machine. To the best of our knowledge, the CloudEval-YAML dataset is the first hand-written dataset targeting cloud-native applications. We present an in-depth evaluation of 13 LLMs, leading to a deeper understanding of the problems and LLMs, as well as effective methods to improve task performance and reduce cost. The codebase is released at https://github.com/alibaba/CloudEval-YAML. |
Yifei Xu · Yuning Chen · Xumiao Zhang · Xianshang Lin · Pan Hu · Yunfei Ma · Songwu Lu · Wan Du · Zhuoqing Morley Mao · Ennan Zhai · Dennis Cai
|
-
|
Silhouette: Toward Performance-Conscious and Transferable CPU Embeddings
(
Poster
)
>
link
Learned embeddings are widely used to obtain concise data representations and enable transfer learning between different data sets and tasks. In this paper, we present our approach, Silhouette, which leverages publicly available CPU performance data sets to learn CPU performance embeddings. We show how Silhouette enables transfer learning across different types of CPUs and leads to a significant improvement in performance prediction accuracy for the target CPUs. |
Tarikul Islam Papon · Abdul Wasay 🔗 |
-
|
Improving Large Language Model Hardware Generating Quality through Post-LLM Search
(
Poster
)
>
link
As large language models (LLMs) like ChatGPT have exhibited unprecedented machine intelligence, they have also shown great performance in assisting hardware engineers to realize higher-efficiency logic design via natural language interaction. However, due to the limitations of LLMs, existing LLM-based hardware generation frameworks generate Verilog register transfer language (RTL) code without considering its performance, power, and area (PPA). To overcome this challenge, we design a post-LLM search approach that merges the design space exploration (DSE) process into the current LLM hardware generation workflow, which enables PPA optimization. First, our framework generates prompts for the LLM, which then produces initial Verilog programs. Second, an output manager corrects and optimizes these programs before collecting them into the final design space, which is constructed as an HDL search tree. Finally, in the most important stage, the post-LLM search, our framework searches this space to select the optimal design under the target metrics. The evaluation shows that our approach improves the quality of the generated Verilog and explores a broader design optimization space compared to prior work and native LLMs alone. |
Kaiyan Chang · Haimeng Ren · Mengdi Wang · Shengwen Liang · Yinhe Han · Huawei Li · Xiaowei Li · Ying Wang 🔗 |
-
|
On the Promise and Challenges of Foundation Models for Learning-based Cloud Systems Management
(
Poster
)
>
link
Foundation models (FMs) are machine learning models that are trained broadly on large-scale data and can be adapted to a set of downstream tasks via fine-tuning, few-shot learning, or even zero-shot learning. Despite the successes of FMs in the language and vision domains, we have yet to see an attempt to develop FMs for cloud systems management (also known as cloud intelligence or AIOps). In this work, we explore the opportunities of developing FMs for cloud systems management. We propose an initial FM design (i.e., the FLASH framework) based on meta-learning and demonstrate its usage in the tasks of resource configuration search and workload autoscaling. Preliminary results show that FLASH achieves 52.3-90.5% less performance degradation with no adaptation and provides 5.5x faster adaptation. We conclude this paper by discussing the unique risks and challenges of developing FMs for cloud systems management. |
Haoran Qiu · Weichao Mao · Chen Wang · Hubertus Franke · Zbigniew Kalbarczyk · Tamer Basar · Ravishankar Iyer 🔗 |
-
|
PARM: Adaptive Resource Allocation for Datacenter Power Capping
(
Poster
)
>
link
Energy efficiency is a pressing concern in today's cloud datacenters. Various power management strategies, such as oversubscription, power capping, and dynamic voltage and frequency scaling, have been proposed and are in use by datacenter operators to better control power consumption at any management unit (e.g., node-level or rack-level) without breaking power budgets. In addition, by gaining more control over different management units within a datacenter (or across datacenters), operators are able to shift the energy consumption either spatially or temporally to optimize carbon footprint based on the spatio-temporal patterns of carbon intensity. The drive for automation has resulted in the exploration of learning-based resource management approaches. In this work, we first systematically investigate the impact of power capping on both latency-critical datacenter workloads and learning-based resource management solutions (i.e., reinforcement learning or RL). We show that even a 20% reduction in power limit (power capping) leads to an 18% degradation in resource management effectiveness (i.e., defined by an RL reward function), which causes 50% higher application latency. We then propose PARM, an adaptive resource allocation framework that provides a graceful, performance-preserving transition under power capping for latency-critical workloads. Evaluation results show that PARM achieves a 10.2-99.3% improvement in service-level objective (SLO) preservation under power capping while improving utilization by 3.1-5.8%. |
Haoran Qiu · Linghao Zhang · Chen Wang · Hubertus Franke · Zbigniew Kalbarczyk · Ravishankar Iyer 🔗 |
-
|
Multi-Agent Join
(
Poster
)
>
link
Real-time performance is crucial for interactive and exploratory data analysis, where users require quick access to subsets or progressive presentations of query results. Delivering real-time results over large data for common relational binary operators like join is challenging, as join algorithms often spend considerable time scanning and attempting to join parts of relations that may not produce any results. Existing solutions often involve repetitive preprocessing, which is costly and may not be feasible for interactive workloads or evolving datasets. Additionally, these solutions may support only restricted types of joins. This paper presents a novel approach for achieving efficient progressive join processing. The scan operator of the join learns online during query execution, identifying portions of its underlying relation that satisfy the join condition. Additionally, an algorithm is introduced where both scan operators collaboratively learn to optimize join execution. |
Arash Termehchy · Bakhtiyar Doskenov · Bharghav Srikhakollu · Summit Haque · Huazheng Wang 🔗 |
-
|
Can Semi-Supervised Learning Improve Prediction of Deep Learning Model Resource Consumption?
(
Poster
)
>
link
With the increasing computational demands of Deep Learning (DL), predicting training characteristics like training time and memory usage is crucial for efficient hardware allocation. Traditional methods rely solely on supervised learning for such predictions. Our work integrates a semi-supervised approach for improved accuracy. We present TraPPM, which utilizes a graph autoencoder to learn representations of unlabeled DL graphs, combined with supervised graph neural network training to predict the metrics. Our model significantly surpasses standard methods in prediction accuracy, with MAPE values of 9.51% for training step time and 4.92% for memory usage. The code and dataset are available at https://github.com/karthickai/trappm |
Karthick Panner Selvam · Mats Brorsson 🔗 |
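The MAPE figures quoted in the abstract above refer to mean absolute percentage error. As a minimal sketch of how that metric is commonly computed (the function name `mape` and the sample numbers are illustrative assumptions, not taken from the TraPPM code base):

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes every value in `actual` is non-zero, since each error is
    divided by the corresponding ground-truth value.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical measured vs. predicted training step times (milliseconds).
measured = [100.0, 200.0]
estimated = [110.0, 180.0]
score = mape(measured, estimated)
```

Each prediction here is off by 10% of its measured value, so the score is approximately 10. Lower values indicate better predictions.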
-
|
Renamer: A Transformer Architecture Invariant to Variable Renaming
(
Poster
)
>
link
Many modeling tasks involve learning functions which are invariant to certain types of input transformations. We study a specific class of invariance: semantics-preserving variable renaming for models of code. We show that vanilla Transformers trained on renaming-invariant tasks do not exhibit renaming invariance. We propose Renamer, a Transformer architecture which is itself invariant to semantics-preserving variable renaming. On a CPU simulation task, Renamer reduces error by between 24.79% and 52.8% compared to a vanilla Transformer. |
Zachary Ankner · Alex Renda · Michael Carbin 🔗 |
-
|
Efficient Prompt Caching for Large Language Model Inference via Embedding Similarity
(
Poster
)
>
link
Large language models (LLMs) have achieved huge success in numerous natural language processing (NLP) tasks. However, they face the challenge of significant resource consumption during inference. In this paper, we aim to improve the inference efficiency of LLMs by prompt caching, i.e., if the current prompt can be answered by the same response as a previous prompt, one can directly utilize that response without calling the LLM. Specifically, we focus on the prediction accuracy of prompt caching for single-round question-answering tasks via embedding similarity. The existing embeddings of prompts mostly focus on whether two prompts are semantically similar, which is not necessarily equivalent to whether the same response can answer them. Therefore, we propose a distillation-based method to fine-tune the existing embeddings for better caching prediction. Theoretically, we provide finite-sample guarantees for the convergence of our method under different types of loss functions. Empirically, we construct a dataset based on Kwiatkowski et al. [2019] and fine-tune the embedding from Wang et al. [2022], which improves the AUC of caching prediction from 0.85 to 0.92 within 10 minutes of training. The resulting embedding model improves the throughput over the initial embedding model. |
Hanlin Zhu · Banghua Zhu · Jiantao Jiao 🔗 |
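The caching idea in the abstract above (reuse a stored response when a new prompt's embedding is close enough to that of an earlier prompt) can be sketched as follows. This is only an illustration under stated assumptions: `PromptCache`, `toy_embed`, `answer`, and the threshold value are hypothetical names and choices, and the letter-count embedding is a stand-in for the fine-tuned embedding model the paper describes.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_embed(prompt: str) -> List[float]:
    """Hypothetical stand-in embedding: counts of the letters a-z."""
    counts = [0.0] * 26
    for ch in prompt.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


class PromptCache:
    """Stores (embedding, response) pairs and serves sufficiently similar prompts."""

    def __init__(self, embed: Callable[[str], List[float]], threshold: float = 0.95):
        self.embed = embed
        self.threshold = threshold  # assumed cut-off; would be tuned in practice
        self.entries: List[Tuple[List[float], str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        """Return the response of the most similar cached prompt, or None on a miss."""
        query = self.embed(prompt)
        best_score, best_response = -1.0, None
        for vector, response in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def insert(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))


def answer(prompt: str, cache: PromptCache, llm: Callable[[str], str]) -> str:
    """Serve from the cache when possible; otherwise call the model and cache the result."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached
    response = llm(prompt)
    cache.insert(prompt, response)
    return response
```

A linear scan over cached entries keeps the sketch short; a real deployment would likely use an approximate nearest-neighbour index. The threshold trades off saved LLM calls against the risk of returning a response that does not actually answer the new prompt, which is the prediction problem the abstract targets.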
-
|
Reinforcement Learning for FPGA Placement
(
Poster
)
>
link
This paper introduces the problem of learning to place blocks in Field-Programmable Gate Arrays (FPGAs) and a preliminary learning-based method. In contrast to previous FPGA placement algorithms, we depart from simulated annealing techniques and instead employ deep reinforcement learning (deep RL) for the placement task with the objective of minimizing wirelength. To facilitate the agent's decision making, we design unique state representations including the chipboard observations and interconnections between different blocks. Additionally, we ground representation learning in the supervised task of predicting placement quality to enhance the RL policy's generalization capabilities. To the best of our knowledge, we are the first to introduce a deep RL agent for FPGA placement, with preliminary results suggesting the feasibility of our approach. We hope that this paper will attract more attention to using RL in FPGAs by electronic design automation engineers. |
Shang Wang · Deepak Ranganatha Sastry Mamillapalli · Qianxi Li · Tianpei Yang · Matthew Taylor 🔗 |
-
|
Performance Roulette: How Cloud Weather Affects ML-Based System Optimization
(
Poster
)
>
link
As system complexity, workload diversity, and cloud computing adoption continue to grow, both operators and developers are turning to machine learning (ML) based approaches for optimizing systems. ML-based approaches typically perform measurements to evaluate candidate system configurations in order to discover the best configuration. However, it is widely recognized that cloud systems can be affected by "cloud weather", i.e., shifts in performance due to hardware heterogeneity, interference from co-located workloads, virtualization overheads, etc. Given these two trends, in this work we ask: how much can performance variability during training affect ML approaches applied to systems? Using DBMS knob configuration tuning as a case study, we present two measurement studies that show how ML-based optimizers can be affected by noise. This leads to four main observable problems: (1) there exist very sensitive configurations, the performance of which does not transfer across machines of the same type, (2) unstable configurations during training significantly impact configuration transferability, (3) tuning in an environment with non-representative noise degrades final performance in the deployment environment, and (4) sampling noise causes a convergence slowdown. Finally, we propose a set of methods to mitigate the challenges in measurements for training ML-based system components. |
Johannes Freischuetz · Konstantinos Kanellis · Brian Kroth · Shivaram Venkataraman 🔗 |
-
|
Secrecy and Sensitivity: Privacy-Performance Trade-Offs in Encrypted Traffic Classification
(
Poster
)
>
link
As datasets and models grow in size and complexity to increase performance, the risks associated with sensitive data also grow. Differential privacy (DP) offers a framework for designing mechanisms that provide a degree of privacy that can help conceal sensitive features or information. However, different domains and applications can naturally exhibit different rates of trade-offs between privacy and performance depending on their characteristics. In contrast to well-studied areas (e.g., healthcare), one relatively unexplored domain is network traffic analysis, where the data contains sensitive information on users' communications. In this paper, we apply DP to various machine learning models trained to classify between encrypted and non-encrypted packets from network traffic; we emphasize that our goal is to examine a relatively unexplored area to analyze the trade-offs between privacy and performance when the data contains both encrypted and non-encrypted observations. We show how varying model architecture and feature sets can be a relatively simple way to achieve better performance-privacy trade-offs; we also compare and contextualize reasonable privacy budgets from our analysis in the network traffic domain against those in other, better-studied domains. |
Spencer Giddens · Raphael Labaca-Castro · Dan Zhao · Sandra Guasch · Parth Mishra · Nicolas Gama 🔗 |
-
|
LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation
(
Poster
)
>
link
Test stimuli generation has been a crucial but labour-intensive task in hardware design verification. In this paper, we revolutionize this process by harnessing the power of large language models (LLMs) and present a novel benchmarking framework, LLM4DV. This framework introduces a prompt template for interactively eliciting test stimuli from the LLM, along with four innovative prompting improvements to support the pipeline execution and further enhance its performance. We compare LLM4DV to traditional constrained-random testing (CRT), using three self-designed design-under-test (DUT) modules. Experiments demonstrate that LLM4DV excels in efficiently handling straightforward DUT scenarios, leveraging its ability to employ basic mathematical reasoning and pre-trained knowledge. While it exhibits reduced efficiency in complex task settings, it still outperforms CRT in relative terms. The proposed framework and the DUT modules used in our experiments are open-sourced. |
Zixi Zhang · Greg Chadwick · Hugo McNally · Yiren Zhao · Robert Mullins 🔗 |
-
|
Ad-Rec: Advanced Feature Interactions to Address Covariate-Shifts in Recommendation Networks
(
Poster
)
>
link
Recommendation models enhance user experiences by utilizing input feature correlations. However, deep learning-based models encounter challenges from changing user behavior and item features, leading to data distribution shifts. Effective cross-feature learning is crucial in addressing this. We introduce Ad-Rec, an advanced network that leverages feature interaction techniques to tackle these issues. It utilizes masked transformers to learn higher-order cross-features while mitigating data distribution drift. Our approach improves model quality, accelerates convergence, and reduces training time. We demonstrate scalability of Ad-Rec and its superior model quality through extensive ablation studies. |
Muhammad Adnan · Yassaman Ebrahimzadeh Maboud · Divya Mahajan · Prashant Nair 🔗 |
-
|
Predicting User Experience on Laptops from Hardware Specifications
(
Oral
)
>
link
Estimating the overall user experience (UX) on a device is a common challenge faced by manufacturers. Today, device makers primarily rely on microbenchmark scores, such as Geekbench, that stress test specific hardware components, such as CPU or RAM, but do not satisfactorily capture real-life consumer workloads. System designers often rely on domain-specific heuristics and extensive testing of prototypes to reach a desired UX goal, and yet there is often a mismatch between the manufacturers’ performance claims and the consumers’ experience. We present our initial results on predicting real-life experience on laptops from their hardware specifications. We target web applications that run on Chromebooks (ChromeOS laptops) for a simple and fair aggregation of experience across applications and workloads. On 54 laptops, we track 9 UX metrics on common end-user workloads: web browsing, video playback, and audio/video calls. We focus on a subset of high-level metrics exposed by the Chrome browser that are part of the Web Vitals initiative for measuring user experience on web applications. With a dataset of 100K UX data points, we train gradient boosted regression trees that predict the metric values from device specifications. Across our 9 metrics, we note a mean $R^2$ score (goodness-of-fit on our dataset) of 97.8% and a mean MAAPE (percentage error in prediction on unseen data) of 10.1%.
|
Saswat Padhi · Sunil Bhasin · Udaya Kiran Ammu · Alex Bergman · Allan Knies 🔗 |
-
|
ZeRO++: Extremely Efficient Collective Communication for Large Model Training
(
Oral
)
>
link
While the Zero Redundancy Optimizer (ZeRO) excels in training large-scale models, it struggles to achieve good throughput in environments with limited bandwidth or small batches, where communication becomes a major bottleneck. Inspired by the principles of fine-grained quantization in machine learning algorithms, we designed ZeRO++, an optimizer robust to quantization effects that allows for significant communication volume reduction using low-precision quantization techniques. ZeRO++ comprises three communication volume reduction techniques (low-precision all-gather, data remapping, and low-precision gradient averaging) that together reduce the communication volume by up to 4x, enabling up to 2.16x better throughput at 384-GPU scale. Our results also show that ZeRO++ can speed up RLHF by 3.3x compared to vanilla ZeRO. To verify the convergence of ZeRO++, we test models of up to 13B for pretraining with 8-bit/6-bit all-gather and up to 30B for finetuning with 4-bit or 2-bit all-gather, and demonstrate accuracy on par with the original ZeRO (i.e., standard training). As a byproduct, the model trained with ZeRO++ is naturally weight-quantized and can be used directly for inference without post-training quantization or quantization-aware training. |
Guanhua Wang · Heyang Qin · Sam Jacobs · Xiaoxia Wu · Connor Holmes · Zhewei Yao · Samyam Rajbhandari · Olatunji Ruwase · Feng Yan · Lei Yang · Yuxiong He
|
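The abstract above attributes the communication savings to fine-grained, low-precision quantization of the tensors being exchanged. A minimal sketch of block-wise symmetric quantization in general follows; it is not the ZeRO++ or DeepSpeed implementation, and the function names, block size, and sample values are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def quantize_blockwise(
    values: Sequence[float], block_size: int = 4, bits: int = 8
) -> Tuple[List[int], List[float]]:
    """Quantize each block of `block_size` values with its own scale.

    Returns integer codes in [-qmax, qmax] plus one float scale per block.
    Only the codes and scales would need to be communicated.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit codes
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # An all-zero block gets scale 1.0 so the division below is safe.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(round(v / scale) for v in block)
    return codes, scales


def dequantize_blockwise(
    codes: Sequence[int], scales: Sequence[float], block_size: int = 4
) -> List[float]:
    """Recover approximate floats by multiplying each code by its block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


# Two blocks with very different magnitudes: per-block scales keep the
# small-magnitude block from being flattened by the large one.
weights = [0.5, -1.0, 0.25, 0.0, 100.0, -50.0, 25.0, 10.0]
codes, scales = quantize_blockwise(weights, block_size=4, bits=8)
restored = dequantize_blockwise(codes, scales, block_size=4)
```

With one scale per block, the rounding error of each value is bounded by half of its own block's scale, which is why finer blocks generally preserve accuracy better than a single tensor-wide scale, at the cost of sending more scale values.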
-
|
VMR2L: Virtual Machines Rescheduling Using Reinforcement Learning in Data Centers
(
Oral
)
>
link
Modern industry-scale data centers receive thousands of virtual machine (VM) requests per minute. Due to the continual creation and release of VMs, many small resource fragments are scattered across physical machines (PMs). To handle these fragments, data centers periodically reschedule some VMs to alternative PMs. Despite the increasing importance of VM rescheduling as data centers grow in size, the problem remains understudied. We first show that, unlike most combinatorial optimization tasks, the inference time of VM rescheduling algorithms significantly influences their performance, causing many existing methods to scale poorly. Therefore, we develop a reinforcement learning system for VM rescheduling, VMR2L, which incorporates a set of customized techniques, such as a two-stage framework that accommodates diverse constraints and workload conditions as well as an effective feature extraction module. Our experiments on an industry-scale data center show that VMR2L can achieve a performance comparable to the optimal solution, but with a running time of seconds. |
Xianzhong Ding · Yunkai Zhang · Binbin Chen · Donghao Ying · Tieying Zhang · Jianjun Chen · Lei Zhang · Alberto Cerpa · Wan Du 🔗 |
-
|
On the Promise and Challenges of Foundation Models for Learning-based Cloud Systems Management
(
Oral
)
>
link
Foundation models (FMs) are machine learning models that are trained broadly on large-scale data and can be adapted to a set of downstream tasks via fine-tuning, few-shot learning, or even zero-shot learning. Despite the successes of FMs in the language and vision domains, we have yet to see an attempt to develop FMs for cloud systems management (also known as cloud intelligence or AIOps). In this work, we explore the opportunities of developing FMs for cloud systems management. We propose an initial FM design (i.e., the FLASH framework) based on meta-learning and demonstrate its usage in the tasks of resource configuration search and workload autoscaling. Preliminary results show that FLASH achieves 52.3-90.5% less performance degradation with no adaptation and provides 5.5x faster adaptation. We conclude this paper by discussing the unique risks and challenges of developing FMs for cloud systems management. |
Haoran Qiu · Weichao Mao · Chen Wang · Hubertus Franke · Zbigniew Kalbarczyk · Tamer Basar · Ravishankar Iyer 🔗 |