Modern machine learning workloads use large models with complex structures that are very expensive to execute. The devices that execute such models are becoming increasingly heterogeneous, as a flourishing of Domain-Specific Architectures (DSAs) are offered as hardware accelerators alongside CPUs. These trends necessitate distributing the workload across multiple devices. Recent work has shown that significant gains can be obtained with model parallelism, i.e., partitioning a neural network's computational graph onto multiple devices. In particular, this form of parallelism assumes a pipeline of devices, which is fed a stream of samples and yields high throughput for training and inference of DNNs. However, for such settings (large models and multiple heterogeneous devices), we require automated algorithms and toolchains that can partition the ML workload across devices.
In this paper, we identify and isolate the structured optimization problem at the core of device placement of DNN operators, for both inference and training, especially in modern pipelined settings. We then provide algorithms that solve this problem to optimality. We demonstrate the applicability and efficiency of our approaches using several contemporary DNN computation graphs.
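To give a flavor of the kind of structured optimization problem involved, the sketch below is an illustrative simplification, not the paper's algorithm: it assumes a linear chain of operators with known per-operator costs, a fixed number of identical devices, contiguous stages, and a throughput objective equal to the bottleneck (maximum) stage cost. All names here (min_bottleneck_split, op_costs, num_devices) are hypothetical.

```python
# Toy sketch: split a chain of operator costs into contiguous pipeline stages,
# one per device, minimizing the bottleneck stage cost. The paper's setting is
# far more general (computation graphs, heterogeneous devices, training and
# inference); this only illustrates the flavor of the placement problem.
from functools import lru_cache

def min_bottleneck_split(op_costs, num_devices):
    """Minimal bottleneck (max per-stage cost) over all contiguous splits."""
    n = len(op_costs)
    prefix = [0]
    for c in op_costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(i, k):
        # Best achievable bottleneck for operators i..n-1 using k devices.
        if k == 1:
            return prefix[n] - prefix[i]
        result = float("inf")
        # First stage covers operators i..j-1; the rest go to k-1 devices.
        for j in range(i + 1, n - k + 2):
            stage = prefix[j] - prefix[i]
            result = min(result, max(stage, best(j, k - 1)))
        return result

    return best(0, num_devices)

if __name__ == "__main__":
    # Per-operator execution costs (arbitrary units) for a toy 8-operator model.
    costs = [4, 2, 7, 1, 3, 5, 2, 6]
    print(min_bottleneck_split(costs, num_devices=3))  # -> 13
```

The paper itself targets the general case, i.e., arbitrary DNN computation graphs placed across multiple heterogeneous devices for both training and inference, which is what motivates the dedicated optimal algorithms described above.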
Author Information
Jakub Tarnawski (Microsoft Research)
Amar Phanishayee (Microsoft Research)
Nikhil Devanur (Amazon)
Divya Mahajan (Microsoft)
Fanny Nina Paravecino (Microsoft)
Dr. Nina-Paravecino is currently a Senior Researcher in the AI and Advanced Architectures group at Microsoft, where she leads efforts to improve the performance of Deep Learning workloads. Previously, Dr. Nina-Paravecino was a Research Scientist at Intel Corporation, advancing Intel's ground-breaking volumetric reconstruction technology using Deep Learning. In the past, her work has contributed to the efficient exploitation of GPU architectures and enabled the identification of bottlenecks in a range of applications, including image processing and video analytics. Dr. Nina-Paravecino received her Ph.D. in Computer Engineering from Northeastern University, her M.Sc. in Computer Engineering from the University of Puerto Rico at Mayaguez, and her B.S. in System and Informatics Engineering from the University of San Antonio Abad of Cusco, Peru. She has served as a PC member or reviewer for various journals, conferences, and workshops, including IEEE Transactions on Image Processing (2017), JPDC 2017, CF 2018, PPoPP 2018, SC 2018, GPGPU 2018, PARCO 2018, IA^3 2019, SC 2019, DAC 2020, ICCD 2020, and HPCA 2021. Most recently, Dr. Nina-Paravecino was co-chair of the Video Analytics mini-track at HICSS 2020.
More from the Same Authors
- 2022 Poster: Near-Optimal Correlation Clustering with Privacy »
  Vincent Cohen-Addad · Chenglin Fan · Silvio Lattanzi · Slobodan Mitrovic · Ashkan Norouzi-Fard · Nikos Parotsidis · Jakub Tarnawski
- 2022 Affinity Workshop: LatinX in AI »
  Maria Luisa Santiago · Juan Banda · CJ Barberan · MIGUEL GONZALEZ-MENDOZA · Caio Davi · Sara Garcia · Jorge Diaz · Fanny Nina Paravecino · Carlos Miranda · Gissella Bejarano Nicho · Fabian Latorre · Andres Munoz Medina · Abraham Ramos · Laura Montoya · Isabel Metzger · Andres Marquez · Miguel Felipe Arevalo-Castiblanco · Jorge Mendez · Karla Caballero · Atnafu Lambebo Tonja · Germán Olivo · Karla Caballero Barajas · Francisco Zabala
- 2021 Poster: Piper: Multidimensional Planner for DNN Parallelization »
  Jakub Tarnawski · Deepak Narayanan · Amar Phanishayee
- 2020 Poster: Fully Dynamic Algorithm for Constrained Submodular Optimization »
  Silvio Lattanzi · Slobodan Mitrović · Ashkan Norouzi-Fard · Jakub Tarnawski · Morteza Zadimoghaddam
- 2020 Oral: Fully Dynamic Algorithm for Constrained Submodular Optimization »
  Silvio Lattanzi · Slobodan Mitrović · Ashkan Norouzi-Fard · Jakub Tarnawski · Morteza Zadimoghaddam
- 2020 Poster: Fairness in Streaming Submodular Maximization: Algorithms and Hardness »
  Marwa El Halabi · Slobodan Mitrović · Ashkan Norouzi-Fard · Jakab Tardos · Jakub Tarnawski