LOFT: Finding Lottery Tickets through Filter-wise Training
Qihan Wang · Chen Dun · Fangshuo Liao · Christopher Jermaine · Anastasios Kyrillidis
Event URL: https://openreview.net/forum?id=X1N9YExjEF
In this paper, we explore how one can efficiently identify the emergence of ``winning tickets'' using distributed training techniques, and use this observation to design efficient pretraining algorithms. Our focus in this work is on convolutional neural networks (CNNs), which are more complex than simple multi-layer perceptrons, yet simple enough to expose our ideas. To identify good filters within winning tickets, we propose a novel filter distance metric that captures model convergence well, without requiring knowledge of the true winning ticket or fully training the model. Our filter analysis is consistent with recent findings on neural network learning dynamics. Motivated by this analysis, we present the \emph{LOttery ticket through Filter-wise Training} algorithm, dubbed \textsc{LoFT}. \textsc{LoFT} is a model-parallel pretraining algorithm that partitions the convolutional layers of CNNs by filters and trains them independently on different distributed workers, leading to reduced memory and communication costs during pretraining. Experiments show that \textsc{LoFT} $i)$ preserves and finds good lottery tickets, and $ii)$ achieves non-trivial savings in computation and communication, while maintaining comparable or even better accuracy than other pretraining methods.
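To make the filter-wise partitioning idea concrete, the following is a minimal, hypothetical sketch in PyTorch of how a convolutional layer's output filters could be split across workers and later merged back. It is not the authors' released implementation; the helper names (partition_conv_filters, merge_conv_filters) and the worker count are illustrative assumptions only.

import torch
import torch.nn as nn

def partition_conv_filters(conv: nn.Conv2d, num_workers: int):
    # Split a Conv2d layer into `num_workers` smaller layers, each owning a
    # disjoint subset of the output filters (rows of conv.weight along dim 0).
    filter_chunks = torch.chunk(torch.arange(conv.out_channels), num_workers)
    sub_layers = []
    for idx in filter_chunks:
        sub = nn.Conv2d(conv.in_channels, len(idx), conv.kernel_size,
                        stride=conv.stride, padding=conv.padding,
                        bias=conv.bias is not None)
        with torch.no_grad():
            sub.weight.copy_(conv.weight[idx])
            if conv.bias is not None:
                sub.bias.copy_(conv.bias[idx])
        sub_layers.append(sub)
    return sub_layers, filter_chunks

def merge_conv_filters(conv: nn.Conv2d, sub_layers, filter_chunks):
    # Copy the (independently trained) filter subsets back into the full layer.
    with torch.no_grad():
        for sub, idx in zip(sub_layers, filter_chunks):
            conv.weight[idx] = sub.weight
            if conv.bias is not None:
                conv.bias[idx] = sub.bias
    return conv

# Example: split a 64-filter layer across 4 hypothetical workers, then merge.
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
sub_layers, filter_chunks = partition_conv_filters(conv, num_workers=4)
conv = merge_conv_filters(conv, sub_layers, filter_chunks)

In an actual distributed run, each worker would train only its filter subset on local data before synchronization, which is where the memory and communication savings described in the abstract would come from.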
Author Information
Qihan Wang (Rice University)
Chen Dun (Rice University)
Fangshuo Liao (Rice University)
Christopher Jermaine (Rice University)
Anastasios Kyrillidis (Rice University)
More from the Same Authors
- 2021 Spotlight: Neural Program Generation Modulo Static Analysis
  Rohan Mukherjee · Yeming Wen · Dipak Chaudhari · Thomas Reps · Swarat Chaudhuri · Christopher Jermaine
- 2021: Acceleration and Stability of the Stochastic Proximal Point Algorithm
  Junhyung Lyle Kim · Panos Toulis · Anastasios Kyrillidis
- 2022: Strong Lottery Ticket Hypothesis with $\epsilon$-perturbation
  Fangshuo Liao · Zheyang Xiong · Anastasios Kyrillidis
- 2022: Efficient and Light-Weight Federated Learning via Asynchronous Distributed Dropout
  Chen Dun · Mirian Hipolito Garcia · Dimitrios Dimitriadis · Christopher Jermaine · Anastasios Kyrillidis
- 2022: GIST: Distributed Training for Large-Scale Graph Convolutional Networks
  Cameron Wolfe · Jingkang Yang · Fangshuo Liao · Arindam Chowdhury · Chen Dun · Artun Bayer · Santiago Segarra · Anastasios Kyrillidis
- 2022: Poster Session 2
  Jinwuk Seok · Bo Liu · Ryotaro Mitsuboshi · David Martinez-Rubio · Weiqiang Zheng · Ilgee Hong · Chen Fan · Kazusato Oko · Bo Tang · Miao Cheng · Aaron Defazio · Tim G. J. Rudner · Gabriele Farina · Vishwak Srinivasan · Ruichen Jiang · Peng Wang · Jane Lee · Nathan Wycoff · Nikhil Ghosh · Yinbin Han · David Mueller · Liu Yang · Amrutha Varshini Ramesh · Siqi Zhang · Kaifeng Lyu · David Yunis · Kumar Kshitij Patel · Fangshuo Liao · Dmitrii Avdiukhin · Xiang Li · Sattar Vakili · Jiaxin Shi
- 2022: Contributed Talks 3
  Cristóbal Guzmán · Fangshuo Liao · Vishwak Srinivasan · Zhiyuan Li
- 2021 Poster: Neural Program Generation Modulo Static Analysis
  Rohan Mukherjee · Yeming Wen · Dipak Chaudhari · Thomas Reps · Swarat Chaudhuri · Christopher Jermaine
- 2019: Final remarks
  Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney
- 2019 Workshop: Beyond first order methods in machine learning systems
  Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney
- 2019: Opening Remarks
  Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney
- 2019 Poster: Learning Sparse Distributions using Iterative Hard Thresholding
  Jacky Zhang · Rajiv Khanna · Anastasios Kyrillidis · Sanmi Koyejo