In this paper, we propose a number of novel techniques and numerical representation formats that enable, for the very first time, the precision of training systems to be aggressively scaled from 8-bits to 4-bits. To enable this advance, we explore a novel adaptive Gradient Scaling technique (GradScale) that addresses the challenges of insufficient range and resolution in quantized gradients as well as explores the impact of quantization errors observed during model training. We theoretically analyze the role of bias in gradient quantization and propose solutions that mitigate the impact of this bias on model convergence. Finally, we examine our techniques on a spectrum of deep learning models in computer vision, speech, and NLP. In combination with previously proposed solutions for 4-bit quantization of weight and activation tensors, 4-bit training shows a non-significant loss in accuracy across application domains while enabling significant hardware acceleration (> 7X over state-of-the-art FP16 systems).
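The abstract describes GradScale only at a high level. As an illustration only (not the authors' implementation), the sketch below shows the core idea of adaptive per-tensor gradient scaling before low-precision quantization: the gradient is rescaled so that its largest magnitude lands at the top of the representable range, quantized, and then rescaled back. The toy 4-bit grid, the helper names (`quantize_to_grid`, `grad_scale_quantize`), and all parameters are assumptions made for this example; the paper's GradScale method and its FP4 format differ in detail.

```python
# Illustrative sketch only, not the paper's implementation.
import numpy as np

# Toy 4-bit "floating point" grid: +/- 4**e for a few exponents, plus zero.
# This stands in for a radix-4 FP4 format and is NOT the paper's exact format.
FP4_GRID = np.array(
    sorted({s * 4.0 ** e for s in (-1.0, 1.0) for e in range(-6, 1)} | {0.0})
)

def quantize_to_grid(x, grid):
    """Round each element of x to the nearest representable grid point."""
    idx = np.abs(np.asarray(x)[..., None] - grid).argmin(axis=-1)
    return grid[idx]

def grad_scale_quantize(grad):
    """Adaptively scale a gradient tensor into the grid's range, quantize, rescale.

    Mapping the largest |gradient| to the top of the grid keeps small gradients
    from underflowing to zero -- the core idea behind adaptive gradient scaling.
    """
    grad = np.asarray(grad, dtype=np.float64)
    max_abs = np.max(np.abs(grad))
    if max_abs == 0.0:
        return grad
    grid_max = np.max(FP4_GRID)
    scaled = grad * (grid_max / max_abs)        # map max |grad| onto grid_max
    quantized = quantize_to_grid(scaled, FP4_GRID)
    return quantized * (max_abs / grid_max)     # undo the scaling

# Usage: a gradient spanning several orders of magnitude survives 4-bit
# quantization after scaling; direct rounding would flush small entries to 0.
g = [1e-5, -3e-4, 2e-3, 0.05]
print(grad_scale_quantize(g))
```

Scaling by the per-tensor maximum is what preserves small-magnitude gradients that would otherwise underflow at 4-bit resolution; this is the range and resolution problem the abstract refers to.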
Author Information
Xiao Sun (IBM Thomas J. Watson Research Center)
Naigang Wang (IBM T. J. Watson Research Center)
Chia-Yu Chen (IBM Research)
My research areas focus on: accelerator architecture, compiler design and library development, machine learning and neural networks, and VLSI and nano devices.
Jiamin Ni (IBM)
Ankur Agrawal (IBM Research)
Xiaodong Cui (IBM T. J. Watson Research Center)
Swagath Venkataramani (IBM Research)
Kaoutar El Maghraoui (IBM Research)
Dr. Kaoutar El Maghraoui is a principal research scientist at the IBM T. J. Watson Research Center, where she focuses on innovations at the intersection of systems and artificial intelligence (AI). She leads the research agenda of the end-use experimental AI testbed of the IBM Research AI Hardware Center, a global research hub focused on enabling next-generation accelerators and systems for AI workloads. She co-led IBM’s Global Technology Outlook in 2017, where she contributed to creating IBM’s vision for the future of IT across global labs and business units, focusing on IBM’s AI leadership. Kaoutar has co-authored several patents, conference papers, and journal publications in the areas of systems research, distributed systems, high-performance computing, and AI. Kaoutar holds a Ph.D. from Rensselaer Polytechnic Institute, USA. She has received several awards, including the Robert McNaughton Award for best thesis in computer science, IBM’s Eminence and Excellence award for leadership in increasing women’s presence in science and technology, and two IBM Outstanding Technical Accomplishment awards. Kaoutar is global vice-chair of the Arab Women in Computing organization and an avid supporter of and volunteer in several women in science and technology initiatives.
Vijayalakshmi (Viji) Srinivasan (IBM TJ Watson)
Kailash Gopalakrishnan (IBM Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Poster: Ultra-Low Precision 4-bit Training of Deep Neural Networks »
  Wed. Dec 9th, 05:00 -- 07:00 PM, Poster Session 3 #754
More from the Same Authors
- 2022 Poster: A Stochastic Linearized Augmented Lagrangian Method for Decentralized Bilevel Optimization »
  Songtao Lu · Siliang Zeng · Xiaodong Cui · Mark Squillante · Lior Horesh · Brian Kingsbury · Jia Liu · Mingyi Hong
- 2022 Poster: Deep Compression of Pre-trained Transformer Models »
  Naigang Wang · Chi-Chun (Charlie) Liu · Swagath Venkataramani · Sanchari Sen · Chia-Yu Chen · Kaoutar El Maghraoui · Vijayalakshmi (Viji) Srinivasan · Leland Chang
- 2020 Poster: A Decentralized Parallel Algorithm for Training Generative Adversarial Nets »
  Mingrui Liu · Wei Zhang · Youssef Mroueh · Xiaodong Cui · Jarret Ross · Tianbao Yang · Payel Das
- 2020 Poster: ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training »
  Chia-Yu Chen · Jiamin Ni · Songtao Lu · Xiaodong Cui · Pin-Yu Chen · Xiao Sun · Naigang Wang · Swagath Venkataramani · Vijayalakshmi (Viji) Srinivasan · Wei Zhang · Kailash Gopalakrishnan
- 2020 Poster: FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training »
  Yonggan Fu · Haoran You · Yang Zhao · Yue Wang · Chaojian Li · Kailash Gopalakrishnan · Zhangyang Wang · Yingyan Lin
- 2019 Poster: Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks »
  Xiao Sun · Jungwook Choi · Chia-Yu Chen · Naigang Wang · Swagath Venkataramani · Vijayalakshmi (Viji) Srinivasan · Xiaodong Cui · Wei Zhang · Kailash Gopalakrishnan
- 2018 Poster: Training Deep Neural Networks with 8-bit Floating Point Numbers »
  Naigang Wang · Jungwook Choi · Daniel Brand · Chia-Yu Chen · Kailash Gopalakrishnan
- 2018 Poster: Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks »
  Xiaodong Cui · Wei Zhang · Zoltán Tüske · Michael Picheny
- 2017 Poster: Dilated Recurrent Neural Networks »
  Shiyu Chang · Yang Zhang · Wei Han · Mo Yu · Xiaoxiao Guo · Wei Tan · Xiaodong Cui · Michael Witbrock · Mark Hasegawa-Johnson · Thomas Huang