The state-of-the-art hardware platforms for training deep neural networks are moving from traditional single precision (32-bit) computations towards 16 bits of precision - in large part due to the high energy efficiency and smaller bit storage associated with using reduced-precision representations. However, unlike inference, training with numbers represented with less than 16 bits has been challenging due to the need to maintain fidelity of the gradient computations during back-propagation. Here we demonstrate, for the first time, the successful training of deep neural networks using 8-bit floating point numbers while fully maintaining the accuracy on a spectrum of deep learning models and datasets. In addition to reducing the data and computation precision to 8 bits, we also successfully reduce the arithmetic precision for additions (used in partial product accumulation and weight updates) from 32 bits to 16 bits through the introduction of a number of key ideas including chunk-based accumulation and floating point stochastic rounding. The use of these novel techniques lays the foundation for a new generation of hardware training platforms with the potential for 2-4 times improved throughput over today's systems.
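To make the chunk-based accumulation idea from the abstract concrete, here is a minimal NumPy sketch. It is not the authors' hardware datapath: the function name `chunked_fp16_sum`, the chunk size of 64, and the use of NumPy float16 to emulate 16-bit adds are illustrative assumptions. The point is that summing fixed-size chunks in reduced precision and then combining the chunk results avoids small addends being swamped by a large running sum.

```python
import numpy as np

def chunked_fp16_sum(values, chunk_size=64):
    """Sketch of chunk-based accumulation (illustrative, not the paper's hardware).

    Instead of adding every term into one fp16 accumulator, sum fixed-size
    chunks in fp16 and then add the chunk partial sums together, so the
    running sum stays small relative to each addend.
    """
    values = np.asarray(values, dtype=np.float16)
    chunk_sums = []
    for start in range(0, len(values), chunk_size):
        acc = np.float16(0.0)
        for v in values[start:start + chunk_size]:
            acc = np.float16(acc + v)       # 16-bit add within a chunk
        chunk_sums.append(acc)
    total = np.float16(0.0)
    for s in chunk_sums:
        total = np.float16(total + s)       # 16-bit add across chunk sums
    return total

# Compare naive fp16 accumulation, chunked fp16 accumulation, and an fp32 reference
# on a long sum of small same-sign terms (where swamping is most visible).
x = np.full(4096, 0.001, dtype=np.float32)
naive = np.float16(0.0)
for v in x.astype(np.float16):
    naive = np.float16(naive + v)
print(float(naive), float(chunked_fp16_sum(x)), float(x.sum(dtype=np.float32)))
```

On this example the naive 16-bit accumulator drifts noticeably from the fp32 reference, while the chunked version stays close to it, which is the effect the abstract relies on when reducing accumulation precision from 32 to 16 bits.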
Author Information
Naigang Wang (IBM T. J. Watson Research Center)
Jungwook Choi (IBM Research)
Daniel Brand (IBM Research)
Chia-Yu Chen (IBM Research)
My research areas focus on: accelerator architecture, compiler design and library development, machine learning and neural networks, and VLSI and nano devices.
Kailash Gopalakrishnan (IBM Research)
More from the Same Authors
- 2022 Poster: Deep Compression of Pre-trained Transformer Models
  Naigang Wang · Chi-Chun (Charlie) Liu · Swagath Venkataramani · Sanchari Sen · Chia-Yu Chen · Kaoutar El Maghraoui · Vijayalakshmi (Viji) Srinivasan · Leland Chang
- 2020 Poster: Ultra-Low Precision 4-bit Training of Deep Neural Networks
  Xiao Sun · Naigang Wang · Chia-Yu Chen · Jiamin Ni · Ankur Agrawal · Xiaodong Cui · Swagath Venkataramani · Kaoutar El Maghraoui · Vijayalakshmi (Viji) Srinivasan · Kailash Gopalakrishnan
- 2020 Poster: ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
  Chia-Yu Chen · Jiamin Ni · Songtao Lu · Xiaodong Cui · Pin-Yu Chen · Xiao Sun · Naigang Wang · Swagath Venkataramani · Vijayalakshmi (Viji) Srinivasan · Wei Zhang · Kailash Gopalakrishnan
- 2020 Oral: Ultra-Low Precision 4-bit Training of Deep Neural Networks
  Xiao Sun · Naigang Wang · Chia-Yu Chen · Jiamin Ni · Ankur Agrawal · Xiaodong Cui · Swagath Venkataramani · Kaoutar El Maghraoui · Vijayalakshmi (Viji) Srinivasan · Kailash Gopalakrishnan
- 2020 Poster: FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
  Yonggan Fu · Haoran You · Yang Zhao · Yue Wang · Chaojian Li · Kailash Gopalakrishnan · Zhangyang Wang · Yingyan Lin
- 2019 Poster: Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks
  Xiao Sun · Jungwook Choi · Chia-Yu Chen · Naigang Wang · Swagath Venkataramani · Vijayalakshmi (Viji) Srinivasan · Xiaodong Cui · Wei Zhang · Kailash Gopalakrishnan