This tutorial surveys the state of the art in high-performance hardware for machine learning, with an emphasis on hardware for training and deploying deep neural networks (DNNs). We establish a baseline by characterizing the performance and efficiency (perf/W) of DNNs implemented on conventional CPUs. GPU implementations of DNNs improve substantially on this baseline and perform best at moderate batch sizes; we examine how sensitive performance is to batch size. Training can be accelerated further using both model and data parallelism, at the cost of inter-processor communication, and we examine common parallel formulations and the communication traffic they induce. Both training and deployment can also be accelerated by using reduced precision for weights and activations, and we examine the tradeoff between precision and accuracy in these networks. We close with a discussion of dedicated hardware for machine learning, surveying recent publications on this topic and making some general observations about the relative importance of arithmetic and memory bandwidth in such hardware.
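The abstract links batch size, reduced precision, and the balance between arithmetic and memory bandwidth without giving formulas. One common way to relate them is a roofline-style estimate of arithmetic intensity (FLOPs per byte of memory traffic). The sketch below applies this to a single fully connected layer. It assumes ideal on-chip reuse, so each operand crosses DRAM exactly once. The layer sizes and the accelerator's peak throughput and bandwidth are hypothetical values chosen only for illustration. They are not figures from the tutorial.

```python
def fc_layer_intensity(batch: int, n_in: int, n_out: int, bytes_per_elem: int = 4) -> float:
    """Arithmetic intensity (FLOPs per byte) of Y = X @ W for a fully connected layer.

    Assumes X is batch x n_in, W is n_in x n_out, Y is batch x n_out, and that
    each operand is read or written from DRAM exactly once (ideal reuse).
    """
    flops = 2 * batch * n_in * n_out  # one multiply and one add per MAC
    elems = n_in * n_out + batch * n_in + batch * n_out  # weights + inputs + outputs
    return flops / (elems * bytes_per_elem)


def attainable_flops(intensity: float, peak_flops: float, mem_bw: float) -> float:
    """Roofline model: throughput is capped by compute or by memory bandwidth."""
    return min(peak_flops, intensity * mem_bw)


if __name__ == "__main__":
    # Hypothetical accelerator: 100 TFLOP/s peak, 1 TB/s DRAM bandwidth.
    PEAK, BW = 100e12, 1e12
    for b in (1, 8, 64, 256, 1024):
        # 16-bit operands (bytes_per_elem=2) stand in for reduced precision.
        ai = fc_layer_intensity(b, 4096, 4096, bytes_per_elem=2)
        perf = attainable_flops(ai, PEAK, BW)
        print(f"batch={b:5d}  intensity={ai:7.1f} FLOP/B  attainable={perf / 1e12:6.1f} TFLOP/s")
```

At batch size 1 the weight matrix dominates the traffic, and the intensity is about 1 FLOP/byte, so the layer is bandwidth-bound. As the batch grows, the same weights are reused across more inputs, and the layer eventually becomes compute-bound. Halving `bytes_per_elem` doubles the intensity under this model. This is one reason reduced precision helps even when arithmetic units are not the bottleneck.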
Bill Dally (Stanford University)
Bill is Chief Scientist and Senior Vice President of Research at NVIDIA Corporation and a Professor (Research) and former chair of Computer Science at Stanford University. Bill and his group have developed system architecture, network architecture, signaling, routing, and synchronization technology that can be found in most large parallel computers today. At Bell Labs Bill contributed to the BELLMAC32 microprocessor and designed the MARS hardware accelerator. At Caltech he designed the Torus Routing Chip. At MIT his group built the J-Machine and the M-Machine. At Stanford University his group has developed the Imagine processor, which introduced the concepts of stream processing and partitioned register organizations; the Merrimac supercomputer, which led to GPU computing; and the ELM low-power processor. Bill is a Member of the National Academy of Engineering, a Fellow of the IEEE, a Fellow of the ACM, and a Fellow of the American Academy of Arts and Sciences. He has received the ACM Eckert-Mauchly Award, the IEEE Seymour Cray Award, and the ACM Maurice Wilkes Award. He has published over 200 papers in these areas, holds over 100 issued patents, and is an author of the textbooks Digital Design: A Systems Approach, Digital Systems Engineering, and Principles and Practices of Interconnection Networks.
More from the Same Authors
2015 Poster: Learning both Weights and Connections for Efficient Neural Network
Song Han · Jeff Pool · John Tran · Bill Dally