This tutorial describes methods to enable efficient processing for deep neural networks (DNNs), which are used in many AI applications, including computer vision, speech recognition, and robotics. While DNNs deliver best-in-class accuracy and quality of results, this comes at the cost of high computational complexity. Accordingly, designing efficient algorithms and hardware architectures for DNNs is an important step towards enabling their wide deployment in AI systems (e.g., autonomous vehicles, drones, robots, smartphones, wearables, and Internet of Things devices), which often have tight constraints in terms of speed, latency, power/energy consumption, and cost.
In this tutorial, we will provide a brief overview of DNNs, discuss the tradeoffs of the various hardware platforms that support DNNs, including CPUs, GPUs, FPGAs, and ASICs, and highlight important benchmarking/comparison metrics and design considerations for evaluating the efficiency of DNNs. We will then describe recent techniques that reduce the computation cost of DNNs from both the hardware architecture and network algorithm perspectives. Finally, we will discuss how these techniques can be applied to a wide range of image processing and computer vision tasks.