
EMC2: Energy Efficient Machine Learning and Cognitive Computing (5th edition)
Raj Parihar · Michael Goldfarb · Satyam Srivastava · Tao Sheng · Debajyoti Pal

Fri Dec 13 08:00 AM -- 06:40 PM (PST) @ West 306
Event URL: https://www.emc2-workshop.com/neurips-19

A new wave of intelligent computing, driven by recent advances in machine learning and cognitive algorithms coupled with process technology and new design methodologies, has the potential to usher in unprecedented disruption in the way modern computing systems are designed and deployed. These new and innovative approaches often provide an attractive and efficient alternative not only in terms of performance but also power, energy, and area. This disruption is visible across the whole spectrum of computing systems -- ranging from low-end mobile devices to large-scale data centers and servers, including intelligent infrastructures.

A key class of these intelligent solutions provides real-time, on-device cognition at the edge to enable many novel applications, including computer vision and image processing, language understanding, speech and gesture recognition, malware detection, and autonomous driving. Naturally, these applications have diverse requirements for performance, energy, reliability, accuracy, and security that demand a holistic approach to designing the hardware, software, and intelligence algorithms to achieve the best power, performance, and area (PPA).

Topics of interest include:

- Architectures for the edge: IoT, automotive, and mobile
- Approximation, quantization, and reduced-precision computing
- Hardware/software techniques for sparsity
- Neural network architectures for resource constrained devices
- Neural network pruning, tuning, and automatic architecture search
- Novel memory architectures for machine learning
- Communication/computation scheduling for better performance and energy
- Load balancing and efficient task distribution techniques
- Exploring the interplay between precision, performance, power and energy
- Exploration of new and efficient applications for machine learning
- Characterization of machine learning benchmarks and workloads
- Performance profiling and synthesis of workloads
- Simulation and emulation techniques, frameworks and platforms for machine learning
- Power, performance and area (PPA) based comparison of neural networks
- Verification, validation and determinism in neural networks
- Efficient on-device learning techniques
- Security, safety and privacy challenges and building secure AI systems

Fri 8:00 a.m. - 8:45 a.m.


Yann LeCun
Fri 8:45 a.m. - 9:30 a.m.

Computing near the sensor is preferred over the cloud due to privacy and/or latency concerns for a wide range of applications including robotics/drones, self-driving cars, smart Internet of Things, and portable/wearable electronics. However, at the sensor there are often stringent constraints on energy consumption and cost in addition to the throughput and accuracy requirements of the application. In this talk, we will describe how joint algorithm and hardware design can be used to reduce energy consumption while delivering real-time and robust performance for applications including deep learning, computer vision, autonomous navigation/exploration and video/image processing. We will show how energy-efficient techniques that exploit correlation and sparsity to reduce compute, data movement and storage costs can be applied to various tasks including image classification, depth estimation, super-resolution, localization and mapping.
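The zero-skipping idea behind the sparsity techniques mentioned above can be illustrated with a toy sketch. This is not the speakers' actual hardware design; it is a hypothetical software analogue showing how skipping zero activations (e.g., after a ReLU) cuts the multiply-accumulate (MAC) count without changing the result:

```python
import numpy as np

def sparse_matvec(weights, activations):
    """Multiply only where the activation is nonzero; count MACs used."""
    out = np.zeros(weights.shape[0])
    macs = 0
    for j, a in enumerate(activations):
        if a == 0.0:               # zero-skipping: no compute, no weight fetch
            continue
        out += weights[:, j] * a
        macs += weights.shape[0]
    return out, macs

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
x = rng.standard_normal(16)
x[x < 0] = 0.0                     # ReLU-style sparsity zeroes roughly half the inputs

dense_macs = W.size                # 128 MACs for the dense computation
y, used = sparse_matvec(W, x)
assert np.allclose(y, W @ x)       # identical output, fewer operations
```

In hardware, the saving comes not only from the skipped multiplies but also from the avoided weight fetches, which is why data movement dominates the energy argument in the abstract.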

Vivienne Sze
Fri 9:30 a.m. - 10:00 a.m.

Machine learning (ML) applications have entered and impacted our lives unlike any other technology advance from the recent past. Indeed, almost every aspect of how we live or interact with others relies on or uses ML for applications ranging from image classification and object detection to processing multi-modal and heterogeneous datasets. While the holy grail for judging the quality of an ML model has largely been serving accuracy, and only recently its resource usage, neither of these metrics translates directly to energy efficiency, runtime, or mobile device battery lifetime. This talk will uncover the need for building accurate, platform-specific power and latency models for convolutional neural networks (CNNs) and efficient hardware-aware CNN design methodologies, thus allowing machine learners and hardware designers to identify not just the configuration with the best accuracy, but also those that satisfy given hardware constraints. Our proposed modeling framework is applicable to both high-end and mobile platforms and achieves 88.24% accuracy for latency, 88.34% for power, and 97.21% for energy prediction. Using similar predictive models, we demonstrate a novel differentiable neural architecture search (NAS) framework, dubbed Single-Path NAS, that uses one single-path over-parameterized CNN to encode all architectural decisions based on shared convolutional kernel parameters. Single-Path NAS achieves state-of-the-art top-1 ImageNet accuracy (75.62%), outperforming existing mobile NAS methods under similar latency constraints (~80 ms), and finds the final configuration up to 5,000x faster than prior work.
Combined with our quantized CNNs (Flexible Lightweight CNNs, or FLightNNs) that customize precision level in a layer-wise fashion and achieve almost iso-accuracy at 5-10x energy reduction, such a modeling, analysis, and optimization framework is poised to lead to true co-design of hardware and ML models, orders of magnitude faster than the state of the art, while satisfying both accuracy and latency or energy constraints.
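The layer-wise precision idea can be sketched in a few lines. This is a hedged toy illustration, not the FLightNNs method itself: the layer shapes and per-layer bit-widths below are invented for the example, and the point is simply that each layer can be quantized to its own precision, trading accuracy for energy where the network tolerates it:

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    levels = 2 ** (bits - 1) - 1           # e.g. 127 representable magnitudes for 8 bits
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale     # dequantized back to float for comparison

rng = np.random.default_rng(1)
layers = {"conv1": rng.standard_normal((32, 16)),   # hypothetical layer weights
          "conv2": rng.standard_normal((64, 32)),
          "fc":    rng.standard_normal((10, 64))}
bit_widths = {"conv1": 8, "conv2": 4, "fc": 2}      # illustrative per-layer choice

quantized = {name: quantize_uniform(w, bit_widths[name])
             for name, w in layers.items()}
errors = {name: float(np.abs(quantized[name] - layers[name]).max())
          for name in layers}               # low-bit layers incur larger error
```

A search over such per-layer bit-widths, driven by the predictive power and latency models the abstract describes, is what turns this into a hardware-aware co-design problem.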

Diana Marculescu
Fri 10:00 a.m. - 10:30 a.m.
Poster Session 1 (Break)
Simeon Spasov, Prateeth Nayak, Ferran Diego Andilla, Tianyi Zhang, Amit Trivedi
Fri 10:30 a.m. - 11:00 a.m.

Deep neural net models have provided the most accurate solutions to a very wide variety of problems in vision, language, and speech; however, the design, training, and optimization of efficient DNNs typically requires resorting to the "dark arts" of ad hoc methods and extensive hyperparameter tuning. In this talk we present our progress on abandoning these dark arts by using Differentiable Neural Architecture Search to guide the design of efficient DNNs and by using Hessian-based methods to guide the processes of training and quantizing those DNNs.
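One common ingredient of Hessian-based quantization methods is estimating the Hessian trace from Hessian-vector products, e.g., via Hutchinson's estimator, to rank layers by sensitivity. The sketch below is a hedged illustration of that estimator alone, not the speakers' full method: the "loss" is a toy quadratic f(w) = wᵀAw, whose Hessian is exactly 2A, so the estimate can be checked against the true trace:

```python
import numpy as np

def hutchinson_trace(hvp, dim, num_samples=200, seed=0):
    """Estimate tr(H) from Hessian-vector products: E[v^T H v], v ~ Rademacher."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ hvp(v)                # each sample is an unbiased estimate
    return total / num_samples

# Toy quadratic loss f(w) = w^T A w, so the Hessian is the constant matrix 2A.
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 3.0, 0.5],
              [0.0, 0.5, 4.0]])
hvp = lambda v: 2.0 * A @ v                # exact Hessian-vector product
estimate = hutchinson_trace(hvp, dim=3)
exact = 2.0 * np.trace(A)                  # 18.0
```

In practice the Hessian-vector products come from automatic differentiation on the real training loss, and layers whose estimated curvature is small are the natural candidates for aggressive quantization.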

Kurt Keutzer
Fri 11:00 a.m. - 11:30 a.m.
Hardware-aware Neural Architecture Design for Small and Fast Models: from 2D to 3D (Invited Talk)
Song Han
Fri 11:30 a.m. - 12:30 p.m.
Oral Session 1 (Presentation)
Jiahui Yu, David Hartmann, Meng Li, Javad Shafiee, Huanrui Yang, Ofir Zafrir
Fri 12:30 p.m. - 2:00 p.m.
Lunch (Break)
Fri 2:00 p.m. - 2:45 p.m.

In this talk I will describe the need for low-power machine learning systems. I will motivate this by describing several current projects at Purdue University that need energy-efficient deep learning; in some cases these methods cannot realistically be deployed without lower-power solutions. The applications include precision farming, health care monitoring, and edge-based surveillance.

Edward Delp
Fri 2:45 p.m. - 3:15 p.m.

Edge AI applications retain the need for high-performing inference models while driving platforms beyond their limits of energy efficiency and throughput. Digital hardware acceleration, enabling 10-100x gains over general-purpose architectures, is already widely deployed, but is ultimately restricted by the data movement and memory accesses that dominate deep-learning computations. In-memory computing, based on both SRAM and emerging memories, offers fundamentally new tradeoffs for overcoming these barriers, with the potential for 10x higher energy efficiency and area-normalized throughput demonstrated in recent designs. But those tradeoffs introduce new challenges, especially in scaling to the level of computation required, integrating into practical heterogeneous architectures, and mapping diverse software. This talk examines those tradeoffs to characterize the challenges. It then explores recent research that provides promising paths forward, making in-memory computing more of a practical reality than ever before.

Naveen Verma
Fri 3:15 p.m. - 3:45 p.m.

In recent years, machine learning (ML) with deep neural networks (DNNs) has been widely deployed in diverse application domains. However, the growing complexity of DNN models, the slowdown of technology scaling, and the proliferation of edge devices are driving a demand for higher DNN performance and energy efficiency. ML applications have shifted from general-purpose processors to dedicated hardware accelerators in both academic and commercial settings. In line with this trend, there has been an active body of research on both algorithms and hardware architectures for neural network specialization.

This talk presents our recent investigation into DNN optimization and low-precision quantization, using a co-design approach featuring contributions to both algorithms and hardware accelerators. First, we review static network pruning techniques and show a fundamental link between group convolutions and circulant matrices – two previously disparate lines of research in DNN compression. Then we discuss channel gating, a dynamic, fine-grained, and trainable technique for DNN acceleration. Unlike static approaches, channel gating exploits input-dependent dynamic sparsity at run time. This results in a significant reduction in compute cost with a minimal impact on accuracy. Finally, we present outlier channel splitting, a technique to improve DNN weight quantization by removing outliers from the weight distribution without retraining.
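The outlier channel splitting idea described above can be sketched concretely. This is a hedged toy version for a single matrix-vector product, not the authors' implementation: the column holding the largest-magnitude weight is duplicated and both copies are halved (with the corresponding input duplicated), so the output is mathematically unchanged while the weight range the quantizer must cover shrinks:

```python
import numpy as np

def split_outlier_channel(W, x):
    """Split the column of W containing the largest absolute weight."""
    j = np.unravel_index(np.abs(W).argmax(), W.shape)[1]   # outlier column index
    W_split = np.hstack([W, W[:, j:j+1] / 2.0])            # append a halved copy
    W_split[:, j] /= 2.0                                   # halve the original column
    x_split = np.append(x, x[j])                           # duplicate that input channel
    return W_split, x_split

rng = np.random.default_rng(2)
W = rng.standard_normal((4, 6))
W[1, 3] = 10.0                  # inject a single outlier weight
x = rng.standard_normal(6)

W2, x2 = split_outlier_channel(W, x)
assert np.allclose(W2 @ x2, W @ x)   # same output, smaller dynamic range
```

Because no retraining is involved, the transformation can be applied post hoc to a trained model, which is the property the abstract highlights.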

Zhiru Zhang
Fri 3:45 p.m. - 4:15 p.m.
Poster Session 2 (Break)
Gabriele Prato, Urmish Thakker, Laura Galindez Olascoaga, Tianyu Zhang, Vahid Partovi Nia, Kamil Adamczewski
Fri 4:15 p.m. - 4:45 p.m.
Adaptive Multi-Task Neural Networks for Efficient Inference (Invited Talk)
Rogerio Feris
Fri 4:45 p.m. - 5:15 p.m.
Kernel and Graph Optimization for DL Model Execution (Invited Talk)
Jinwon Lee
Fri 5:15 p.m. - 5:45 p.m.
Configurable Cloud-Scale DNN Processor for Real-Time AI (Invited Talk)
Bita Darvish Rouhani
Fri 5:45 p.m. - 6:45 p.m.
Oral Session 2 (Presentation)
Shun Liao, Jeff McKinstry, Peter Izsak, Meng Li, Qijing (Jenny) Huang, Gonçalo Mordido

Author Information

Raj Parihar (Microsoft)
Michael Goldfarb (Qualcomm)

SoC Architecture Research @ Qualcomm

Satyam Srivastava (Intel Corporation)
Tao Sheng (Amazon)
Debajyoti Pal (Cadence Design Systems, Inc.)