Demonstration
Tue Dec 5th 07:00 -- 10:30 PM @ Pacific Ballroom Concourse #D3
TincyYolo: Smaller still, faster, and more efficient
Michaela Blott · Nicholas Fraser

Recent research has demonstrated that even extremely reduced precision works well for convolutional neural networks used for object classification. We leveraged similar quantization techniques, in combination with filter pruning, to reduce the computational footprint of YOLO networks so that high-performance implementations become achievable within power-constrained embedded compute environments. The demo will consist of a small embedded platform consuming ~6 W, directly connected to a USB camera and a display port. The compute is performed by a Xilinx Zynq UltraScale+ device, which consists of a quad-core ARM processor and a medium-sized FPGA fabric. The live camera video stream will be processed in real time by the MPSoC device's ARM processors, NEON cores and a NN accelerator in the FPGA fabric, and shown on a monitor, with the 20 object classes of Pascal VOC classified live and indicated through bounding boxes. The run-time environment is fully integrated with DarkNet and is demonstrated with dynamic off-loading and on-loading of the accelerators. Users can interact directly with the demo by holding different types of objects in front of the camera to test the accuracy of the heavily quantized and pruned neural network. Furthermore, users can dynamically move layers from the ARM processors and NEON to the FPGA fabric to experience the speed-up and latency reduction of custom hardware accelerators. To the best of our knowledge, this is the first extremely reduced-precision and pruned variant of YOLO to be demonstrated. While FPGA-based neural networks have started to emerge, this is a first in demonstrating high performance at reduced power for object recognition. Furthermore, our extension of the DarkNet run-time, which allows dynamic on- and off-loading across ARM, NEON and FPGA, is novel.
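
As a rough illustration of the two compression steps named above (filter pruning followed by reduced-precision quantization), here is a minimal sketch. It is not the scheme used in the demo: the abstract does not state the bit width or the pruning criterion, so ranking filters by L1 norm and binarizing weights to a sign plus one per-filter scale are assumptions made purely for illustration, and all function names are hypothetical.

```python
# Illustrative sketch only. The L1-norm pruning criterion and the 1-bit
# sign quantization with a per-filter scale are ASSUMPTIONS; the abstract
# does not specify the criterion or bit width actually used.

def l1_norm(filt):
    """Sum of absolute weights of one filter (a flat list of floats)."""
    return sum(abs(w) for w in filt)

def prune_filters(filters, keep_ratio):
    """Keep the filters with the largest L1 norm, preserving their order."""
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]), reverse=True)
    kept = sorted(ranked[:n_keep])
    return [filters[i] for i in kept]

def binarize_filter(filt):
    """Return (scale, signs): signs in {-1, +1}, scale = mean |w|."""
    scale = l1_norm(filt) / len(filt)
    signs = [1 if w >= 0 else -1 for w in filt]
    return scale, signs

# Four toy filters with four weights each (hypothetical values).
filters = [
    [0.5, -0.25, 0.75, -1.0],
    [0.01, -0.02, 0.0, 0.01],
    [-0.5, 0.5, -0.5, 0.5],
    [0.125, 0.125, -0.125, -0.125],
]

pruned = prune_filters(filters, keep_ratio=0.5)    # drops the two weakest filters
quantized = [binarize_filter(f) for f in pruned]   # one scale + sign vector each
```

In this toy setting each surviving filter is stored as one floating-point scale plus one bit per weight, which is the kind of footprint reduction that makes a small FPGA accelerator practical.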