CNNs such as VGG19, ResNet-50, and GoogLeNet exhibit 30-90% sparsity in their activations, even after max pooling. During inference, conventional computing architectures such as CPUs and GPUs typically fail to exploit these sparse activations to accelerate computation. In this demo, we show a novel convolutional neural network accelerator, implemented on an FPGA, that stores and operates on compressed sparse representations of the feature maps.
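To make the idea concrete, the following is a minimal software model of one possible compressed sparse encoding of a feature map. The paper does not specify the on-chip format; this sketch assumes a simple (value, zero-run-length) scheme, with the struct and function names chosen for illustration only.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// One entry of the compressed stream: a nonzero activation plus the
// number of zeros that precede it in raster order. (Illustrative
// assumption, not the accelerator's actual storage format.)
struct SparseEntry {
    float value;      // nonzero activation value
    uint16_t zeros;   // count of zeros preceding it in raster order
};

// Compress a flattened (raster-order) dense feature map by recording
// each nonzero together with the length of the zero run before it.
std::vector<SparseEntry> compress(const std::vector<float>& dense) {
    std::vector<SparseEntry> out;
    uint16_t run = 0;
    for (float v : dense) {
        if (v == 0.0f) {
            ++run;                  // extend the current zero run
        } else {
            out.push_back({v, run});
            run = 0;
        }
    }
    return out;
}

int main() {
    // Post-ReLU feature maps are typically 30-90% zero.
    std::vector<float> fmap = {0, 0, 1.5f, 0, 0, 0, 2.0f, 0.5f,
                               0, 0, 0,    0, 3.0f, 0, 0, 0};
    std::vector<SparseEntry> sparse = compress(fmap);
    std::cout << "dense words: " << fmap.size()
              << ", compressed entries: " << sparse.size() << "\n";
}
```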
Our implementation improves on the state of the art in CNN accelerators, operating with high efficiency across varied kernel sizes while taking advantage of sparsity. In contrast to [1], it never decompresses the feature maps, so it spends zero clock cycles on zeros in the feature maps. In contrast to [2], it remains flexible across a wide range of input and output feature-map counts per layer.
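The zero-cycles-for-zeros claim can be sketched in software as follows, reusing the SparseEntry type from the sketch above. Each compressed entry costs exactly one multiply-accumulate, so work scales with the nonzero count rather than the dense feature-map size, and the dense map is never rebuilt. The computation is shown as a dot product (a 1x1 convolution over one channel); this models the cycle-counting argument only, not the actual RTL datapath.

```cpp
#include <cstddef>
#include <vector>

// Consume the compressed stream directly: skip zeros by advancing the
// raster position, and issue one MAC per nonzero activation.
float sparse_dot(const std::vector<SparseEntry>& activations,
                 const std::vector<float>& weights) {
    float acc = 0.0f;
    std::size_t pos = 0;                // current index in raster order
    for (const SparseEntry& e : activations) {
        pos += e.zeros;                 // skip zeros: no MAC, no cycles
        acc += e.value * weights[pos];  // one MAC per nonzero activation
        ++pos;
    }
    return acc;
}
```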