2021 Volume E104.D Issue 5 Pages 772-775
This work presents a DNN accelerator architecture specifically designed for performing efficient inference on compressed and sparse DNN models. Leveraging the data sparsity, a runtime processing scheme is proposed to deal with the encoded weights and activations directly in the compressed domain without decompressing. Furthermore, a new data flow is proposed to facilitate the reusage of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified using the Xilinx Virtex-7 FPGA. Experimental results show it achieves 1.99×, 1.95× faster and 20.38×, 3.04× more energy efficient than CPU and mGPU platforms, respectively, running AlexNet.