Deep neural networks (DNNs) are widely used in domains such as speech recognition, face detection, and image classification. DNNs are conventionally implemented on graphics processing units (GPUs), which prioritize speed at the expense of energy efficiency. To reduce power consumption and improve efficiency, we advocate application-specific hardware. This paper introduces a run-time reconfigurable DNN accelerator SoC (DNN-AS) architecture integrated into an instruction-extended RISC-V platform. The application-specific extension instruction set is tailored to accelerate frequently used DNN operations. To optimize the circuit structure, we devise an 8-bit dynamic fixed-point (DFP) scheme within the DNN-AS and compare its accuracy against the PyTorch floating-point implementation. DNN-AS shows minimal accuracy loss, with Top-1 accuracy deviations of at most 0.53%, 0.31%, and 0.68% for ResNet-34, ResNet-50, and ResNet-101, respectively. Finally, we compare the overall simulated results with other platforms: our design improves energy efficiency, measured in giga-operations per joule (GOP/J), by 8.4x to 1897x relative to field-programmable gate array (FPGA) and GPU implementations.
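
The abstract does not spell out the DFP scheme, so the following is only a minimal sketch of one common formulation of 8-bit dynamic fixed-point quantization, not the DNN-AS implementation. It assumes that each tensor shares a single fractional length, chosen so that the largest magnitude fits in a signed 8-bit integer. The function names (`choose_frac_len`, `quantize`, `dequantize`) are hypothetical and introduced for illustration.

```python
import math


def choose_frac_len(values, bits=8):
    """Pick a shared fractional length so the largest magnitude fits in `bits` signed bits.

    Assumption: one fractional length per tensor (per-layer or per-channel
    variants are equally plausible; the abstract does not say which is used).
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bits - 1  # all zeros: spend every non-sign bit on the fraction
    # Integer bits needed to represent the largest magnitude (sign bit excluded).
    int_bits = math.floor(math.log2(max_abs)) + 1
    return bits - 1 - int_bits


def quantize(values, bits=8):
    """Map floats to signed `bits`-bit integers plus a shared fractional length."""
    frac_len = choose_frac_len(values, bits)
    scale = 2.0 ** frac_len
    qmin, qmax = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # Round to the nearest integer, then saturate to the representable range.
    q = [min(qmax, max(qmin, round(v * scale))) for v in values]
    return q, frac_len


def dequantize(q, frac_len):
    """Recover approximate float values from the fixed-point integers."""
    return [x * 2.0 ** (-frac_len) for x in q]


if __name__ == "__main__":
    weights = [0.5, -1.25, 3.0, 0.01]
    q, frac_len = quantize(weights)
    print(q, frac_len)
    print(dequantize(q, frac_len))
```

In this sketch the fractional length adapts to the dynamic range of each tensor, which is what separates dynamic fixed point from a single global fixed-point format. Small values such as 0.01 can still round to zero when a large value in the same tensor dominates the range.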