IEICE Transactions on Electronics
Online ISSN : 1745-1353
Print ISSN : 0916-8524

A formally published version of this article exists. Please refer to the formally published version, and cite that version when citing this work.

Weight compression MAC accelerator for effective inference of deep learning
Asuka Maki, Daisuke Miyashita, Shinichi Sasaki, Kengo Nakata, Fumihiko Tachibana, Tomoya Suzuki, Jun Deguchi, Ryuichi Fujimoto
Journal / Restricted access / Advance publication

Article ID: 2019CTP0007

Abstract

Many studies of deep neural networks have reported inference accelerators with improved energy efficiency. We propose methods that further improve energy efficiency while maintaining recognition accuracy, developed by co-designing a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of the weights, so the execution time and energy consumption of inference fall in proportion to the total number of computations multiplied by that average bit precision. Hardware utilization is also improved by a bit-parallel architecture suited to the fine-grained, per-filter bit precision of the weights. We implement the proposed architecture on an FPGA and demonstrate that execution cycles for ResNet-50 on ImageNet are reduced to 1/5.3 of those of a conventional method, while recognition accuracy is maintained.
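As a rough illustration of filter-wise quantization with variable bit precision, the sketch below (Python with NumPy) assigns each output filter of a convolutional layer the smallest bit width, from a candidate set, that keeps that filter's quantization error under a threshold. This is a minimal sketch of the general idea, not the paper's algorithm: the min-max uniform quantizer, the candidate widths (2/4/8 bits), and the MSE criterion are all illustrative assumptions.

import numpy as np

def quantize_filter(w, bits):
    # Illustrative uniform symmetric quantizer: scale the filter's
    # min-max range onto a signed `bits`-bit integer grid.
    qmax = 2 ** (bits - 1) - 1
    peak = np.max(np.abs(w))
    if peak == 0 or qmax == 0:
        return np.zeros_like(w)
    step = peak / qmax
    return np.round(w / step) * step

def filterwise_quantize(weights, candidate_bits=(2, 4, 8), tol=1e-4):
    # For each output filter (axis 0), pick the smallest candidate bit
    # width whose quantization MSE stays below `tol`; fall back to the
    # largest candidate if none qualifies.
    quantized = np.empty_like(weights)
    chosen = []
    for i, w in enumerate(weights):
        for bits in candidate_bits:
            q = quantize_filter(w, bits)
            if np.mean((w - q) ** 2) <= tol:
                break
        quantized[i] = q
        chosen.append(bits)
    return quantized, chosen

# 8 filters of a 16-channel 3x3 convolution; filters with a smaller
# dynamic range tolerate fewer bits, lowering the average precision.
rng = np.random.default_rng(0)
scales = rng.uniform(0.01, 0.1, size=(8, 1, 1, 1))
w = rng.normal(0.0, 1.0, size=(8, 16, 3, 3)) * scales
qw, bits = filterwise_quantize(w)
print("per-filter bits:", bits, "average:", float(np.mean(bits)))

The average of the chosen bit widths is the quantity the proportionality claim above refers to: with a datapath that processes only as many bits as each filter was assigned, execution cycles scale roughly with the total number of multiply-accumulate operations times this average precision.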

© 2020 The Institute of Electronics, Information and Communication Engineers