IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Online ISSN : 1745-1337
Print ISSN : 0916-8508

A final published version of this article exists. Please refer to the published version, and cite the published version when citing this work.

Accelerating CNN Inference with an Adaptive Quantization Method Using Computational Complexity-Aware Regularization
Kengo NAKATA, Daisuke MIYASHITA, Jun DEGUCHI, Ryuichi FUJIMOTO
Journal: free access / Advance online publication

Article ID: 2023EAP1163

Abstract

Quantization is commonly used to reduce the inference time of convolutional neural networks (CNNs). To reduce the inference time without drastically reducing accuracy, optimal bit widths need to be allocated for each layer or filter of the CNN. In conventional methods, the optimal bit allocation is obtained by using the gradient descent algorithm while minimizing the model size. However, the model size has little to no correlation with the inference time. In this paper, we present a computational-complexity metric called MAC×bit that is strongly correlated with the inference time of quantized CNNs. We propose a gradient descent-based regularization method that uses this metric for optimal bit allocation of a quantized CNN to improve the recognition accuracy and reduce the inference time. In experiments, the proposed method reduced the inference time of a quantized ResNet-18 model by 21.0% compared with the conventional regularization method based on model size while maintaining comparable recognition accuracy.
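The MAC×bit idea described above can be sketched in a few lines: the regularization term is the total number of multiply-accumulate (MAC) operations in each layer weighted by that layer's allocated bit width, added to the task loss. This is a minimal illustrative sketch, not the paper's implementation; the function names, the per-layer MAC counts, and the regularization weight `lam` are all assumptions for the example.

```python
# Illustrative sketch of a MAC×bit-style regularizer (not the paper's code).
# Each layer is modeled as a (mac_count, bit_width) pair; the penalty is the
# sum of MACs × bits over all layers, which tracks computational complexity
# rather than model size.

def mac_bit_cost(layers):
    """Total MAC×bit over all layers.

    layers: list of (mac_count, bit_width) tuples.
    """
    return sum(macs * bits for macs, bits in layers)

def regularized_loss(task_loss, layers, lam=1e-9):
    """Task loss plus the MAC×bit penalty, scaled by a weight lam
    (lam is a hypothetical hyperparameter chosen for this example)."""
    return task_loss + lam * mac_bit_cost(layers)

# Example: two conv layers quantized to 4 and 8 bits.
layers = [(1_000_000, 4), (2_000_000, 8)]
print(mac_bit_cost(layers))          # -> 20000000
print(regularized_loss(0.5, layers))
```

In a gradient-descent setting like the one the abstract describes, the bit widths would be continuous, learnable parameters (later rounded to integers), so the penalty is differentiable and lower bit widths are favored in layers with many MACs.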

© 2024 The Institute of Electronics, Information and Communication Engineers