IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Online ISSN : 1745-1337
Print ISSN : 0916-8508
Quantization Strategy for Achieving Low-Cost, Accurate, Pareto-Optimal Convolutional Neural Networks Based on Analysis of Quantized Weight Parameters
Kengo NAKATA, Daisuke MIYASHITA, Asuka MAKI, Fumihiko TACHIBANA, Shinichi SASAKI, Jun DEGUCHI, Ryuichi FUJIMOTO
Advance online publication (free access)

Article ID: 2025EAP1034

Abstract

Quantization is an effective way to reduce memory and computational costs in the inference of convolutional neural networks. However, it remains unclear which model can achieve higher recognition accuracy while minimizing memory and computational costs: a large model (with a large number of parameters) quantized to an extremely low bit width (1 or 2 bits) or a small model (with a small number of parameters) quantized to a moderately low bit width (3, 4, or 5 bits). In this paper, we define a metric that combines the numbers of parameters and computations with the bit widths of quantized weight parameters. By utilizing this metric, we demonstrate that Pareto-optimal performance, where the best accuracy is attained at a given memory or computational cost, is achieved when a small model is moderately quantized, not when a large model is extremely quantized. Based on this finding, we empirically show that the Pareto frontier is improved by 4.3× in a post-training quantization scenario for a quantized ResNet-50 model using the ImageNet dataset.
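The abstract does not spell out the exact form of the proposed metric, only that it combines parameter count, computation count, and weight bit width. A minimal sketch of one plausible instantiation is given below, assuming a memory cost of parameters × weight bit width and a BOP-style compute cost of MACs × operand bit widths; the function names, the 8-bit activation assumption, and the example model sizes are illustrative, not the authors' published definition.

```python
# Illustrative cost metric combining parameter count, MAC count, and
# quantized weight bit width (hypothetical sketch, not the paper's metric).

def memory_cost_bits(num_params: int, weight_bits: int) -> int:
    """Bits needed to store the quantized weight parameters."""
    return num_params * weight_bits

def compute_cost_bops(num_macs: int, weight_bits: int, activation_bits: int = 8) -> int:
    """BOP-style compute cost: MACs weighted by the operand bit widths."""
    return num_macs * weight_bits * activation_bits

# Example comparison in the spirit of the abstract: a large model quantized
# to 2 bits versus a small model quantized to 4 bits (rough, assumed counts).
models = {
    "large, 2-bit": {"params": 25_500_000, "macs": 4_100_000_000, "bits": 2},
    "small, 4-bit": {"params": 11_700_000, "macs": 1_800_000_000, "bits": 4},
}

for name, m in models.items():
    print(name,
          "memory [bits]:", memory_cost_bits(m["params"], m["bits"]),
          "compute [BOPs]:", compute_cost_bops(m["macs"], m["bits"]))
```

Under such a metric, each (model size, bit width) pair maps to a cost-accuracy point, and the Pareto frontier is traced by the configurations with the best accuracy at each cost.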

© 2025 The Institute of Electronics, Information and Communication Engineers