The functionality of smartphone applications and IoT technologies has expanded widely, and deep learning models now play a role in many recent products. The need to deploy deep learning models on edge devices is also increasing. However, loading models with enormous numbers of parameters onto edge devices is challenging because of memory and power constraints. Model reduction by parameter approximation enables large models to run on small devices and enlarges their range of applications. This paper proposes combining model reduction by matrix factorization with the well-known parameter reduction methods of pruning and quantization. Since the target of matrix factorization is the weight matrix of a linear computation, we apply the proposed method to a Transformer model, one of the most popular recent models in image recognition, most of whose parameters belong to its fully connected layers. Experimental results show that our method can reduce the size of a model trained on CIFAR-10 to 60% of the original model's size. We also find that the ratio of the signs of the output vector elements in each layer can serve as an index for selecting which layers to prune.
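To make the factorization step concrete, the sketch below replaces a fully connected layer's weight matrix with a pair of smaller layers obtained by truncated SVD. This is a minimal illustration assuming PyTorch; the factorization scheme, the rank parameter, and the sign_ratio helper (for the layer-selection index mentioned above) are our assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    # W has shape (out_features, in_features); a truncated SVD gives
    # W ~= (U_r diag(S_r)) V_r, so one Linear becomes two smaller ones.
    W = layer.weight.data
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]   # (out_features, rank), singular values folded in
    V_r = Vh[:rank, :]             # (rank, in_features)

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data.copy_(V_r)
    second.weight.data.copy_(U_r)
    if layer.bias is not None:
        second.bias.data.copy_(layer.bias.data)
    return nn.Sequential(first, second)

def sign_ratio(output: torch.Tensor) -> float:
    # Fraction of positive elements in a layer's output vector; the
    # abstract suggests this ratio can index which layers to prune.
    # This helper is an illustrative assumption.
    return (output > 0).float().mean().item()
```

As a rough sense of the savings, a hypothetical Transformer feed-forward weight of size 768 x 3072 holds about 2.36M parameters, while a rank-256 factorization keeps 256 x (768 + 3072), roughly 0.98M, i.e. about 42% of that layer; whole-model figures such as the 60% reported above also depend on which layers are factorized, pruned, or quantized.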