Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
36th (2022)
Session ID : 3Yin2-32
Determining the Target Block for Pruning in Natural Language Processing Model Compression
*Akito TOKUMASA, Michifumi YOSHIOKA, Katsufumi INOUE

Abstract

Machine learning in natural language processing has come to be dominated by large pre-trained Transformer models, and it is known that model size has a significant impact on performance. As a result, BERT and other large models are out of reach for many people without large-memory GPUs. Pruning addresses this problem by removing unnecessary parameters from the network. Poor Man's BERT is an existing pruning method that reduces model size by removing encoder blocks. It achieves higher performance than DistilBERT, but its pruning strategies are determined manually. We aim to improve the performance of Poor Man's BERT by determining the target blocks for pruning automatically. In this research, we introduce an importance score for each layer based on the change in the loss. Experiments confirmed that performance degradation is reduced compared to the conventional method.
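The abstract gives no implementation details, but the block-selection idea can be illustrated with a small sketch. The following hypothetical Python/PyTorch example (not the authors' code) uses a toy encoder in place of a pre-trained Transformer, takes each block's importance score to be the change in loss when that block is skipped, and selects the lowest-scoring blocks as pruning targets; the toy model, data, and the "skip one block" scoring rule are all assumptions made for illustration.

```python
# Minimal sketch: score each encoder block by the change in loss when it is
# skipped, then pick the lowest-impact blocks as pruning targets.
# Everything here (model, data, scoring rule) is a stand-in, not the paper's code.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a pre-trained Transformer encoder with a classification head.
d_model, num_blocks, num_classes = 64, 6, 2
blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    for _ in range(num_blocks)
)
classifier = nn.Linear(d_model, num_classes)
criterion = nn.CrossEntropyLoss()
blocks.eval()
classifier.eval()  # disable dropout so the scores are deterministic

# Toy evaluation batch (in practice: held-out data for the downstream task).
x = torch.randn(8, 16, d_model)        # (batch, seq_len, hidden)
y = torch.randint(0, num_classes, (8,))

def loss_without(skipped):
    """Forward pass that skips the given block indices and returns the loss."""
    h = x
    with torch.no_grad():
        for i, block in enumerate(blocks):
            if i not in skipped:
                h = block(h)
        logits = classifier(h[:, 0])    # classify on the first token
        return criterion(logits, y).item()

base_loss = loss_without(set())

# Importance score of block i: how much the loss changes when block i is removed.
scores = {i: abs(loss_without({i}) - base_loss) for i in range(num_blocks)}

# Choose the k blocks whose removal changes the loss the least as pruning targets.
k = 2
targets = sorted(scores, key=scores.get)[:k]
print("importance scores:", scores)
print("blocks selected for pruning:", targets)
```

In an actual experiment, such scores would presumably be computed on held-out data for the downstream task, and the selected blocks would then be removed before fine-tuning, in the spirit of the Poor Man's BERT setup.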

© 2022 The Japanese Society for Artificial Intelligence