Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
36th (2022)
Session ID : 3Yin2-32
Determining the Target Block for Pruning in Natural Language Processing Model Compression
*Akito TOKUMASA, Michifumi YOSHIOKA, Katsufumi INOUE

Abstract

Machine learning in natural language processing has come to be dominated by large pre-trained Transformer models, and it is known that model size has a significant impact on performance. As a result, BERT and other large models are out of reach for many people without large-memory GPUs. Pruning addresses this problem by removing unnecessary parameters from the network. Poor Man's BERT is an existing pruning method that reduces model size by removing encoder blocks. It achieves higher performance than DistilBERT, but its pruning strategies are determined manually. We aim to improve the performance of Poor Man's BERT by determining the target blocks for pruning automatically. In this research, we introduce an importance score for each layer based on the change in the loss. Experiments confirmed that performance degradation is reduced compared to the conventional method.
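The abstract gives no implementation details, but the block-selection idea can be illustrated with a small sketch. The following hypothetical Python/PyTorch example (not the authors' code) uses a toy encoder in place of a pre-trained Transformer, takes each block's importance score to be the change in loss when that block is skipped, and selects the lowest-scoring blocks as pruning targets; the toy model, data, and the "skip one block" scoring rule are all assumptions made for illustration.

```python
# Minimal sketch: score each encoder block by the change in loss when it is
# skipped, then pick the lowest-impact blocks as pruning targets.
# Everything here (model, data, scoring rule) is a stand-in, not the paper's code.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a pre-trained Transformer encoder with a classification head.
d_model, num_blocks, num_classes = 64, 6, 2
blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    for _ in range(num_blocks)
)
classifier = nn.Linear(d_model, num_classes)
criterion = nn.CrossEntropyLoss()
blocks.eval()
classifier.eval()  # disable dropout so the scores are deterministic

# Toy evaluation batch (in practice: held-out data for the downstream task).
x = torch.randn(8, 16, d_model)        # (batch, seq_len, hidden)
y = torch.randint(0, num_classes, (8,))

def loss_without(skipped):
    """Forward pass that skips the given block indices and returns the loss."""
    h = x
    with torch.no_grad():
        for i, block in enumerate(blocks):
            if i not in skipped:
                h = block(h)
        logits = classifier(h[:, 0])    # classify on the first token
        return criterion(logits, y).item()

base_loss = loss_without(set())

# Importance score of block i: how much the loss changes when block i is removed.
scores = {i: abs(loss_without({i}) - base_loss) for i in range(num_blocks)}

# Choose the k blocks whose removal changes the loss the least as pruning targets.
k = 2
targets = sorted(scores, key=scores.get)[:k]
print("importance scores:", scores)
print("blocks selected for pruning:", targets)
```

In an actual experiment, such scores would presumably be computed on held-out data for the downstream task, and the selected blocks would then be removed before fine-tuning, in the spirit of the Poor Man's BERT setup.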

© 2022 The Japanese Society for Artificial Intelligence