BERT is a pre-trained model that achieves high accuracy on various NLP tasks after fine-tuning. However, BERT has a large number of parameters to tune, which makes both training and inference time-consuming. In this study, we propose applying a smaller BERT, obtained by dropping some of its layers, to Japanese parsing. In our experiments, we compared the parsing accuracy and processing time of the BERT developed by Kyoto University against those of the smaller BERT with some layers dropped, using the Kyoto University Web Document Leads Corpus (referred to as the web corpus) and the Kyoto University Text Corpus (referred to as the text corpus). The experiments revealed that the smaller BERT reduced training time by 83% and inference time by 65% on the web corpus and by 85% on the text corpus, while limiting the accuracy degradation relative to the Kyoto University BERT to 0.87 points on the web corpus and 0.91 points on the text corpus.
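As a rough illustration of the layer-dropping idea (not the authors' actual implementation), the sketch below removes some encoder layers from a pre-trained BERT using the Hugging Face transformers library. The checkpoint name and the choice of keeping the lower six layers are placeholders for illustration only; the paper's configuration and the Kyoto University BERT checkpoint may differ.

```python
import torch
from transformers import BertModel


def drop_bert_layers(model: BertModel, keep: list) -> BertModel:
    """Keep only the encoder layers whose indices appear in `keep`.

    This shrinks the model in place and updates its config so that
    downstream code sees the reduced layer count.
    """
    old_layers = model.encoder.layer
    model.encoder.layer = torch.nn.ModuleList([old_layers[i] for i in keep])
    model.config.num_hidden_layers = len(keep)
    return model


# Hypothetical usage: keep the lower 6 of 12 encoder layers.
# "bert-base-multilingual-cased" is a placeholder checkpoint, not the
# Kyoto University BERT used in the paper.
model = BertModel.from_pretrained("bert-base-multilingual-cased")
small_model = drop_bert_layers(model, keep=list(range(6)))
print(small_model.config.num_hidden_layers)  # 6
```

The resulting smaller model would then be fine-tuned for the parsing task in the usual way; fewer encoder layers directly reduce the per-step cost of both fine-tuning and inference.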