Paper ID: 2024DAT0003
In this study, we propose a method for efficiently retrieving BERT pre-trained models that achieve good performance on a specific document classification task. In natural language processing, the common practice is to fine-tune an existing pre-trained model rather than build a new one from scratch, because pre-training requires extensive time and computational resources. The challenge, however, lies in identifying the most suitable model among the large number of available pre-trained models. To address this problem, our proposed method uses the k-nearest neighbor algorithm to retrieve appropriate BERT pre-trained models without requiring fine-tuning. We conducted experiments on a benchmark dataset we constructed, comprising 28 document classification tasks and 20 BERT models.
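As a rough illustration of the idea described above (not the authors' implementation), one way to rank candidate BERT checkpoints without fine-tuning is to encode the target task's documents with each frozen model and use cross-validated k-nearest-neighbor accuracy on the resulting embeddings as a proxy score. The model names, batch size, and scoring details below are assumptions made for this sketch, using the Hugging Face transformers and scikit-learn libraries.

```python
# Hedged sketch: rank candidate BERT checkpoints by k-NN accuracy on frozen
# [CLS] embeddings of the target task, as a proxy for fine-tuned performance.
# Candidate names and toy data here are hypothetical, for illustration only.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def embed(model_name, texts, device="cpu", batch_size=32):
    """Encode texts with a frozen BERT model and return [CLS] embeddings."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).to(device).eval()
    vecs = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch = tok(texts[i:i + batch_size], padding=True, truncation=True,
                        max_length=128, return_tensors="pt").to(device)
            out = model(**batch)
            vecs.append(out.last_hidden_state[:, 0, :].cpu().numpy())
    return np.vstack(vecs)

def knn_proxy_score(model_name, texts, labels, k=1):
    """Proxy score: cross-validated k-NN accuracy on the frozen embeddings."""
    X = embed(model_name, texts)
    clf = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(clf, X, labels, cv=3).mean()

# Toy usage: rank two (hypothetical) candidate checkpoints for a tiny task.
candidates = ["bert-base-uncased", "bert-base-cased"]
texts = ["good movie", "bad movie", "great film",
         "terrible film", "loved it", "hated it"]
labels = [1, 0, 1, 0, 1, 0]
ranking = sorted(candidates,
                 key=lambda m: knn_proxy_score(m, texts, labels),
                 reverse=True)
print(ranking)
```

In this sketch, the embeddings stay fixed and only a lightweight k-NN classifier is evaluated per candidate, so the cost per model is a single forward pass over the task data rather than a full fine-tuning run.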