Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4Xin2-37

Analyzing Effects of Architectures and Pretraining Objective on Unidirectional and Bidirectional Pretrained Language Models
*Hiroyoshi NAGAO, Takumi GOTO, Yuta KOREEDA
Abstract

Recently, language models have grown in scale, allowing a single model to address a broad range of tasks that previously required individual development. Unidirectional pretrained language models (UniPLMs) such as GPT have become extremely large, with tens of billions of parameters, while bidirectional pretrained language models (BiPLMs) such as BERT have stayed at a few hundred million parameters at most. However, a previous study showed that BiPLMs with a relatively small number of parameters are more effective for classical non-generative tasks. The purpose of this study is to determine whether the model architecture or the pretraining objective brings about this difference. We trained BiPLMs and UniPLMs under a controlled condition and confirmed that the gap in GLUE scores between them widens from before pretraining to after pretraining. Since the only difference between the two models before pretraining is the model architecture, the results imply that the influence of the pretraining objective is more dominant than that of the model architecture.

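To illustrate the distinction the abstract draws between model architecture and pretraining objective, the sketch below trains the same tiny Transformer with either a causal language-modeling loss (unidirectional, GPT-style) or a masked language-modeling loss (bidirectional, BERT-style). This is not the authors' experimental code; all hyperparameters, names, and the mask token id are illustrative assumptions, shown only to make the contrast between the two objectives concrete.

# Minimal sketch (assumed setup, not the paper's implementation): one
# architecture, two pretraining objectives.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, D_MODEL, N_HEAD, N_LAYER, MASK_ID = 32000, 256, 4, 4, 3  # illustrative values

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, N_LAYER)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, ids, causal: bool):
        # The attention-mask switch is the only architectural difference:
        # a causal mask restricts each position to earlier tokens.
        mask = None
        if causal:
            mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        h = self.encoder(self.embed(ids), mask=mask)
        return self.lm_head(h)

def causal_lm_loss(model, ids):
    # Unidirectional objective: predict token t+1 from tokens up to t.
    logits = model(ids[:, :-1], causal=True)
    return F.cross_entropy(logits.reshape(-1, VOCAB), ids[:, 1:].reshape(-1))

def masked_lm_loss(model, ids, mask_prob=0.15):
    # Bidirectional objective: corrupt a random subset of tokens and
    # recover the originals using both left and right context.
    corrupt = torch.rand(ids.shape) < mask_prob
    inputs = ids.masked_fill(corrupt, MASK_ID)
    targets = ids.masked_fill(~corrupt, -100)  # ignore unmasked positions
    logits = model(inputs, causal=False)
    return F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1),
                           ignore_index=-100)

if __name__ == "__main__":
    batch = torch.randint(4, VOCAB, (2, 16))
    model = TinyLM()
    print("causal LM loss:", causal_lm_loss(model, batch).item())
    print("masked LM loss:", masked_lm_loss(model, batch).item())

In a controlled comparison of this kind, checkpoints before pretraining differ only in the attention mask (architecture), while the choice of loss function above is the pretraining objective; the paper's finding is that the latter accounts for most of the downstream GLUE gap.
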
© 2024 The Japanese Society for Artificial Intelligence