Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
Recently, language models have grown in scale, allowing a single model to address a broad range of tasks that previously required individually developed models. Unidirectional pretrained language models (UniPLMs) such as GPT have become extremely large, reaching tens of billions of parameters, while bidirectional pretrained language models (BiPLMs) such as BERT have remained at a few hundred million parameters at most. However, previous research has shown that BiPLMs with a relatively small number of parameters are more effective for classical non-generative tasks. The purpose of this study is to determine whether this difference arises from the model architecture or from the pretraining objective. We trained BiPLMs and UniPLMs under controlled conditions and confirmed that the difference in their GLUE scores increases from before to after pretraining. Since the only difference between the two models before pretraining is their architecture, the results imply that the influence of the pretraining objective is more dominant than that of the model architecture.
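To make the contrast between the two pretraining objectives concrete, the following is a minimal sketch, not the authors' implementation: a single PyTorch Transformer backbone is trained either with a causal (unidirectional, GPT-style) language-modeling loss or with a masked (bidirectional, BERT-style) language-modeling loss. The class name TinyTransformerLM, all hyperparameters, and the mask token id are illustrative assumptions, chosen only to keep the example runnable.

```python
# Minimal sketch: same backbone, two pretraining objectives (assumed setup).
import torch
import torch.nn as nn

VOCAB, D_MODEL, N_HEAD, N_LAYER, MASK_ID = 30000, 256, 4, 4, 103  # illustrative values

class TinyTransformerLM(nn.Module):
    def __init__(self, causal: bool):
        super().__init__()
        self.causal = causal  # True -> unidirectional (UniPLM), False -> bidirectional (BiPLM)
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, N_LAYER)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, ids):
        mask = None
        if self.causal:
            # Upper-triangular mask blocks attention to future positions.
            mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        h = self.encoder(self.embed(ids), mask=mask)
        return self.head(h)

def causal_lm_loss(model, ids):
    # Unidirectional objective: predict token t+1 from tokens up to t.
    logits = model(ids[:, :-1])
    return nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), ids[:, 1:].reshape(-1))

def masked_lm_loss(model, ids, mask_prob=0.15):
    # Bidirectional objective: corrupt random positions, predict the originals.
    corrupted = ids.clone()
    is_masked = torch.rand_like(ids, dtype=torch.float) < mask_prob
    corrupted[is_masked] = MASK_ID
    logits = model(corrupted)
    labels = ids.masked_fill(~is_masked, -100)  # compute loss only on masked positions
    return nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), labels.reshape(-1), ignore_index=-100)

if __name__ == "__main__":
    ids = torch.randint(0, VOCAB, (2, 32))
    uni = TinyTransformerLM(causal=True)   # UniPLM analogue
    bi = TinyTransformerLM(causal=False)   # BiPLM analogue
    print("causal LM loss:", causal_lm_loss(uni, ids).item())
    print("masked LM loss:", masked_lm_loss(bi, ids).item())
```

In a controlled comparison like the one the abstract describes, the backbone, data, and training budget would be held fixed so that any downstream (e.g., GLUE) gap that emerges after pretraining can be attributed to the objective rather than the architecture; the sketch above only illustrates where the two objectives differ.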