Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4Xin2-37

Analyzing Effects of Architectures and Pretraining Objective on Unidirectional and Bidirectional Pretrained Language Models
*Hiroyoshi NAGAO, Takumi GOTO, Yuta KOREEDA
Abstract

Recently, language models have grown in scale, allowing a single model to address a broad range of tasks that previously required individual development. Unidirectional pretrained language models (UniPLMs) such as GPT have become extremely large, with tens of billions of parameters, while bidirectional pretrained language models (BiPLMs) such as BERT have stayed at a few hundred million parameters at most. However, a previous study showed that BiPLMs with a relatively small number of parameters are more effective for classical non-generative tasks. The purpose of this study is to determine whether the model architecture or the pretraining objective brings about this difference. We trained BiPLMs and UniPLMs under a controlled condition and confirmed that the gap in GLUE scores between them widens from before pretraining to after pretraining. Since the only difference between the two models before pretraining is the model architecture, the results imply that the influence of the pretraining objective is more dominant than that of the model architecture.

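To illustrate the distinction the abstract draws between model architecture and pretraining objective, the sketch below trains the same tiny Transformer with either a causal language-modeling loss (unidirectional, GPT-style) or a masked language-modeling loss (bidirectional, BERT-style). This is not the authors' experimental code; all hyperparameters, names, and the mask token id are illustrative assumptions, shown only to make the contrast between the two objectives concrete.

# Minimal sketch (assumed setup, not the paper's implementation): one
# architecture, two pretraining objectives.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, D_MODEL, N_HEAD, N_LAYER, MASK_ID = 32000, 256, 4, 4, 3  # illustrative values

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, N_LAYER)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, ids, causal: bool):
        # The attention-mask switch is the only architectural difference:
        # a causal mask restricts each position to earlier tokens.
        mask = None
        if causal:
            mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        h = self.encoder(self.embed(ids), mask=mask)
        return self.lm_head(h)

def causal_lm_loss(model, ids):
    # Unidirectional objective: predict token t+1 from tokens up to t.
    logits = model(ids[:, :-1], causal=True)
    return F.cross_entropy(logits.reshape(-1, VOCAB), ids[:, 1:].reshape(-1))

def masked_lm_loss(model, ids, mask_prob=0.15):
    # Bidirectional objective: corrupt a random subset of tokens and
    # recover the originals using both left and right context.
    corrupt = torch.rand(ids.shape) < mask_prob
    inputs = ids.masked_fill(corrupt, MASK_ID)
    targets = ids.masked_fill(~corrupt, -100)  # ignore unmasked positions
    logits = model(inputs, causal=False)
    return F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1),
                           ignore_index=-100)

if __name__ == "__main__":
    batch = torch.randint(4, VOCAB, (2, 16))
    model = TinyLM()
    print("causal LM loss:", causal_lm_loss(model, batch).item())
    print("masked LM loss:", masked_lm_loss(model, batch).item())

In a controlled comparison of this kind, checkpoints before pretraining differ only in the attention mask (architecture), while the choice of loss function above is the pretraining objective; the paper's finding is that the latter accounts for most of the downstream GLUE gap.
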
© 2024 The Japanese Society for Artificial Intelligence