2024, Volume 31, Issue 2, Pages 707–732
In this study, we constructed a large corpus of minutes from national and local assemblies published on the web. Using this corpus of meeting records, we developed pre-trained language models adapted to the Japanese political domain, including several model variants. Our models outperformed conventional models on tasks in the political domain while achieving comparable performance on tasks outside it. We also showed that increasing the number of training steps during domain adaptation through additional pre-training significantly improves performance. Furthermore, reusing the corpus from the initial pre-training during this additional pre-training enhances performance in the adapted domain while maintaining performance in non-adapted domains.
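As a minimal sketch of the additional pre-training setup described in the abstract, the following assumes continued masked-language-model training of a Japanese BERT on a mixture of the political-domain corpus and the original pre-training corpus. The base checkpoint, file paths, mixing ratio, and hyperparameters are illustrative assumptions, not the paper's actual configuration; interleaving the two corpora is one simple way to realize the observation that reusing the initial pre-training corpus preserves performance in non-adapted domains.

```python
# Sketch: domain-adaptive (additional) pre-training that mixes the
# political-domain corpus with the original pre-training corpus.
# Model name, file paths, mixing ratio, and hyperparameters are
# assumptions for illustration, not the paper's reported setup.
from datasets import load_dataset, interleave_datasets
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical base checkpoint and plain-text corpora (one segment per line).
base_model = "cl-tohoku/bert-base-japanese-v2"  # assumed Japanese BERT
political = load_dataset("text", data_files="assembly_minutes.txt", split="train")
general = load_dataset("text", data_files="general_corpus.txt", split="train")

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForMaskedLM.from_pretrained(base_model)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# Interleave the two corpora so training draws from both domains,
# mitigating forgetting of the non-adapted domains.
mixed = interleave_datasets(
    [political.map(tokenize, batched=True, remove_columns=["text"]),
     general.map(tokenize, batched=True, remove_columns=["text"])],
    probabilities=[0.5, 0.5],  # assumed mixing ratio
    seed=42,
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="political-bert",
    max_steps=100_000,  # the paper finds more steps help; this value is illustrative
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    save_steps=10_000,
)

Trainer(model=model, args=args, train_dataset=mixed, data_collator=collator).train()
```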