Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
Amid the rapid development of Large Language Models (LLMs), training corpora continue to grow in pursuit of higher-performing models. However, not every text in such large-scale corpora is of high quality, and the low-quality texts inevitably swept up during collection can hinder improvements in model performance. This study proposes a robust learning method that mitigates the impact of this noise when pre-training language models on corpora containing the low-quality texts found in real-world data sources. Specifically, we focus on the broad class of Bregman divergences, employing the β-divergence and the γ-divergence, which belong to this class and are effective in robust statistics. In experiments on fine-tuning and additional pre-training of BERT, we demonstrate that the proposed method trains robustly on noisy texts and labels compared with the conventional training objective based on the KL divergence.
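For reference, a commonly used parameterization of the two divergences from the robust statistics literature is sketched below; this is a standard textbook form, not necessarily the exact formulation used in the paper. Both reduce to the KL divergence in the limit β → 0 and γ → 0, which is why they can serve as drop-in replacements for the usual KL-based (cross-entropy) objective.

D_\beta(p \,\|\, q) = \frac{1}{\beta} \int p(x)\bigl(p(x)^{\beta} - q(x)^{\beta}\bigr)\,dx - \frac{1}{\beta + 1} \int \bigl(p(x)^{\beta + 1} - q(x)^{\beta + 1}\bigr)\,dx

D_\gamma(p \,\|\, q) = \frac{1}{\gamma(\gamma + 1)} \log \int p(x)^{\gamma + 1}\,dx - \frac{1}{\gamma} \log \int p(x)\,q(x)^{\gamma}\,dx + \frac{1}{\gamma + 1} \log \int q(x)^{\gamma + 1}\,dx

The standard robustness intuition is that the per-example terms of the resulting empirical loss are weighted by a power of the model probability q(x), so examples to which the model assigns very low probability, such as outlying noisy texts or mislabeled instances, contribute less to the gradient than they would under the KL divergence.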