Host: The Japanese Society for Artificial Intelligence
Name: The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 37
Location: [in Japanese]
Date: June 06, 2023 - June 09, 2023
Recently, pre-trained models based on large corpora have been developed and released, and opportunities are expanding to use them for business purposes, such as analyzing linguistic data like product reviews, by fine-tuning them with training data specific to the problem to be solved. However, in business settings, available datasets are not always plentiful due to various constraints, and it is not easy to determine how much data is enough to achieve a target performance. This paper proposes a method to estimate the amount of data required to achieve a target performance by predicting the growth of classification performance as the amount of additional training data increases, based on the classification performance of models fine-tuned on an initial set of one hundred to one thousand examples. Specifically, we show that when a pre-trained model is fine-tuned, classification performance increases with a similar trend as the number of epochs grows, regardless of the original dataset size. We then verify that an approximate formula based on this tendency can be used to estimate the classification performance obtained when the model is trained with ten times or more training data, even when the initial additional training data are limited.
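To make the idea concrete, the sketch below illustrates one common way such an estimate can be made: fitting a saturating power-law learning curve to classification scores measured at small training-set sizes and extrapolating to ten times the data. The paper's own approximate formula is not reproduced here; the power-law form, the curve-fitting routine, and all numeric values are assumptions chosen for illustration.

```python
# Minimal sketch of learning-curve extrapolation, assuming a saturating
# power law f(n) = a - b * n^(-c); this stands in for the paper's own
# approximate formula, which is not given in the abstract.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    """Learning curve that approaches the asymptote `a` as n grows."""
    return a - b * np.power(n, -c)

# Hypothetical observations: (training-set size, macro-F1 of the
# fine-tuned classifier) from a few initial fine-tuning runs.
sizes = np.array([100, 200, 400, 700, 1000], dtype=float)
scores = np.array([0.62, 0.68, 0.73, 0.76, 0.78])

# Fit the curve; bounds keep the asymptote `a` in [0, 1], as expected
# for a metric such as accuracy or F1.
params, _ = curve_fit(
    power_law, sizes, scores,
    p0=[0.9, 1.0, 0.5],
    bounds=([0.0, 0.0, 0.0], [1.0, np.inf, 2.0]),
)
a, b, c = params

# Extrapolate to 10x the largest observed dataset size.
target_size = 10 * sizes.max()
predicted = power_law(target_size, a, b, c)
print(f"Predicted score at n={int(target_size)}: {predicted:.3f}")

# Invert the curve to estimate the data needed for a target score
# (only meaningful when the target lies below the fitted ceiling `a`).
target_score = 0.85
if target_score < a:
    required_n = (b / (a - target_score)) ** (1.0 / c)
    print(f"Estimated data needed for score {target_score}: ~{required_n:.0f}")
```

The same fit-and-extrapolate pattern applies whether the horizontal axis is training-set size or, as in the epoch-wise trend the paper describes, the number of training epochs; only the observed points change.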