2025 Volume 32 Issue 1 Pages 91-113
Recent advances in the performance of large-scale language models have made it necessary to detect errors in generated content. One approach to detecting such errors is to estimate the confidence of the generated content from the information available at generation time. Existing methods rely mainly on model outputs and internal states; the setting in which the training data of the language model is also accessible has not been fully explored. This study examines how useful training data is for estimating the confidence of a trained language model's outputs. We trained a medium-scale language model, built a datastore containing the full text of its training data, and designed and evaluated several confidence estimation methods based on that training data. Experimental results on a knowledge evaluation task for language models confirm that combining the predictive likelihood with information about relevant instances in the training data improves the accuracy of confidence estimation compared with not using the training data.
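To make the general idea concrete, the sketch below illustrates one plausible way to combine a model's predictive likelihood with evidence retrieved from a datastore of training-data passages. This is a minimal illustration under assumed design choices (cosine-similarity retrieval over passage embeddings, linear interpolation with weight alpha); the class and function names (Datastore, confidence_score) are hypothetical and do not reflect the authors' actual implementation.

```python
# Hypothetical sketch: interpolate predictive likelihood with a
# retrieval-based score from a datastore of training-data passages.
import numpy as np


class Datastore:
    """Holds embeddings of training-data passages for nearest-neighbour lookup."""

    def __init__(self, passage_embeddings: np.ndarray):
        # Normalise once so dot products equal cosine similarities.
        norms = np.linalg.norm(passage_embeddings, axis=1, keepdims=True)
        self.embeddings = passage_embeddings / np.clip(norms, 1e-12, None)

    def top_k_similarity(self, query: np.ndarray, k: int = 5) -> np.ndarray:
        """Return cosine similarities of the k most similar stored passages."""
        q = query / max(np.linalg.norm(query), 1e-12)
        sims = self.embeddings @ q
        return np.sort(sims)[-k:]


def confidence_score(likelihood: float,
                     query_embedding: np.ndarray,
                     datastore: Datastore,
                     alpha: float = 0.5,
                     k: int = 5) -> float:
    """Combine the model's predictive likelihood with the mean similarity of
    the k nearest training passages; alpha weights the training-data evidence."""
    retrieval_score = float(datastore.top_k_similarity(query_embedding, k).mean())
    # Map cosine similarity from [-1, 1] to [0, 1] so both terms are comparable.
    retrieval_score = (retrieval_score + 1.0) / 2.0
    return (1.0 - alpha) * likelihood + alpha * retrieval_score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    store = Datastore(rng.normal(size=(1000, 64)))  # toy training-data embeddings
    query = rng.normal(size=64)                     # embedding of a generated answer
    print(confidence_score(likelihood=0.8, query_embedding=query, datastore=store))
```

In this toy setup, a generation that is both assigned high likelihood by the model and well supported by similar training passages receives a higher confidence score than one supported by the likelihood alone, which mirrors the combination the abstract reports as beneficial.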