Auditing time-series performance degradation has become a challenge as researchers and practitioners increasingly rely on pre-trained models. Pre-trained language models typically incur huge training and inference costs; therefore, efficient auditing and retraining schemes are important. This study proposes a framework for auditing the time-series performance degradation of pre-trained language models and word embeddings by calculating the semantic shift of words in the training corpus, thereby supporting decision-making about retraining. First, we constructed RoBERTa and word2vec models with training corpora from different periods, using Japanese and English news articles from 2011 to 2021, and observed their time-series performance degradation. Semantic Shift Stability, a metric calculated from the diachronic semantic shift of words in the training corpus, was smaller when the performance of the pre-trained models degraded significantly over time, confirming that the metric is useful for monitoring applications. The proposed framework also helps infer the cause of degradation by examining words whose meanings changed significantly; the experiments suggested effects of the 2016 U.S. presidential election and the 2020 COVID-19 pandemic. The source code is available at https://github.com/Nikkei/semantic-shift-stability.
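
As a rough illustration of the kind of computation involved (a minimal sketch, not the authors' implementation; the actual code is in the repository linked above), one common way to quantify diachronic semantic shift is to train a word2vec model per corpus period, align the two embedding spaces with orthogonal Procrustes, and average the cosine similarity of shared word vectors, so that a lower average indicates a larger shift between periods. The model file names and the function name below are hypothetical.

```python
"""Sketch: a stability-style score between two corpus periods, assuming
gensim word2vec models trained separately on each period."""
import numpy as np
from gensim.models import Word2Vec


def stability_score(model_a: Word2Vec, model_b: Word2Vec) -> float:
    # Vocabulary shared by both periods
    shared = [w for w in model_a.wv.key_to_index if w in model_b.wv.key_to_index]

    # Stack and L2-normalize the shared word vectors
    A = np.stack([model_a.wv[w] for w in shared])
    B = np.stack([model_b.wv[w] for w in shared])
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    B /= np.linalg.norm(B, axis=1, keepdims=True)

    # Orthogonal Procrustes: rotate B's space onto A's space
    u, _, vt = np.linalg.svd(B.T @ A)
    B_aligned = B @ (u @ vt)

    # Mean per-word cosine similarity after alignment
    return float(np.mean(np.sum(A * B_aligned, axis=1)))


if __name__ == "__main__":
    # Hypothetical models trained on articles from two different years
    m2019 = Word2Vec.load("w2v_2019.model")
    m2020 = Word2Vec.load("w2v_2020.model")
    print("stability:", stability_score(m2019, m2020))
```

In such a sketch, the words with the lowest post-alignment cosine similarity are natural candidates for the cause analysis described above, e.g., terms whose usage changed around the 2016 U.S. presidential election or the COVID-19 pandemic.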