Abstract
In recent years, large language models (LLMs) have been increasingly adopted in practical applications and
interactive systems. Contemporary LLMs rely heavily on inductive learning and are therefore strongly influenced by the
characteristics and quality of their training data. While the importance of high-quality training data is widely acknowledged,
it is equally plausible that erroneous interpretations present in existing sources are inherited by the models. In this study, we report
a concrete example of such an error encountered during the design of a statistical learning game utilizing an LLM. During
discussions with the model regarding statistical hypothesis testing, a critical issue arose concerning tests of normality. In
standard statistical practice, normality tests are conducted under the null hypothesis that the sample data follow a normal
distribution. If this null hypothesis is not rejected, the correct conclusion is that the data cannot be determined to either follow
or deviate from a normal distribution. However, the LLM consistently interpreted a non-rejection of the null hypothesis as
evidence that the data do follow a normal distribution. Through extended interaction, the model eventually acknowledged
this reasoning as incorrect. Notably, similar misinterpretations were also observed in other artificial intelligence models.
Given that analogous misunderstandings of normality testing can be found in academic papers and conference presentations,
it is plausible that these errors were acquired from the training data. These observations suggest that LLMs cannot be
assumed to reason correctly about statistical concepts without careful scrutiny. Consequently, particular caution is required
when incorporating LLM-based reasoning into educational tools such as statistical learning games.
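To make the distinction concrete, the following minimal Python sketch illustrates the correct reading of a normality test result. The choice of the Shapiro-Wilk test, the 0.05 significance threshold, and the sample itself are illustrative assumptions, not details taken from the study.

    # Minimal sketch (illustrative assumptions: Shapiro-Wilk test, alpha = 0.05).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.uniform(-1.0, 1.0, size=20)  # small sample, actually non-normal

    # H0: the sample was drawn from a normal distribution.
    statistic, p_value = stats.shapiro(sample)

    if p_value < 0.05:
        print("H0 rejected: the data significantly deviate from normality.")
    else:
        # Failure to reject H0 is NOT evidence of normality; the test merely
        # lacked grounds (e.g., statistical power) to reject it.
        print("H0 not rejected: normality is neither confirmed nor ruled out.")

With a sample this small, the test will often fail to reject H0 even though the data are uniform rather than normal, which is precisely why non-rejection must not be read as confirmation of normality.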