Proceedings of Annual Conference, Digital Game Research Association JAPAN
Online ISSN : 2758-6480
16th Annual Conference

A Misinterpretation of Statistical Hypothesis Testing by Large Language Models Encountered in the Design of a Statistical Learning Game
A Case Study of Normality Testing
*Mitsuhiro OGAWA
CONFERENCE PROCEEDINGS OPEN ACCESS

Pages 155-158

Abstract
In recent years, large language models (LLMs) have been increasingly adopted in practical applications and interactive systems. Contemporary LLMs rely heavily on inductive learning and are therefore strongly influenced by the characteristics and quality of their training data. While the importance of high-quality training data is widely acknowledged, it is also reasonable to assume that erroneous interpretations may be inherited from existing sources. In this study, we report a concrete example of such an error encountered during the design of a statistical learning game utilizing an LLM. During discussions with the model regarding statistical hypothesis testing, a critical issue arose concerning tests of normality. In standard statistical practice, normality tests are conducted under the null hypothesis that the sample data follow a normal distribution. If this null hypothesis is not rejected, the correct conclusion is that the data cannot be determined to either follow or deviate from a normal distribution. However, the LLM consistently interpreted a non-rejection of the null hypothesis as evidence that the data do follow a normal distribution. Through extended interaction, the model eventually acknowledged this reasoning as incorrect. Notably, similar misinterpretations were also observed in other artificial intelligence models. Given that analogous misunderstandings of normality testing can be found in academic papers and conference presentations, it is plausible that these errors were acquired through the training data. These observations suggest that LLMs cannot be assumed to reason correctly about statistical concepts without careful scrutiny. Consequently, particular caution is required when incorporating LLM-based reasoning into educational tools such as statistical learning games.
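The distinction between "not rejected" and "shown to be normal" can be illustrated with a small sketch. The abstract does not say which normality test was discussed, so the Jarque-Bera test is used here purely as an assumption, because it can be written with the standard library alone. The function names `jarque_bera` and `interpret`, the significance level of 0.05, and the sample data are all hypothetical. The p-value relies on the asymptotic chi-square approximation with 2 degrees of freedom, which is rough for small samples.

```python
import math


def jarque_bera(xs):
    """Return (statistic, p_value) for the Jarque-Bera normality test.

    H0: the data follow a normal distribution.
    The p-value uses the asymptotic chi-square(2) distribution, whose
    survival function is exp(-x / 2).
    """
    n = len(xs)
    mean = sum(xs) / n
    # Central moments with the biased (divide-by-n) convention.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    stat = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


def interpret(p_value, alpha=0.05):
    """Map a p-value to the conclusion the test actually supports."""
    if p_value < alpha:
        return "reject H0: the data show evidence of deviating from normality"
    # Non-rejection is NOT evidence that the data are normal.
    return ("fail to reject H0: inconclusive; "
            "normality is neither confirmed nor ruled out")


# Hypothetical sample: 20 evenly spaced values, i.e. clearly uniform, not normal.
sample = [i / 19 for i in range(20)]
stat, p = jarque_bera(sample)
print(f"JB = {stat:.3f}, p = {p:.3f}")
print(interpret(p))
```

In this sketch the evenly spaced sample is not normal by construction, yet the test does not reject the null hypothesis at the 0.05 level. Reading that result as "the data are normal" would be exactly the misinterpretation described above; the only supported statement is that the test was inconclusive for this sample.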
© 2026 Digital Research Association JAPAN