Challenges of Large Language Models Suggested by Addition Experiments with GPT-4

岡谷 基弘

doi:10.11517/jsaisigtwo.2023.AGI-024_02

Abstract

In this study, I evaluate how well OpenAI's GPT-4 handles simple addition of numbers with many digits. Although GPT-4 performs impressively on a wide range of tasks, it was inconsistent on ten-digit addition problems: it solved every three-digit addition correctly but reached only 60% accuracy on ten-digit additions. Adding prompts that encourage a step-by-step addition procedure did not improve this accuracy. I suggest that the limitation may stem from an inability of large language models (LLMs) to extract what is common across different concepts, as illustrated by the process of addition. This difference between human cognition and LLMs may be crucial for the further development of these models.
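The abstract does not state the prompts, the number of trials, or how answers were scored, so the following is only a minimal sketch of how such an addition test could be set up, not the author's procedure. The names `add_digitwise`, `make_problems`, and `accuracy` are hypothetical, as are the seed and problem counts. `add_digitwise` shows the step that is common to every digit position (add two digits plus a carry), and `accuracy` scores any answering function, which in a real experiment would wrap a call to the model.

```python
import random


def add_digitwise(a: str, b: str) -> str:
    """Schoolbook addition on decimal strings.

    The same step (digit + digit + carry) is repeated at every position,
    which is the commonality the abstract refers to.
    """
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    digits = []
    # Walk from the least significant digit to the most significant one.
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


def make_problems(n_digits: int, count: int, seed: int = 0):
    """Generate `count` pairs of random integers with exactly `n_digits` digits.

    The seed and count are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    low, high = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return [(rng.randint(low, high), rng.randint(low, high)) for _ in range(count)]


def accuracy(answer_fn, problems) -> float:
    """Fraction of problems where answer_fn(a, b) returns the exact sum as text."""
    correct = sum(1 for a, b in problems if answer_fn(a, b).strip() == str(a + b))
    return correct / len(problems)


if __name__ == "__main__":
    # Stand-in answerer; a real run would replace this with a model query.
    reference = lambda a, b: add_digitwise(str(a), str(b))
    for n in (3, 10):
        print(n, "digits:", accuracy(reference, make_problems(n, 20)))
```

Scoring by exact string match on the final sum is one possible design choice; it keeps the check independent of how the answer was produced, whether or not a step-by-step prompt was used.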
