2023 Volume 2023 Issue AGI-024 Pages 02-
In this study, I evaluate the proficiency of OpenAI's GPT-4 on simple addition of many-digit numbers. Although GPT-4 performs impressively on a wide range of tasks, its results on ten-digit addition problems were inconsistent: it solved every three-digit addition correctly but reached only 60% accuracy on ten-digit additions. Adding prompts that encouraged a step-by-step addition process did not improve this accuracy. I suggest that this limitation may stem from an inability of large language models (LLMs) to extract commonalities from different concepts, as the process of addition requires. This difference between human cognition and LLMs may be crucial for the further development of these models.
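To make the step-by-step addition process concrete, the following is a minimal sketch of column-wise addition with carry, together with one way accuracy on n-digit problems could be scored. It is an illustration only, not the procedure used in the study: the names `add_by_columns`, `make_problems`, and `accuracy` are hypothetical, and `answer_fn` stands in for whatever system produces an answer string (for example, a wrapper around a model query).

```python
import random


def add_by_columns(a: str, b: str) -> str:
    """Add two non-negative decimal strings digit by digit, right to left."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    digits = []
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))  # digit written in this column
        carry = total // 10             # carry passed to the next column
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


def make_problems(n_digits: int, count: int, seed: int = 0):
    """Generate `count` pairs of random integers with exactly n_digits digits."""
    rng = random.Random(seed)
    low, high = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return [(rng.randint(low, high), rng.randint(low, high)) for _ in range(count)]


def accuracy(problems, answer_fn) -> float:
    """Fraction of problems for which answer_fn(a, b) returns the exact sum."""
    correct = sum(1 for a, b in problems if answer_fn(a, b).strip() == str(a + b))
    return correct / len(problems)


if __name__ == "__main__":
    # Hypothetical usage: score the column-wise procedure itself on 10-digit sums.
    problems = make_problems(n_digits=10, count=20)
    print(accuracy(problems, lambda a, b: add_by_columns(str(a), str(b))))
```

The same per-column rule is applied regardless of how many digits the operands have, which is the commonality that a fixed procedure captures across three-digit and ten-digit problems alike.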