Abstract
In fast-paced development scenarios—such as game jams or prototype builds for exhibitions—designers often lack
the time to systematically check whether players fully understand the in-game tutorials, whether the level pacing
feels balanced, or whether the overall play experience is satisfying. This paper presents an evaluation and
optimization approach powered by large language models (using ChatGPT as an example) to help designers during
the prototype phase. Our method identifies missing tutorial elements, predicts where players are likely to get stuck,
estimates completion times, and generates targeted improvement suggestions. We validate its feasibility and
effectiveness by applying it to a nonlinear sandbox level in a Souls-like game. Finally, we compare AI-driven
evaluation with traditional manual assessment in terms of accuracy, efficiency, and objectivity.