Host: The Japanese Society for Artificial Intelligence
Name : The 39th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 39
Location : [in Japanese]
Date : May 27, 2025 - May 30, 2025
Large Language Models (LLMs) can perform well on unseen tasks and flexibly adapt their behavior according to prompts. Leveraging this characteristic, several studies have attempted to assign virtual personas or personalities to LLMs and make them behave accordingly. If LLM performance could be limited intentionally, the constructed virtual personas would likely become more realistic (e.g., making a kindergartener persona unable to solve integral calculus). This paper addresses such intentional performance degradation of LLMs. Using multiple Japanese benchmark tasks, we report that it is difficult to degrade LLM performance on downstream tasks through prompts alone. We also examine the benchmarks necessary for measuring such performance degradation.
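The setup described above, measuring whether a persona-assigning prompt degrades benchmark accuracy relative to a baseline prompt, could be sketched roughly as follows. This is a minimal illustration, not the paper's actual experimental code: the persona wording, the `query_llm` stand-in, and the toy benchmark are all hypothetical placeholders.

```python
# Sketch: compare baseline vs. persona-restricted prompting on a benchmark.
# `query_llm` is a hypothetical stand-in for a real LLM API call.

PERSONA_PREFIX = (
    "You are a five-year-old kindergartener. Answer only with knowledge "
    "a kindergartener would have; otherwise say 'I don't know'.\n\n"
)

def build_prompt(question: str, persona: bool) -> str:
    """Prepend the persona instruction when `persona` is True."""
    prefix = PERSONA_PREFIX if persona else ""
    return prefix + f"Question: {question}\nAnswer:"

def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Exact-match accuracy over a list of (prediction, gold) pairs."""
    correct = sum(p.strip() == g.strip() for p, g in zip(predictions, gold))
    return correct / len(gold)

def query_llm(prompt: str) -> str:
    # Placeholder: replace with an actual model call in a real experiment.
    return "42"

# Toy benchmark; a real study would use Japanese benchmark tasks.
benchmark = [("What is 6 * 7?", "42")]

for persona in (False, True):
    preds = [query_llm(build_prompt(q, persona)) for q, _ in benchmark]
    acc = accuracy(preds, [a for _, a in benchmark])
    print(f"persona={persona}: accuracy={acc:.2f}")
```

A gap between the two accuracy scores would indicate successful degradation; the paper's finding suggests that, with prompting alone, this gap is hard to produce reliably.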